CN117973324A - Method, system and medium for converting html form into markdown text - Google Patents

Publication number
CN117973324A
CN117973324A CN202410063513.0A CN202410063513A CN117973324A CN 117973324 A CN117973324 A CN 117973324A CN 202410063513 A CN202410063513 A CN 202410063513A CN 117973324 A CN117973324 A CN 117973324A
Authority
CN
China
Prior art keywords
label
sequence
tag
content
merging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410063513.0A
Other languages
Chinese (zh)
Inventor
叶晃棋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Hexun Huagu Information Technology Co ltd
Original Assignee
Shenzhen Hexun Huagu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Hexun Huagu Information Technology Co ltd
Priority to CN202410063513.0A
Publication of CN117973324A

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention provides a method, a system and a medium for converting an html table into markdown text. The method comprises the following steps: obtaining the table-type tags in an html page to obtain a table tag sequence; completing the table tag sequence; traversing the completed table tag sequence; when the traversed tag is a merged cell, copying the content of the tag as the content of a new tag, and inserting the new tag into the table tag sequence; and, after the traversal is completed, sequentially reading the contents of all the tags in the table tag sequence and constructing the markdown text. To address the problems of merged cells and missing tags in tables in html pages, the method converts the table in an html page into relatively complete markdown text that has a symmetrical data format and is unlikely to lose data content, improves the recognition accuracy for tables in html pages, and avoids the loss of data content and data structure found in existing methods for identifying tables in html pages.

Description

Method, system and medium for converting html form into markdown text
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a method, a system and a medium for converting an html form into a markdown text.
Background
Existing large language models can only recognize character text. Therefore, in scenarios where a table in an html page has to be recognized by a large language model, such as AI Bot data source analysis, the data content and the data structure of the table must be acquired accurately. However, existing methods that identify the table in the html page directly suffer from insufficient recognition accuracy, loss of data content, loss of data structure and similar problems.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a method, a system and a medium for converting an html form into a markdown text, which have high recognition accuracy and avoid the loss of data content and data structures.
In a first aspect, a method for converting an html table into markdown text includes:
obtaining the table-type tags in an html page to obtain a table tag sequence;
completing the table tag sequence;
traversing the completed table tag sequence;
when the traversed tag is a merged cell, copying the content of the tag as the content of a new tag, and inserting the new tag into the table tag sequence;
and after the traversal is completed, sequentially reading the contents of all the tags in the table tag sequence, and constructing the markdown text.
Further, completing the table tag sequence specifically includes:
sequentially extracting the tags in the table tag sequence, and completing each extracted tag.
Further, completing the extracted tag specifically includes:
when the stack is empty, pushing the extracted tag onto the stack;
when the stack is not empty and the extracted tag forms a pair with the stack-top tag, popping the stack-top tag;
when the stack is not empty and the extracted tag and the stack-top tag have a hierarchical relationship, pushing the extracted tag onto the stack;
when the stack is not empty and the extracted tag and the stack-top tag are at the same level, creating an end tag for the stack-top tag, inserting the end tag into the table tag sequence immediately before the extracted tag, popping the stack-top tag, and pushing the extracted tag onto the stack.
Further, traversing the completed table tag sequence specifically includes:
traversing the completed table tag sequence row by row.
Further, when the traversed tag is a merged cell, copying the content of the tag as the content of a new tag and inserting the new tag into the table tag sequence specifically includes:
when traversing the first row of the table, creating an empty list, wherein the list comprises N elements and N is the number of columns of the table;
performing the merged-cell processing step.
Further, after the first row of the table has been traversed, the method further includes:
reading the list when traversing the j-th row of the table;
when the list is not empty, creating a new tag from the content of each non-empty element in the list, and inserting the new tag into the table tag sequence at the position that represents the j-th row and the i-th column of the table;
performing the merged-cell processing step.
Further, the merged-cell processing step specifically includes:
when the merge type of the merged cell is row merging, copying the content of the traversed tag as the content of a new tag, and inserting the new tag into the table tag sequence as the tag pair immediately following the traversed tag;
when the merge type of the merged cell is column merging, obtaining the column number i of the traversed tag in the table, and copying the content of the traversed tag to the (i-1)-th element of the list.
In a second aspect, a system for converting an html table into markdown text includes:
an acquisition unit, configured to obtain the table-type tags in an html page to obtain a table tag sequence;
a completion unit, configured to complete the table tag sequence;
a merged-cell processing unit, configured to traverse the completed table tag sequence and, when the traversed tag is a merged cell, copy the content of the tag as the content of a new tag and insert the new tag into the table tag sequence;
a text creation unit, configured to sequentially read the contents of all the tags in the table tag sequence after the traversal is completed, and construct the markdown text.
In a third aspect, a system for converting an html form into markdown text includes a processor, an input device, an output device, and a memory, the processor, the input device, the output device, and the memory being interconnected, wherein the memory is configured to store a computer program, the computer program including program instructions, the processor being configured to invoke the program instructions to perform the method of the first aspect.
In a fourth aspect, a computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect.
According to the above technical scheme, the method, the system and the medium for converting an html table into markdown text address the problems of merged cells and missing tags in tables in html pages: they convert the table in an html page into relatively complete markdown text that has a symmetrical data format and is unlikely to lose data content, improve the recognition accuracy for tables in html pages, and avoid the loss of data content and data structure found in existing methods for identifying tables in html pages. In addition, markdown text is easier for a large language model to understand, which helps the model answer the user's questions accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. Like elements or portions are generally identified by like reference numerals throughout the several figures. In the drawings, elements or portions thereof are not necessarily drawn to scale.
Fig. 1 is a flowchart of a method for converting an html form into a markdown text according to an embodiment.
Fig. 2 is a block diagram of a system for converting an html form into markdown text according to an embodiment.
Detailed Description
Embodiments of the technical scheme of the present application will be described in detail below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present application, and thus are merely examples, and are not intended to limit the scope of the present application. It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
As used in this specification and the appended claims, the term "if" may be interpreted as "when" or "once" or "in response to a determination" or "in response to detection", depending on the context. Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted, depending on the context, as "upon determination" or "in response to determination" or "upon detection of a [described condition or event]" or "in response to detection of a [described condition or event]".
Examples:
A method for converting an html table into markdown text, see fig. 1, comprising:
obtaining the table-type tags in an html page to obtain a table tag sequence;
completing the table tag sequence;
traversing the completed table tag sequence;
when the traversed tag is a merged cell, copying the content of the tag as the content of a new tag, and inserting the new tag into the table tag sequence;
and after the traversal is completed, sequentially reading the contents of all the tags in the table tag sequence, and constructing the markdown text.
In this embodiment, every element in an html page is marked up with tags, and the tags come in pairs: a pair consists of a start tag and an end tag. The method therefore first extracts all the tags in the html page that relate to tables, i.e. the tags whose attribute is the table type, and creates a table tag sequence from the extracted tags. Since tags may be missing from the table tag sequence, the method then completes the sequence, restoring all the table-related tags as far as possible.
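The extraction step can be sketched as follows. This is a minimal illustration, not the patented implementation: it uses Python's standard `html.parser.HTMLParser`, and the set `TABLE_TAGS`, the class `TableTagCollector`, the function `extract_table_tag_sequence` and the tuple layout of the sequence are all assumptions made for this example (the source does not list which tags count as table-type tags).

```python
from html.parser import HTMLParser

# Assumption: these are the "table-type" tags; the source does not enumerate them.
TABLE_TAGS = {"table", "thead", "tbody", "tr", "th", "td"}


class TableTagCollector(HTMLParser):
    """Collects table-related tags and cell text, in document order."""

    def __init__(self):
        super().__init__()
        # Items: ("start", tag, attrs), ("end", tag) or ("text", data).
        self.sequence = []
        self._depth = 0  # greater than 0 while inside a <table>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
        if self._depth and tag in TABLE_TAGS:
            self.sequence.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        if self._depth and tag in TABLE_TAGS:
            self.sequence.append(("end", tag))
        if tag == "table" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that appears inside a table.
        if self._depth and data.strip():
            self.sequence.append(("text", data.strip()))


def extract_table_tag_sequence(html):
    """Returns the table tag sequence of an html string, missing tags included as-is."""
    collector = TableTagCollector()
    collector.feed(html)
    collector.close()
    return collector.sequence
```

The parser reports only the tags that are literally present, so a missing end tag stays missing in the returned sequence; repairing it is left to the completion step.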
In this embodiment, since a table in an html page may contain merged cells, the method restores the table to its state before the cells were merged and only then converts it into markdown text, so that all the data in the table is restored as far as possible. The restoration proceeds as follows: the completed table tag sequence is traversed; when the traversed tag is a merged cell, the content of the tag is copied as the content of a new tag and the new tag is inserted into the table tag sequence, which splits the merged cell. Finally, the contents of all the tags in the table tag sequence are read in order and the markdown text is constructed. For example, the completed table tag sequence is converted into a so object, the contents of the so object are read out in turn, the contents are separated from one another by the separator "|", and where a line break occurs a run of symbols such as "-------" is added, which yields the markdown text.
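The final construction step can be sketched as below, assuming the cell contents have already been read out into a list of rows. The function name `rows_to_markdown` is hypothetical, and placing the "---" separator row directly after the first row follows the common GitHub-flavored markdown table layout, which is an assumption of this sketch rather than something the source specifies.

```python
def rows_to_markdown(rows):
    """Builds markdown table text from rows of cell contents.

    Assumption: the first row is treated as the header row.
    """
    if not rows:
        return ""

    def render(cells):
        # Escape "|" inside a cell so it is not read as a column separator.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [render(rows[0])]
    # Separator row made of "---" markers, one per column.
    lines.append("| " + " | ".join("---" for _ in rows[0]) + " |")
    lines.extend(render(row) for row in rows[1:])
    return "\n".join(lines)
```

Because the merged cells have been split beforehand, every row has the same number of cells, which is what keeps the rendered table aligned.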
To address the problems of merged cells and missing tags in tables in html pages, the method converts the table in an html page into relatively complete markdown text that has a symmetrical data format and is unlikely to lose data content, improves the recognition accuracy for tables in html pages, and avoids the loss of data content and data structure found in existing methods for identifying tables in html pages. In addition, markdown text is easier for a large language model to understand, which helps the model answer the user's questions accurately.
Further, in some embodiments, completing the table tag sequence specifically includes:
sequentially extracting the tags in the table tag sequence, and completing each extracted tag.
In this embodiment, the method extracts the tags in the table tag sequence one after another, for example by recursively querying the tags in the table tag sequence, and completes each extracted tag.
Further, completing the extracted tag specifically includes:
when the stack is empty, pushing the extracted tag onto the stack;
when the stack is not empty and the extracted tag forms a pair with the stack-top tag, popping the stack-top tag;
when the stack is not empty and the extracted tag and the stack-top tag have a hierarchical relationship, pushing the extracted tag onto the stack;
when the stack is not empty and the extracted tag and the stack-top tag are at the same level, creating an end tag for the stack-top tag, inserting the end tag into the table tag sequence immediately before the extracted tag, popping the stack-top tag, and pushing the extracted tag onto the stack.
In this embodiment, when the method completes the tags, it stores them in a stack and uses the properties of the stack to check, layer by layer, whether any tag is missing. The cases are as follows:
1) If the stack is empty, the stack holds no tag that still has to be checked for a missing counterpart, so the extracted tag is pushed onto the stack.
2) If the stack is not empty and the extracted tag forms a pair with the stack-top tag, the stack-top tag is not missing anything, so the stack-top tag is popped.
3) If the stack is not empty and the extracted tag and the stack-top tag have a hierarchical relationship, for example the extracted tag is a lower-level tag of the stack-top tag, then the extracted tag has to be checked in addition to the stack-top tag. The extracted tag is therefore pushed and becomes the new stack-top tag; the new stack-top tag is checked first, and the original stack-top tag afterwards.
4) If the stack is not empty and the extracted tag and the stack-top tag are at the same level, i.e. a start tag of the same kind as the stack-top tag is received before the end tag of the stack-top tag has been found, the end tag of the stack-top tag is missing. An end tag for the stack-top tag is created and inserted into the table tag sequence immediately before the extracted tag, which completes the stack-top tag; the stack-top tag is then popped and the extracted tag is pushed, because the extracted tag now has to be checked in turn.
For example, assume that the table tag sequence is [ table tr th th td td table ]. The first extracted tag is table; the stack is empty, so table is pushed and the stack-top tag becomes table. The second extracted tag is tr; the stack-top tag is table, the stack is not empty and tr is a lower-level tag of table, so tr is pushed and the stack-top tag becomes tr. The third extracted tag is th; the stack-top tag is tr, the stack is not empty and th is a lower-level tag of tr, so th is pushed and the stack-top tag becomes th. The fourth extracted tag is th; the stack-top tag is th and the two form a pair, so the stack-top tag th is popped and the stack-top tag becomes tr. The fifth extracted tag is td; the stack-top tag is tr, the stack is not empty and td is a same-level tag of tr, so the stack-top tag is missing its end tag. An end tag tr is created and inserted into the table tag sequence immediately before the extracted tag td, so that the table tag sequence becomes [ table tr th th tr td td table ]; the stack-top tag tr is popped, the extracted tag td is pushed, and the stack-top tag becomes td. The next extracted tag is td; the stack-top tag is td and the two form a pair, so the stack-top tag td is popped and the stack-top tag becomes table. Finally, table is extracted; the stack-top tag is table and the two form a pair, so the stack-top tag table is popped, and the completion of the table is finished.
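The four stack cases can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the patented implementation: each tag is modelled as a hypothetical `(name, kind)` tuple with `kind` equal to `"start"` or `"end"`, and the `LEVEL` table, which follows ordinary html nesting (td and th below tr, tr below table), is an assumption used here to decide whether two tags are hierarchical or at the same level. Tag combinations outside the four cases are passed through unchanged.

```python
# Assumption: nesting depth used to decide "hierarchical" vs. "same level".
LEVEL = {"table": 0, "tr": 1, "th": 2, "td": 2}


def complete_tag_sequence(tags):
    """Inserts missing end tags into a sequence of (name, kind) tuples."""
    stack = []      # names of start tags that are still open
    completed = []  # output sequence with the created end tags inserted
    for name, kind in tags:
        if not stack:
            # Case 1: the stack is empty, push the extracted tag.
            stack.append(name)
        elif kind == "end" and name == stack[-1]:
            # Case 2: the extracted tag pairs with the stack-top tag, pop it.
            stack.pop()
        elif kind == "start" and LEVEL[name] > LEVEL[stack[-1]]:
            # Case 3: the extracted tag is a lower-level tag, push it.
            stack.append(name)
        elif kind == "start" and LEVEL[name] == LEVEL[stack[-1]]:
            # Case 4: same level, so the stack-top tag lacks its end tag.
            # Create it, insert it just before the extracted tag, pop, push.
            completed.append((stack.pop(), "end"))
            stack.append(name)
        # Any other combination is outside the four cases and is kept as-is.
        completed.append((name, kind))
    return completed
```

For a row such as `<tr><td>a<td>b</td></tr>`, the second td start tag arrives while td is still on top of the stack, so the sketch emits the missing td end tag before it.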
Further, in some embodiments, traversing the completed table tag sequence specifically includes:
traversing the completed table tag sequence row by row.
In this embodiment, since an html page typically records the tags of table elements row by row, the method traverses the completed table tag sequence row by row.
Further, in some embodiments, when the traversed tag is a merged cell, copying the content of the tag as the content of a new tag and inserting the new tag into the table tag sequence specifically includes:
when traversing the first row of the table, creating an empty list, wherein the list comprises N elements and N is the number of columns of the table;
performing the merged-cell processing step;
reading the list when traversing the j-th row of the table;
when the list is not empty, creating a new tag from the content of each non-empty element in the list, and inserting the new tag into the table tag sequence at the position that represents the j-th row and the i-th column of the table;
performing the merged-cell processing step.
The merged-cell processing step specifically includes:
when the merge type of the merged cell is row merging, copying the content of the traversed tag as the content of a new tag, and inserting the new tag into the table tag sequence as the tag pair immediately following the traversed tag;
when the merge type of the merged cell is column merging, obtaining the column number i of the traversed tag in the table, and copying the content of the traversed tag to the (i-1)-th element of the list.
In the present embodiment, it is assumed that the table has a first row consisting of the merged cell [ Title A ] followed by [ Title B ], a second row [ AAA ], [ CCC ], [ BBB ] in which [ CCC ] is a merged cell, and a third row [ AAA ], [ BBB ].
The table is traversed by row:
1) When traversing the first row, an empty list [ [ ], [ ], [ ] ] is initialized.
Traversing the first cell [ Title A ] of the first row: the cell is found to be a merged cell of the row-merging type with a span of 2. The content [ Title A ] of the cell is copied as the content of a new tag, and the new tag is inserted into the table tag sequence as the tag pair immediately following the traversed tag. The first row of the table then becomes:
Title A Title A Title B
Traversing the second cell [ Title A ] of the first row: the cell is not a merged cell, and no processing is required.
Traversing the third cell [ Title B ] of the first row: the cell is not a merged cell, and no processing is required.
2) When traversing the second row, the list read is [ [ ], [ ], [ ] ]; the list is empty and no processing is required.
Traversing the first cell [ AAA ] of the second row: the cell is not a merged cell, and no processing is required.
Traversing the second cell [ CCC ] of the second row: the cell is found to be a merged cell of the column-merging type with a span of 2. The column number of the cell is i = 2, so the content [ CCC ] of the cell is copied to the element with index i-1 = 1 of the list, and the list becomes [ [ ], [ CCC ], [ ] ].
Traversing the third cell [ BBB ] of the second row: the cell is not a merged cell, and no processing is required.
3) When traversing the third row, the list read is [ [ ], [ CCC ], [ ] ]; the list is not empty, so the content [ CCC ] of the element with index 1 is copied as the content of a new tag, and the new tag is inserted into the table tag sequence at the position that represents the 3rd row and the 2nd column of the table. The table then becomes:
Title A Title A Title B
AAA CCC BBB
AAA CCC BBB
Traversing the first cell [ AAA ] of the third row: the cell is not a merged cell, and no processing is required.
Traversing the second cell [ CCC ] of the third row: the cell is not a merged cell, and no processing is required.
Traversing the third cell [ BBB ] of the third row: the cell is not a merged cell, and no processing is required.
Thus, the table in the html page can be restored to the table before the cells were merged.
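The walk-through above can be sketched in code as follows. This is an illustration under assumptions, not the patented implementation. Based on the worked example, "row merging" is read here as a cell that spans several columns within its row (html `colspan`) and "column merging" as a cell that spans several rows (html `rowspan`); that mapping is an interpretation. Each cell is modelled as a hypothetical `(text, colspan, rowspan)` tuple, and the per-column list additionally stores how many rows a carried value still has to fill, which is a variation on the content-only list described above.

```python
def expand_merged_cells(rows, n_cols):
    """Splits merged cells so that every row holds exactly n_cols values.

    rows: list of rows, each a list of (text, colspan, rowspan) tuples.
    """
    # Per-column carry list: None, or (text, rows still to be filled).
    carry = [None] * n_cols
    expanded = []
    for row in rows:
        cells = iter(row)
        out = []
        while len(out) < n_cols:
            col = len(out)
            if carry[col] is not None:
                # A cell from an earlier row covers this column: copy it in.
                text, remaining = carry[col]
                out.append(text)
                carry[col] = (text, remaining - 1) if remaining > 1 else None
                continue
            cell = next(cells, None)
            if cell is None:
                out.append("")  # the row is short: pad with an empty cell
                continue
            text, colspan, rowspan = cell
            for _ in range(colspan):
                if len(out) == n_cols:
                    break
                if rowspan > 1:
                    # Remember the content for the following row(s).
                    carry[len(out)] = (text, rowspan - 1)
                out.append(text)  # copies for a cell spanning several columns
        expanded.append(out)
    return expanded
```

Applied to the example table, `[("Title A", 2, 1), ("Title B", 1, 1)]` expands to two copies of Title A followed by Title B, and the CCC cell of the second row is carried into column 2 of the third row.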
A system for converting an html table into markdown text, see fig. 2, comprising:
an acquisition unit, configured to obtain the table-type tags in an html page to obtain a table tag sequence;
a completion unit, configured to complete the table tag sequence;
a merged-cell processing unit, configured to traverse the completed table tag sequence and, when the traversed tag is a merged cell, copy the content of the tag as the content of a new tag and insert the new tag into the table tag sequence;
a text creation unit, configured to sequentially read the contents of all the tags in the table tag sequence after the traversal is completed, and construct the markdown text.
In the several embodiments provided by the present application, it should be understood that the disclosed system may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present invention.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
For brevity, where the description of this system embodiment does not mention a point, reference may be made to the corresponding content in the foregoing embodiments.
A system for converting an html form into markdown text, comprising a processor, an input device, an output device and a memory, the processor, the input device, the output device and the memory being interconnected, wherein the memory is for storing a computer program, the computer program comprising program instructions, the processor being configured for invoking the program instructions for performing the above-mentioned method.
It should be appreciated that in embodiments of the present invention the processor may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The input devices may include a touch pad, a fingerprint sensor (for collecting fingerprint information of a user and direction information of a fingerprint), a microphone, etc., and the output devices may include a display (LCD, etc.), a speaker, etc.
The memory may include read only memory and random access memory and provide instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
For brevity, where the description of this system embodiment does not mention a point, reference may be made to the corresponding content in the foregoing embodiments.
A computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method described above.
The computer readable storage medium may be an internal storage unit of the terminal according to any of the foregoing embodiments, for example, a hard disk or a memory of the terminal. The computer readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, or the like, provided on the terminal. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of the terminal. The computer readable storage medium is used to store the computer program and other programs and data required by the terminal. The computer readable storage medium may also be used to temporarily store data that has been output or is to be output.
For brevity, where the description of this medium embodiment does not mention a point, reference may be made to the corresponding content in the foregoing embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, and are intended to be included within the scope of the appended claims and description.

Claims (10)

1. A method for converting an html table into markdown text, comprising:
obtaining the table-type tags in an html page to obtain a table tag sequence;
completing the table tag sequence;
traversing the completed table tag sequence;
when the traversed tag is a merged cell, copying the content of the tag as the content of a new tag, and inserting the new tag into the table tag sequence;
and after the traversal is completed, sequentially reading the contents of all the tags in the table tag sequence, and constructing the markdown text.
2. The method for converting an html table into markdown text according to claim 1, wherein completing the table tag sequence specifically comprises:
sequentially extracting the tags in the table tag sequence, and completing each extracted tag.
3. The method for converting an html table into markdown text according to claim 2, wherein completing the extracted tag specifically comprises:
when the stack is empty, pushing the extracted tag onto the stack;
when the stack is not empty and the extracted tag forms a pair with the stack-top tag, popping the stack-top tag;
when the stack is not empty and the extracted tag and the stack-top tag have a hierarchical relationship, pushing the extracted tag onto the stack;
when the stack is not empty and the extracted tag and the stack-top tag are at the same level, creating an end tag for the stack-top tag, inserting the end tag into the table tag sequence immediately before the extracted tag, popping the stack-top tag, and pushing the extracted tag onto the stack.
4. The method for converting html form into markdown text according to claim 1, wherein traversing the completed form tag sequence specifically comprises:
Traversing the completed table label sequence according to the rows.
5. The method for converting html form into markdown text according to claim 4, wherein when the traversed tag is a merged cell, copying the content of the tag as the content of a new tag, and inserting the new tag into the table tag sequence specifically comprises:
when traversing the first row of the table, creating an empty list, wherein the list comprises N elements, and N is the number of columns of the table;
performing the merged cell processing step.
6. The method for converting html form into markdown text according to claim 5, further comprising, after traversing the first row of the table:
reading the list when traversing the j-th row of the table;
when the list is not empty, creating a new tag according to the content of each non-empty element in the list, and inserting the new tag into the table tag sequence at the position representing the j-th row and the i-th column of the table;
performing the merged cell processing step.
7. The method for converting html form into markdown text according to claim 5 or 6, wherein the merged cell processing step specifically comprises:
when the type of the merged cell is row merging, copying the content of the traversed tag as the content of a new tag, and inserting the new tag into the table tag sequence as the tag pair immediately following the traversed tag;
and when the type of the merged cell is column merging, acquiring the column number i of the traversed tag in the table, and copying the content of the traversed tag to the (i-1)-th element of the list.
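The list-based handling of merged cells in claims 5 to 7 can be sketched in Python under several loudly stated assumptions. The sketch reads "row merging" as a cell spanning several columns within one row (HTML `colspan`) and "column merging" as a cell spanning several rows within one column (HTML `rowspan`); the claims do not name these attributes. It also works on a simplified model, where each cell is a `(content, colspan, rowspan)` tuple, rather than on a tag sequence. The claims do not say when a list element is cleared, so the sketch stores a remaining-row counter next to the content; that counter, the padding of short rows, and the name `expand_merged_cells` are hypothetical.

```python
def expand_merged_cells(rows, n_cols):
    """Expand merged cells so every row has exactly n_cols entries.

    rows:   list of rows, each a list of (content, colspan, rowspan) tuples
    n_cols: N, the number of columns of the table
    """
    # The N-element list of claim 5; index i-1 belongs to column i.
    pending = [None] * n_cols
    grid = []
    for row in rows:
        out = []
        cells = iter(row)
        while len(out) < n_cols:
            col = len(out)
            if pending[col] is not None:
                # Claim 6: a non-empty element yields a new cell in this row.
                content, remaining = pending[col]
                out.append(content)
                pending[col] = (content, remaining - 1) if remaining > 1 else None
                continue
            cell = next(cells, None)
            if cell is None:
                out.append("")  # assumption: pad rows that have too few cells
                continue
            content, colspan, rowspan = cell
            for _ in range(colspan):
                if len(out) >= n_cols:
                    break
                if rowspan > 1:
                    # Claim 7, column merging: remember the content for later rows.
                    pending[len(out)] = (content, rowspan - 1)
                # Claim 7, row merging: repeat the content in the following column.
                out.append(content)
        grid.append(out)
    return grid


if __name__ == "__main__":
    table = [
        [("A", 2, 1), ("B", 1, 2)],  # A spans two columns, B spans two rows
        [("c", 1, 1), ("d", 1, 1)],
    ]
    for line in expand_merged_cells(table, 3):
        print(line)
```

In the example, `A` is repeated in the second column of the first row, and `B` is carried through the list into the third column of the second row, so both rows end up with three cells.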
8. A system for converting html form into markdown text, comprising:
an acquisition unit, configured to obtain the table type tags in an html page to obtain a table tag sequence;
a completion unit, configured to complete the table tag sequence;
a cell merging processing unit, configured to traverse the completed table tag sequence and, when the traversed tag is a merged cell, copy the content of the tag as the content of a new tag and insert the new tag into the table tag sequence;
a text creation unit, configured to sequentially read the contents of all the tags in the table tag sequence after the traversal is completed, and construct a markdown text.
9. A system for converting html form into markdown text, comprising a processor, an input device, an output device and a memory, the processor, the input device, the output device and the memory being interconnected, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the method of any one of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1-7.
CN202410063513.0A 2024-01-16 2024-01-16 Method, system and medium for converting html form into markdown text Pending CN117973324A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410063513.0A CN117973324A (en) 2024-01-16 2024-01-16 Method, system and medium for converting html form into markdown text

Publications (1)

Publication Number Publication Date
CN117973324A (en) 2024-05-03

Family

ID=90862155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410063513.0A Pending CN117973324A (en) 2024-01-16 2024-01-16 Method, system and medium for converting html form into markdown text

Country Status (1)

Country Link
CN (1) CN117973324A (en)

Similar Documents

Publication Publication Date Title
US8250469B2 (en) Document layout extraction
CN110083805A (en) A kind of method and system that Word file is converted to EPUB file
US20230161802A1 (en) Method and device for constructing standard knowledge graph, and method and device for querying standard
CN110287784B (en) Annual report text structure identification method
CN108664471B (en) Character recognition error correction method, device, equipment and computer readable storage medium
CN112395418B (en) Method and device for extracting target object in webpage and electronic equipment
CN114036909A (en) PDF document page-crossing table merging method and device and related equipment
CN110532449B (en) Method, device, equipment and storage medium for processing service document
CN114359533B (en) Page number identification method based on page text and computer equipment
CN112784529A (en) Mobile terminal sorting table based on BetterScroll and construction method thereof
CN114691907B (en) Cross-modal retrieval method, device and medium
CN117973324A (en) Method, system and medium for converting html form into markdown text
CN116384344A (en) Document conversion method, device and storage medium
CN111241096A (en) Text extraction method, system, terminal and storage medium for EXCEL document
CN112818687B (en) Method, device, electronic equipment and storage medium for constructing title recognition model
CN113609825B (en) Intelligent customer attribute tag identification method and device
CN113779218B (en) Question-answer pair construction method, question-answer pair construction device, computer equipment and storage medium
CN114218373A (en) High-capacity text content retrieval method and system
CN114579796A (en) Machine reading understanding method and device
CN114581934A (en) Test paper image processing method, device and equipment
CN111708891B (en) Food material entity linking method and device between multi-source food material data
CN112559739A (en) Method for processing insulation state data of power equipment
CN117173725B (en) Table information processing method, apparatus, computer device and storage medium
CN116416629B (en) Electronic file generation method, device, equipment and medium
CN115712925A (en) Webpage tampering detection method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination