CN117973324A - Method, system and medium for converting html form into markdown text - Google Patents
- Publication number
- CN117973324A (application number CN202410063513.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Document Processing Apparatus (AREA)
Abstract
The invention provides a method, a system and a medium for converting an HTML table into markdown text. The method comprises: obtaining the table-type tags in an HTML page to obtain a table tag sequence; completing the table tag sequence; traversing the completed table tag sequence; when a traversed tag is a merged cell, copying the content of that tag as the content of a new tag and inserting the new tag into the table tag sequence; and, after the traversal is complete, reading the contents of all tags in the table tag sequence in order and constructing the markdown text. To address merged cells and missing tags in tables in HTML pages, the method converts such a table into relatively complete markdown text that has a symmetrical data format and is unlikely to lose data content. This improves the accuracy with which tables in HTML pages are recognized and avoids the loss of data content and data structure that affects existing methods for recognizing such tables.
Description
Technical Field
The invention belongs to the field of computer technology, and in particular relates to a method, a system and a medium for converting an HTML table into markdown text.
Background
Existing large language models can only recognize character text. In scenarios where a large language model must recognize a table in an HTML page, such as data source analysis for an AI bot, the data content and data structure of the table therefore need to be acquired accurately. However, existing methods that recognize tables in HTML pages directly suffer from insufficient recognition accuracy, loss of data content, and loss of data structure.
Disclosure of Invention
To address the defects of the prior art, the invention provides a method, a system and a medium for converting an HTML table into markdown text, which achieve high recognition accuracy and avoid the loss of data content and data structure.
In a first aspect, a method for converting an HTML table into markdown text includes:
obtaining the table-type tags in an HTML page to obtain a table tag sequence;
completing the table tag sequence;
traversing the completed table tag sequence;
when a traversed tag is a merged cell, copying the content of the tag as the content of a new tag, and inserting the new tag into the table tag sequence;
after the traversal is complete, reading the contents of all tags in the table tag sequence in order, and constructing the markdown text.
Further, completing the table tag sequence specifically includes:
extracting the tags in the table tag sequence in order, and completing each extracted tag.
Further, completing the extracted tag specifically includes:
when the stack is empty, pushing the extracted tag onto the stack;
when the stack is not empty and the extracted tag forms a pair with the top-of-stack tag, popping the top-of-stack tag;
when the stack is not empty and the extracted tag has a hierarchical relationship with the top-of-stack tag, pushing the extracted tag onto the stack;
when the stack is not empty and the extracted tag is at the same level as the top-of-stack tag, creating an end tag for the top-of-stack tag, inserting the end tag into the table tag sequence immediately before the extracted tag, popping the top-of-stack tag, and pushing the extracted tag onto the stack.
Further, traversing the completed table tag sequence specifically includes:
traversing the completed table tag sequence row by row.
Further, when a traversed tag is a merged cell, copying the content of the tag as the content of a new tag and inserting the new tag into the table tag sequence specifically includes:
when traversing the first row of the table, creating an empty list, wherein the list comprises N elements and N is the number of columns of the table;
performing the merged-cell processing step.
Further, after traversing the first row of the table, the method further includes:
reading the list when traversing the j-th row of the table;
when the list is not empty, creating a new tag from the content of each non-empty element of the list, and inserting the new tag into the table tag sequence at the position that represents row j, column i of the table;
performing the merged-cell processing step.
Further, the merged-cell processing step specifically includes:
when the merge type of the merged cell is row merging, copying the content of the traversed tag as the content of a new tag, and inserting the new tag into the table tag sequence as the tag pair immediately following the traversed tag;
when the merge type of the merged cell is column merging, obtaining the column number i of the traversed tag in the table, and copying the content of the traversed tag to element i-1 of the list.
In a second aspect, a system for converting an HTML table into markdown text includes:
an acquisition unit, configured to obtain the table-type tags in an HTML page to obtain a table tag sequence;
a completion unit, configured to complete the table tag sequence;
a merged-cell processing unit, configured to traverse the completed table tag sequence and, when a traversed tag is a merged cell, copy the content of the tag as the content of a new tag and insert the new tag into the table tag sequence;
a text creation unit, configured to read the contents of all tags in the table tag sequence in order after the traversal is complete, and construct the markdown text.
In a third aspect, a system for converting an HTML table into markdown text includes a processor, an input device, an output device, and a memory that are interconnected, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the method of the first aspect.
In a fourth aspect, a computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect.
According to the above technical scheme, the method, system and medium for converting an HTML table into markdown text address merged cells and missing tags in tables in HTML pages. They convert such a table into relatively complete markdown text that has a symmetrical data format and is unlikely to lose data content, which improves the accuracy with which tables in HTML pages are recognized and avoids the loss of data content and data structure that affects existing recognition methods. In addition, markdown text is easier for a large language model to understand, which helps the model answer user questions accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. Like elements or portions are generally identified by like reference numerals throughout the several figures. In the drawings, elements or portions thereof are not necessarily drawn to scale.
Fig. 1 is a flowchart of a method for converting an HTML table into markdown text according to an embodiment.
Fig. 2 is a block diagram of a system for converting an HTML table into markdown text according to an embodiment.
Detailed Description
Embodiments of the technical scheme of the present application will be described in detail below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present application, and thus are merely examples, and are not intended to limit the scope of the present application. It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
As used in this specification and the appended claims, the term "if" may be interpreted as "when", "once", "in response to a determination", or "in response to detection", depending on the context. Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted, depending on the context, as "upon determination", "in response to determination", "upon detection of the [described condition or event]", or "in response to detection of the [described condition or event]".
Examples:
A method for converting an HTML table into markdown text, see Fig. 1, comprising:
obtaining the table-type tags in an HTML page to obtain a table tag sequence;
completing the table tag sequence;
traversing the completed table tag sequence;
when a traversed tag is a merged cell, copying the content of the tag as the content of a new tag, and inserting the new tag into the table tag sequence;
after the traversal is complete, reading the contents of all tags in the table tag sequence in order, and constructing the markdown text.
In this embodiment, every element in an HTML page is marked up with tags, and tags come in pairs: a start tag and an end tag. The method therefore first extracts all table-related tags from the HTML page, that is, the tags whose type is a table type, and creates the table tag sequence from the extracted tags. Because tags may be missing from the table tag sequence, the method completes the sequence so as to restore as many of the table-related tags as possible.
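The extraction step can be sketched in Python as follows. This is a minimal illustration, not the implementation described in the source: the set `TABLE_TAGS`, the class `TableTagCollector`, and the tuple layout `(kind, name, payload)` are all assumptions made for the example, and the standard-library `html.parser.HTMLParser` is used only as one convenient tokenizer.

```python
from html.parser import HTMLParser

# Assumed set of "table type" tags; the source does not enumerate them.
TABLE_TAGS = {"table", "thead", "tbody", "tfoot", "tr", "th", "td"}


class TableTagCollector(HTMLParser):
    """Collects table-related start tags, end tags and cell text in document order."""

    def __init__(self):
        super().__init__()
        self.sequence = []     # list of (kind, name, payload) tuples
        self._in_cell = False  # True while the most recent table tag is an open td/th

    def handle_starttag(self, tag, attrs):
        if tag in TABLE_TAGS:
            self.sequence.append(("start", tag, dict(attrs)))
            self._in_cell = tag in ("td", "th")

    def handle_endtag(self, tag):
        if tag in TABLE_TAGS:
            self.sequence.append(("end", tag, None))
            self._in_cell = False

    def handle_data(self, data):
        text = data.strip()
        if text and self._in_cell:
            self.sequence.append(("text", text, None))


def extract_table_tags(html):
    """Return the table tag sequence of an HTML string (no completion is done here)."""
    parser = TableTagCollector()
    parser.feed(html)
    parser.close()
    return parser.sequence


if __name__ == "__main__":
    # The input deliberately lacks </td> and </tr>, as a later completion step would expect.
    for item in extract_table_tags("<p>intro</p><table><tr><th>A</th><td>B</table>"):
        print(item)
```

The collector keeps attributes such as `rowspan` and `colspan` in the payload so that a later step can recognize merged cells; text outside table cells is ignored.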
In this embodiment, because a table in an HTML page may contain merged cells, the method restores the table to its state before the cells were merged and then converts it into markdown text, so that as much of the table's data as possible is preserved. The restoration proceeds as follows: the completed table tag sequence is traversed; when a traversed tag is a merged cell, the content of the tag is copied as the content of a new tag and the new tag is inserted into the table tag sequence, which splits the merged cell. Finally, the contents of all tags in the table tag sequence are read in order and the markdown text is constructed. For example, the completed table tag sequence is converted into a so object, the contents of the so object are read out in turn, adjacent contents are separated by the delimiter "|", and where a line break occurs a row of symbols such as "---" is added, which yields the markdown text.
To address merged cells and missing tags in tables in HTML pages, the method converts such a table into relatively complete markdown text that has a symmetrical data format and is unlikely to lose data content. This improves the accuracy with which tables in HTML pages are recognized and avoids the loss of data content and data structure that affects existing recognition methods. In addition, markdown text is easier for a large language model to understand, which helps the model answer user questions accurately.
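The final text-construction step might look like the sketch below. It assumes the merged cells have already been split so that every row has the same number of cells; the function name `rows_to_markdown` and the choice to place the "---" separator row only after the first row are assumptions of this example, not details given in the source.

```python
def rows_to_markdown(rows):
    """Render a rectangular list of rows as a pipe-delimited markdown table."""
    if not rows:
        return ""

    def render(cells):
        # Escape literal pipes so cell text cannot break the column layout.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    lines = [render(rows[0])]
    # Separator row: one "---" per column, placed under the first row.
    lines.append("| " + " | ".join("---" for _ in rows[0]) + " |")
    lines.extend(render(row) for row in rows[1:])
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_markdown([
        ["Title A", "Title A", "Title B"],
        ["AAA", "CCC", "BBB"],
        ["AAA", "CCC", "BBB"],
    ]))
```

Treating the first row as the header is a common markdown convention; a table without header cells would still render, with its first data row in the header position.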
Further, in some embodiments, completing the table tag sequence specifically includes:
extracting the tags in the table tag sequence in order, and completing each extracted tag.
In this embodiment, the method extracts the tags in the table tag sequence in order, for example by querying the tags in the table tag sequence recursively, and completes each extracted tag.
Further, completing the extracted tag specifically includes:
when the stack is empty, pushing the extracted tag onto the stack;
when the stack is not empty and the extracted tag forms a pair with the top-of-stack tag, popping the top-of-stack tag;
when the stack is not empty and the extracted tag has a hierarchical relationship with the top-of-stack tag, pushing the extracted tag onto the stack;
when the stack is not empty and the extracted tag is at the same level as the top-of-stack tag, creating an end tag for the top-of-stack tag, inserting the end tag into the table tag sequence immediately before the extracted tag, popping the top-of-stack tag, and pushing the extracted tag onto the stack.
In this embodiment, the method stores tags in a stack while completing them and uses the properties of the stack to check, level by level, whether a tag is missing. The cases are as follows. 1) If the stack is empty, it holds no tag that still needs to be checked, so the extracted tag is pushed onto the stack. 2) If the stack is not empty and the extracted tag forms a pair with the top-of-stack tag, nothing is missing for the top-of-stack tag, so the top-of-stack tag is popped. 3) If the stack is not empty and the extracted tag has a hierarchical relationship with the top-of-stack tag, for example the extracted tag is a lower-level tag of the top-of-stack tag, then the extracted tag must also be checked in addition to the top-of-stack tag. The extracted tag is therefore pushed and becomes the new top-of-stack tag; it is checked first, and the original top-of-stack tag is checked afterwards. 4) If the stack is not empty and the extracted tag is at the same level as the top-of-stack tag, that is, a start tag of the same kind as the top-of-stack tag arrives before the end tag of the top-of-stack tag has been found, then the end tag of the top-of-stack tag is missing. The end tag of the top-of-stack tag is created and inserted into the table tag sequence immediately before the extracted tag, which completes the top-of-stack tag. The top-of-stack tag is then popped and the extracted tag is pushed, so that the extracted tag is checked next.
For example, assume that the table tag sequence is [<table>, <tr>, <th>, </th>, <td>, </td>, </table>]. The first extracted tag is <table>; the stack is empty, so <table> is pushed and becomes the top-of-stack tag. The second extracted tag is <tr>; the stack is not empty, the top-of-stack tag is <table>, and <tr> is a lower-level tag of <table>, so <tr> is pushed and becomes the top-of-stack tag. The third extracted tag is <th>; the top-of-stack tag is <tr> and <th> is a lower-level tag of <tr>, so <th> is pushed and becomes the top-of-stack tag. The fourth extracted tag is </th>; the top-of-stack tag is <th> and the two form a pair, so <th> is popped and the top-of-stack tag becomes <tr>. The fifth extracted tag is <td>; the stack is not empty, the top-of-stack tag is <tr>, and <td> is treated as a same-level tag of <tr>, which indicates that the end tag of the top-of-stack tag is missing. An end tag </tr> is created and inserted into the table tag sequence immediately before the extracted <td>, so the sequence becomes [<table>, <tr>, <th>, </th>, </tr>, <td>, </td>, </table>]; the top-of-stack tag <tr> is popped and the extracted <td> is pushed, becoming the top-of-stack tag. The next extracted tag is </td>; the top-of-stack tag is <td> and the two form a pair, so <td> is popped and the top-of-stack tag becomes <table>. Finally, </table> is extracted; the top-of-stack tag is <table> and the two form a pair, so <table> is popped, and the completion of the table is finished.
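The four stack cases can be sketched as below. This is a hedged illustration only: tags are modelled as plain strings such as "<tr>" and "</tr>", and the nesting depths in `LEVEL` are an assumption of this example, since the source speaks of "hierarchical" and "same-level" relationships without listing them. Under this assumed table, td counts as nested under tr rather than as its peer, so the sketch does not reproduce every step of the walk-through above. Combinations outside the four cases are passed through unchanged.

```python
# Assumed nesting depth per tag name (hypothetical; not given in the source).
LEVEL = {"table": 0, "tr": 1, "th": 2, "td": 2}


def name_of(tag):
    """'<tr>' -> 'tr', '</tr>' -> 'tr'."""
    return tag.strip("</>")


def is_end(tag):
    return tag.startswith("</")


def complete_tags(tags):
    """Return a copy of `tags` with missing end tags inserted per the four stack cases."""
    stack, out = [], []
    for tag in tags:
        top = stack[-1] if stack else None
        if top is None:
            if not is_end(tag):
                stack.append(tag)                    # case 1: empty stack, push
        elif is_end(tag) and name_of(tag) == name_of(top):
            stack.pop()                              # case 2: pair with top, pop
        elif not is_end(tag) and LEVEL[name_of(tag)] > LEVEL[name_of(top)]:
            stack.append(tag)                        # case 3: lower-level tag, push
        elif not is_end(tag) and LEVEL[name_of(tag)] == LEVEL[name_of(top)]:
            out.append("</" + name_of(top) + ">")    # case 4: create the missing end tag
            stack.pop()
            stack.append(tag)
        # Anything else is outside the four cases and is emitted unchanged.
        out.append(tag)
    return out


if __name__ == "__main__":
    # The first row is missing its </tr>; the second <tr> triggers case 4.
    broken = ["<table>", "<tr>", "<td>", "</td>",
              "<tr>", "<td>", "</td>", "</tr>", "</table>"]
    print(complete_tags(broken))
```

Because the created end tag is appended to the output just before the extracted tag, it lands in the position the text describes, immediately preceding the tag that revealed the omission.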
Further, in some embodiments, traversing the completed table tag sequence specifically includes:
traversing the completed table tag sequence row by row.
In this embodiment, because an HTML page usually records the tags of table elements row by row, the method traverses the completed table tag sequence row by row.
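Row-by-row traversal can be illustrated with the following sketch, which groups a completed token sequence into rows of cell text. The token format (tag strings interleaved with text strings) and the function name `split_rows` are assumptions made for this example; the source does not prescribe a data representation.

```python
def split_rows(tokens):
    """Group a completed token sequence into rows, each a list of cell texts."""
    rows = []
    current = None   # cells of the row being read, or None when outside a row
    in_cell = False  # True between <td>/<th> and the matching end tag
    for tok in tokens:
        if tok == "<tr>":
            current = []
        elif tok == "</tr>":
            if current is not None:
                rows.append(current)
            current = None
        elif tok in ("<td>", "<th>"):
            in_cell = True
            if current is not None:
                current.append("")  # start a new, initially empty cell
        elif tok in ("</td>", "</th>"):
            in_cell = False
        elif in_cell and current:
            current[-1] += tok      # text belongs to the most recent cell
    return rows


if __name__ == "__main__":
    tokens = ["<table>",
              "<tr>", "<th>", "A", "</th>", "<th>", "B", "</th>", "</tr>",
              "<tr>", "<td>", "1", "</td>", "<td>", "2", "</td>", "</tr>",
              "</table>"]
    print(split_rows(tokens))
```

The sketch relies on the sequence having been completed beforehand: a row is only emitted when its end tag is seen.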
Further, in some embodiments, when a traversed tag is a merged cell, copying the content of the tag as the content of a new tag and inserting the new tag into the table tag sequence specifically includes:
when traversing the first row of the table, creating an empty list, wherein the list comprises N elements and N is the number of columns of the table;
performing the merged-cell processing step;
reading the list when traversing the j-th row of the table;
when the list is not empty, creating a new tag from the content of each non-empty element of the list, and inserting the new tag into the table tag sequence at the position that represents row j, column i of the table;
performing the merged-cell processing step.
The merged-cell processing step specifically includes:
when the merge type of the merged cell is row merging, copying the content of the traversed tag as the content of a new tag, and inserting the new tag into the table tag sequence as the tag pair immediately following the traversed tag;
when the merge type of the merged cell is column merging, obtaining the column number i of the traversed tag in the table, and copying the content of the traversed tag to element i-1 of the list.
In the present embodiment, it is assumed that the table is:
The table is traversed row by row:
1) When traversing the first row, an empty list [[], [], []] is initialized.
The first cell of the first row, [Title A], is traversed. It is found to be a merged cell of the row-merging type that merges 2 cells, so its content [Title A] is copied as the content of a new tag, and the new tag is inserted into the table tag sequence as the tag pair immediately following the traversed tag. The table then becomes:
The second cell of the first row, [Title A], is traversed; it is not a merged cell, so no processing is required.
The third cell of the first row, [Title B], is traversed; it is not a merged cell, so no processing is required.
2) When traversing the second row, the list is read as [[], [], []]; the list is empty, so no processing is required.
The first cell of the second row, [AAA], is traversed; it is not a merged cell, so no processing is required.
The second cell of the second row, [CCC], is traversed. It is found to be a merged cell of the column-merging type that merges 2 cells, and its column number is 2, so its content [CCC] is copied to element 1 of the list, which becomes [[], [CCC], []].
The third cell of the second row, [BBB], is traversed; it is not a merged cell, so no processing is required.
3) When traversing the third row, the list is read as [[], [CCC], []]; the list is not empty, so the content [CCC] of element 1 of the list is copied as the content of a new tag, and the new tag is inserted into the table tag sequence at the position that represents row 3, column 2 of the table. The table becomes:
| Title A | Title A | Title B |
| AAA | CCC | BBB |
| AAA | CCC | BBB |
The first cell of the third row, [AAA], is traversed; it is not a merged cell, so no processing is required.
The second cell of the third row, [CCC], is traversed; it is not a merged cell, so no processing is required.
The third cell of the third row, [BBB], is traversed; it is not a merged cell, so no processing is required.
In this way, the table in the HTML page is restored to the table as it was before the cells were merged.
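The traversal above can be sketched in Python as follows. The representation is hypothetical: each cell is a `(text, colspan, rowspan)` tuple, where this example reads "row merging" as a cell spanning several columns of one row (HTML `colspan`) and "column merging" as a cell spanning several rows of one column (HTML `rowspan`); the source does not name these attributes. The `carry` list plays the role of the N-element list, indexed from 0 so that column number i maps to element i-1.

```python
def expand_merged_cells(rows):
    """Expand rows of (text, colspan, rowspan) cells into a rectangular grid of text."""
    if not rows:
        return []
    n_cols = sum(colspan for _, colspan, _ in rows[0])
    # carry[c] holds (text, rows_remaining) for a column-merged cell pending from above.
    carry = [None] * n_cols
    grid = []
    for row in rows:
        cells = iter(row)
        out = []
        while len(out) < n_cols:
            col = len(out)
            if carry[col] is not None:
                # Re-create the cell that an earlier row merged downward.
                text, remaining = carry[col]
                out.append(text)
                carry[col] = (text, remaining - 1) if remaining > 1 else None
                continue
            cell = next(cells, None)
            if cell is None:
                out.append("")  # pad rows that have too few cells
                continue
            text, colspan, rowspan = cell
            for _ in range(colspan):
                if len(out) < n_cols:
                    if rowspan > 1:
                        carry[len(out)] = (text, rowspan - 1)
                    out.append(text)  # row merging: copy into the following cell(s)
        grid.append(out)
    return grid


if __name__ == "__main__":
    table = [
        [("Title A", 2, 1), ("Title B", 1, 1)],
        [("AAA", 1, 1), ("CCC", 1, 2), ("BBB", 1, 1)],
        [("AAA", 1, 1), ("BBB", 1, 1)],
    ]
    for line in expand_merged_cells(table):
        print(line)
```

With the sample input, which mirrors the worked example under the stated reading of the merge types, each of the three output rows has three cells and "CCC" appears in column 2 of both the second and third rows.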
A system for converting an HTML table into markdown text, see Fig. 2, comprising:
an acquisition unit, configured to obtain the table-type tags in an HTML page to obtain a table tag sequence;
a completion unit, configured to complete the table tag sequence;
a merged-cell processing unit, configured to traverse the completed table tag sequence and, when a traversed tag is a merged cell, copy the content of the tag as the content of a new tag and insert the new tag into the table tag sequence;
a text creation unit, configured to read the contents of all tags in the table tag sequence in order after the traversal is complete, and construct the markdown text.
In the several embodiments provided by the present application, it should be understood that the disclosed system may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections via interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of the present invention.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
For brevity, where this system embodiment omits a detail, reference may be made to the corresponding content of the foregoing embodiments.
A system for converting an HTML table into markdown text, comprising a processor, an input device, an output device and a memory that are interconnected, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the above method.
It should be appreciated that in embodiments of the present invention, the processor may be a central processing unit (CPU), another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or any conventional processor.
The input devices may include a touch pad, a fingerprint sensor (for collecting fingerprint information of a user and direction information of a fingerprint), a microphone, etc., and the output devices may include a display (LCD, etc.), a speaker, etc.
The memory may include read only memory and random access memory and provide instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
For brevity, where this system embodiment omits a detail, reference may be made to the corresponding content of the foregoing embodiments.
A computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method described above.
The computer readable storage medium may be an internal storage unit of the terminal according to any of the foregoing embodiments, for example a hard disk or a memory of the terminal. The computer readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of the terminal. The computer readable storage medium is used to store the computer program and other programs and data required by the terminal, and may also be used to temporarily store data that has been output or is to be output.
For brevity, where this medium embodiment omits a detail, reference may be made to the corresponding content of the foregoing embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, and are intended to be included within the scope of the appended claims and description.
Claims (10)
1. A method for converting an html form to markdown text, comprising:
obtaining a table type tag in an html page to obtain a table tag sequence;
Completing the table label sequence;
Traversing the completed form label sequence;
when the traversed label is a merging cell, copying the content of the label as the content of a new label, and inserting the new label into the table label sequence;
and after traversing is completed, sequentially reading the contents of all the labels in the table label sequence, and constructing a markdown text.
2. The method for converting html form into markdown text according to claim 1, wherein the complementing the form tag sequence specifically comprises:
Sequentially extracting the labels in the table label sequence, and complementing the extracted labels.
3. The method for converting html form to markdown text according to claim 2, wherein the complementing the extracted tag specifically comprises:
when the stack is empty, pushing the extracted label to the stack;
when the stack is not empty and the extracted label is a pair with the stack top label, the stack top label is popped;
when the stack is not empty and the extracted label and the stack top label have a hierarchical relationship, the extracted label is stacked;
When the stack is not empty and the extracted label and the stack top label are in the same-level relationship, creating an end label of the stack top label, inserting the end label into the table label sequence, taking the end label as the last label of the extracted label, popping the stack top label, and pushing the extracted label.
4. The method for converting an html form into markdown text according to claim 1, wherein traversing the completed table tag sequence specifically comprises:
traversing the completed table tag sequence row by row.
5. The method for converting an html form into markdown text according to claim 4, wherein, when the traversed tag is a merged cell, copying the content of the tag as the content of a new tag and inserting the new tag into the table tag sequence specifically comprises:
when traversing the first row of the table, creating an empty list, wherein the list comprises N elements, and N is the number of columns of the table;
performing the merged-cell processing step.
6. The method for converting an html form into markdown text according to claim 5, further comprising, after traversing the first row of the table:
reading the list when traversing the j-th row of the table;
when the list is not empty, creating a new tag from the content of each non-empty element of the list, and inserting the new tag into the table tag sequence at the position that represents the j-th row and i-th column of the table;
performing the merged-cell processing step.
7. The method for converting an html form to markdown text according to claim 5 or 6, wherein the merged-cell processing step specifically comprises:
when the type of the merged cell is row merging, copying the content of the traversed tag as the content of a new tag, and inserting the new tag into the table tag sequence so that it is the next tag pair to be traversed;
and when the type of the merged cell is column merging, acquiring the column number i of the traversed tag in the table, and copying the content of the traversed tag to the (i-1)-th element of the list.
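The merged-cell handling of claims 5 to 7 can be illustrated with the sketch below. It rests on several assumptions that the claims do not state: "row merging" is read as an HTML `colspan` and "column merging" as a `rowspan`; each cell is modelled as a hypothetical `(text, colspan, rowspan)` tuple rather than as a tag; and the N-element list stores a remaining-row counter next to the copied text so that spans longer than two rows work. The name `expand_merged_cells` is illustrative.

```python
from typing import List, Optional, Tuple

Cell = Tuple[str, int, int]  # (text, colspan, rowspan): assumed cell model


def expand_merged_cells(rows: List[List[Cell]], n_cols: int) -> List[List[str]]:
    """Copy the text of merged cells into every grid position they cover."""
    # The N-element list of claim 5: one slot per column, empty at first.
    carry: List[Optional[Tuple[str, int]]] = [None] * n_cols
    grid: List[List[str]] = []
    for row in rows:
        pending = list(row)  # cells of this row not yet placed
        out: List[str] = []
        col = 0
        while col < n_cols:
            if carry[col] is not None:
                # Claim 6: a non-empty slot yields a new cell at (row j, column i).
                text, remaining = carry[col]
                out.append(text)
                carry[col] = (text, remaining - 1) if remaining > 1 else None
                col += 1
                continue
            if not pending:
                out.append("")  # short row: pad with an empty cell (assumption)
                col += 1
                continue
            text, colspan, rowspan = pending.pop(0)
            for _ in range(colspan):
                if col >= n_cols:
                    break
                # Row merging: the copy becomes the next cell in the same row.
                out.append(text)
                if rowspan > 1:
                    # Column merging: remember the text for the rows below.
                    carry[col] = (text, rowspan - 1)
                col += 1
        grid.append(out)
    return grid


if __name__ == "__main__":
    rows = [[("A", 2, 1), ("B", 1, 2)], [("c", 1, 1), ("d", 1, 1)]]
    print(expand_merged_cells(rows, 3))
    # [['A', 'A', 'B'], ['c', 'd', 'B']]
```

The sketch indexes the list from zero, so column number i in the wording of claim 7 corresponds to slot i-1.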
8. A system for converting an html form to markdown text, comprising:
an acquisition unit, configured to obtain the table-type tags in an html page to obtain a table tag sequence;
a completion unit, configured to complete the table tag sequence;
a cell merging processing unit, configured to traverse the completed table tag sequence and, when the traversed tag is a merged cell, to copy the content of the tag as the content of a new tag and insert the new tag into the table tag sequence;
a text creation unit, configured to sequentially read the contents of all the tags in the table tag sequence after the traversal is completed, and to construct a markdown text.
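As an illustration of what the acquisition unit does, the sketch below collects table-type tags from an html page with Python's standard `html.parser` module. The set of tags counted as "table type", the tuple layout of the sequence entries, and the names `TableTagCollector` and `table_tag_sequence` are assumptions made for this example; the claims do not prescribe a parser.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

# Assumed set of "table type" tags.
TABLE_TAGS = {"table", "thead", "tbody", "tr", "th", "td"}

Entry = Tuple[str, str, Dict[str, str]]  # (kind, tag name or text, attributes)


class TableTagCollector(HTMLParser):
    """Record table-related start tags, end tags, and the text inside tables."""

    def __init__(self) -> None:
        super().__init__()
        self.sequence: List[Entry] = []
        self._depth = 0  # greater than 0 while inside a <table>

    def handle_starttag(self, tag, attrs):
        if tag in TABLE_TAGS:
            if tag == "table":
                self._depth += 1
            self.sequence.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag in TABLE_TAGS:
            self.sequence.append(("end", tag, {}))
            if tag == "table" and self._depth > 0:
                self._depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that appears inside a table.
        if self._depth > 0 and data.strip():
            self.sequence.append(("text", data.strip(), {}))


def table_tag_sequence(html: str) -> List[Entry]:
    collector = TableTagCollector()
    collector.feed(html)
    collector.close()
    return collector.sequence


if __name__ == "__main__":
    page = "<p>x</p><table><tr><td colspan='2'>A<td>B</tr></table>"
    for entry in table_tag_sequence(page):
        print(entry)
```

The example page deliberately omits the `</td>` end tags: the collector records the tags exactly as they appear, and supplying the missing end tags is left to the completion unit.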
9. A system for converting an html form into markdown text, comprising a processor, an input device, an output device and a memory, the processor, the input device, the output device and the memory being interconnected, wherein the memory is adapted to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410063513.0A CN117973324A (en) | 2024-01-16 | 2024-01-16 | Method, system and medium for converting html form into markdown text |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410063513.0A CN117973324A (en) | 2024-01-16 | 2024-01-16 | Method, system and medium for converting html form into markdown text |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117973324A (en) | 2024-05-03 |
Family
ID=90862155
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410063513.0A Pending CN117973324A (en) | 2024-01-16 | 2024-01-16 | Method, system and medium for converting html form into markdown text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117973324A (en) |
2024-01-16: CN application CN202410063513.0A, published as CN117973324A (en); status: active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8250469B2 (en) | Document layout extraction | |
CN110083805A (en) | A kind of method and system that Word file is converted to EPUB file | |
US20230161802A1 (en) | Method and device for constructing standard knowledge graph, and method and device for querying standard | |
CN110287784B (en) | Annual report text structure identification method | |
CN108664471B (en) | Character recognition error correction method, device, equipment and computer readable storage medium | |
CN112395418B (en) | Method and device for extracting target object in webpage and electronic equipment | |
CN114036909A (en) | PDF document page-crossing table merging method and device and related equipment | |
CN110532449B (en) | Method, device, equipment and storage medium for processing service document | |
CN114359533B (en) | Page number identification method based on page text and computer equipment | |
CN112784529A (en) | Mobile terminal sorting table based on BetterScroll and construction method thereof | |
CN114691907B (en) | Cross-modal retrieval method, device and medium | |
CN117973324A (en) | Method, system and medium for converting html form into markdown text | |
CN116384344A (en) | Document conversion method, device and storage medium | |
CN111241096A (en) | Text extraction method, system, terminal and storage medium for EXCEL document | |
CN112818687B (en) | Method, device, electronic equipment and storage medium for constructing title recognition model | |
CN113609825B (en) | Intelligent customer attribute tag identification method and device | |
CN113779218B (en) | Question-answer pair construction method, question-answer pair construction device, computer equipment and storage medium | |
CN114218373A (en) | High-capacity text content retrieval method and system | |
CN114579796A (en) | Machine reading understanding method and device | |
CN114581934A (en) | Test paper image processing method, device and equipment | |
CN111708891B (en) | Food material entity linking method and device between multi-source food material data | |
CN112559739A (en) | Method for processing insulation state data of power equipment | |
CN117173725B (en) | Table information processing method, apparatus, computer device and storage medium | |
CN116416629B (en) | Electronic file generation method, device, equipment and medium | |
CN115712925A (en) | Webpage tampering detection method and device, electronic equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||