CN113407665A - Text comparison method, device, medium and electronic equipment - Google Patents

Publication number
CN113407665A
Authority
CN
China
Prior art keywords
comparison
text
file
text file
character string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110571704.4A
Other languages
Chinese (zh)
Inventor
庄妮
陈露露
黄灿
王长虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd
Priority to CN202110571704.4A
Publication of CN113407665A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure relates to a text comparison method, a text comparison device, a text comparison medium and an electronic device, wherein the text comparison method comprises the following steps: acquiring an original text file and a comparison text file; respectively acquiring a first character string formed by the original text and a second character string formed by the comparison text according to the original text file and the comparison text file; comparing the first character string with the second character string to determine a comparison result; and displaying the comparison result in the original text file and/or the comparison text file. Therefore, when the original text file and the comparison text file are compared, any file information except the characters in the two character strings does not need to be considered, the method for comparing different text files is simplified, the text comparison speed is high, the text comparison efficiency is improved, the comparison result can be displayed in the original text file and/or the comparison text file, and the comparison result can be displayed more intuitively.

Description

Text comparison method, device, medium and electronic equipment
Technical Field
The present disclosure relates to the field of text processing technologies, and in particular, to a text comparison method, apparatus, medium, and electronic device.
Background
In the prior art, the data to be compared generally must share the same data format, for example both in PDF format or both in text document format; when the file formats of the text contents to be compared are not uniform, the text contents often cannot be acquired and processed. In addition, even when text comparison is performed on files of the same format, it is generally time-consuming and cannot meet the speed requirement in scenarios that need quick comparison.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a text comparison method, including:
acquiring an original text file and a comparison text file;
respectively acquiring a first character string formed by the original text and a second character string formed by the comparison text according to the original text file and the comparison text file;
comparing the first character string with the second character string to determine a comparison result;
and displaying the comparison result in the original text file and/or the comparison text file.
In a second aspect, the present disclosure provides a text comparison apparatus, including:
the first acquisition module is used for acquiring an original text file and a comparison text file;
the second acquisition module is used for respectively acquiring a first character string formed by the original text and a second character string formed by the comparison text according to the original text file and the comparison text file;
the comparison module is used for comparing the first character string with the second character string to determine a comparison result;
and the processing module is used for displaying the comparison result in the original text file and/or the comparison text file.
In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method described above.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of the method described above.
With this technical scheme, when the original text file and the comparison text file are compared, the original text identified in the original text file and the comparison text identified in the comparison text file form the first character string and the second character string, respectively. The comparison therefore does not need to consider any file information other than the characters in the two character strings, which greatly simplifies the comparison of different text files, speeds up text comparison and improves its efficiency. In addition, the comparison result obtained by comparing the first character string with the second character string can be displayed in the original text file and/or the comparison text file, so that it is presented more intuitively.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
Fig. 1 is a flowchart illustrating a text comparison method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a flowchart illustrating a text comparison method according to yet another exemplary embodiment of the present disclosure.
Fig. 3 is a flowchart illustrating a text comparison method according to yet another exemplary embodiment of the present disclosure.
Fig. 4a is a schematic diagram illustrating a comparison result displayed in an original text file in a text comparison method according to another exemplary embodiment of the present disclosure.
Fig. 4b is a schematic diagram illustrating comparison results displayed in a comparison text file in a text comparison method according to another exemplary embodiment of the present disclosure.
Fig. 5 is a flowchart illustrating a text comparison method according to yet another exemplary embodiment of the present disclosure.
Fig. 6 is a block diagram illustrating a structure of a text comparison apparatus according to an exemplary embodiment of the present disclosure.
Fig. 7 is a block diagram illustrating a structure of a text comparison apparatus according to another exemplary embodiment of the present disclosure.
Fig. 8 is a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 1 is a flowchart illustrating a text comparison method according to an exemplary embodiment of the present disclosure. As shown in fig. 1, the method includes steps 101 to 104.
In step 101, an original text file and a comparison text file are obtained. The original text file and the comparison text file can be files in any format, such as a PDF file, a picture file, even a video file, and the like.
In step 102, a first character string composed of the original text and a second character string composed of the comparison text are respectively obtained according to the original text file and the comparison text file. The original text and the comparison text are character information obtained by identification in the original text file and the comparison text file, and the character information obtained by identification corresponding to the original text and the comparison text is respectively determined as a whole continuous character string, so that the first character string formed by the original text and the second character string formed by the comparison text are obtained.
Whether the original text file and the comparison text file are single-page or multi-page files, the original text and the comparison text identified from the original text file and the comparison text file are respectively taken as an integral character string.
In step 103, the first character string and the second character string are compared to determine a comparison result.
After the original text and the comparison text identified in the original text file and the comparison text file are determined as the first character string and the second character string, respectively, the original text file and the comparison text file can be compared directly on the basis of the two character strings. The comparison information between the original text file and the comparison text file is thus obtained from the comparison result between the character strings.
In step 104, the comparison result is displayed in the original text file and/or the comparison text file.
The comparison result can be displayed in the original text file and/or the comparison text file in various ways. For example, the comparison result may be displayed in both the original text file and the comparison text file, or only in the original text file, or only in the comparison text file; it may be displayed in list form, or each comparison result may be displayed at the text position associated with it, and the like. To display a comparison result at its associated text position, the arrangement of the original text and the comparison text within the first character string and the second character string can be determined when the strings are built, for example according to the page number order and the text arrangement order within each page. The positions of the comparison result in the original text file and the comparison text file can then be determined from its positions in the first character string and the second character string, respectively, and the comparison result is displayed in the corresponding files.
According to this technical scheme, when the original text file and the comparison text file are compared, the original text identified in the original text file and the comparison text identified in the comparison text file form the first character string and the second character string, respectively. The comparison therefore does not need to consider any file information other than the characters in the two character strings, which greatly simplifies the comparison of different text files and improves the text comparison efficiency. In addition, the comparison result obtained by comparing the first character string with the second character string can be displayed in the original text file and/or the comparison text file, so that it is presented more intuitively.
In a possible embodiment, the comparison result is one or more tag data, and each tag data includes a tag type and a tag position. The tag position includes the start-stop character position, in the first character string, of a first text corresponding to the tag data, and the start-stop character position, in the second character string, of a second text corresponding to the tag data; these start-stop character positions are the start-stop character offset values of the tag position in the first character string and the second character string, respectively. The tag types include one or more of the following four types: same, replacement, deletion, and insertion. The tag type may also include other types, and the kinds of tag type are not limited in this disclosure. In addition, the tag data may further include a tag ID, which may be assigned sequentially from 0 to number the tag data.
Specifically, the tag data in the comparison result can be as shown in table 1. Table 1 shows four different types of tag data, where tag_ID is the tag ID of the tag data and tag is the tag type of the tag data, which is one of equal (same), replace (replacement), delete (deletion) and insert (insertion). In the tag position, the array A represents the first character string formed by the original text in the original text file, and the array B represents the second character string formed by the comparison text in the comparison text file. The number of each character in a character string is given by its array subscript; for example, A[0] is the first character in the array A. That is, the start-stop character position of the first text in the first character string and the start-stop character position of the second text in the second character string may both be represented in array form, and the start-stop character offset value may be an offset relative to the first character in the string, i.e., the array subscript.
TABLE 1
[Table 1 is provided as an image in the original publication. Columns: tag_ID, tag, tag position.]
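The tag data described above can be illustrated with a minimal Python sketch. The disclosure does not name a comparison algorithm; using the standard library's `difflib.SequenceMatcher` is an assumption made here only because its opcodes happen to use the same four tag names (equal, replace, delete, insert). The `TagData` class and `compare_strings` function are hypothetical names. Note that `difflib` reports half-open ranges (the end offset is one past the last character), whereas the A[0]-A[72] notation in the disclosure appears to be inclusive.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class TagData:
    """One entry of the comparison result (hypothetical structure)."""
    tag_id: int   # sequential ID, assigned from 0
    tag: str      # "equal", "replace", "delete" or "insert"
    a_start: int  # offset of the first character in string A
    a_end: int    # offset one past the last character in string A
    b_start: int  # offset of the first character in string B
    b_end: int    # offset one past the last character in string B


def compare_strings(a: str, b: str) -> list[TagData]:
    """Compare two whole-document strings and return tag data."""
    # autojunk=False disables difflib's heuristic for long sequences,
    # so frequent characters are not silently ignored.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        TagData(tag_id, tag, i1, i2, j1, j2)
        for tag_id, (tag, i1, i2, j1, j2) in enumerate(matcher.get_opcodes())
    ]
```

For a delete tag the range in B is empty (`b_start == b_end`), and for an insert tag the range in A is empty, which matches the later observation that such tags correspond only to an insertion position in the other file.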
In addition, because the format of the original text file and the format of the comparison text file can be any format, before the first character string and the second character string are obtained, a certain conversion process can be performed on the original text file and the comparison text file, and then the text content in the original text file and the comparison text file can be extracted to obtain the first character string and the second character string. An exemplary method of obtaining the first string and the second string is given in fig. 2.
Fig. 2 is a flowchart illustrating a text comparison method according to yet another exemplary embodiment of the present disclosure. As shown in fig. 2, the method further includes steps 201 to 203.
In step 201, the original text file and the comparison text file are converted into one or more image format files. The specific image format of the image format file is not limited in this disclosure.
In step 202, character recognition is performed on the image format files to obtain text information corresponding to each image format file. The method of character recognition may be conventional OCR recognition, or may be any other method of character recognition.
In step 203, according to the page number sequence of the image format file in the original text file or the comparison text file, the text information corresponding to the image format file is spliced to obtain the first character string and the second character string corresponding to the original text file and the comparison text file, respectively.
When the original text file and/or the comparison text file includes a plurality of file pages, the text information obtained by recognition corresponds to the respective page information. The text information of the pages can then be spliced sequentially in the original page order to obtain the first character string and the second character string corresponding to the original text file and the comparison text file, respectively. If the original text file and/or the comparison text file has only a single page, the first character string and/or the second character string can be determined solely from the arrangement order of the recognized text information in the original text file or the comparison text file.
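The splicing in step 203 can be sketched as follows. This is a minimal illustration that assumes the character recognition of step 202 has already produced one text string per page; the recognition itself is out of scope here, and the function name `splice_pages` and its return shape are hypothetical.

```python
from itertools import accumulate


def splice_pages(pages: dict[int, str]) -> tuple[str, dict[int, int]]:
    """Join per-page text in page-number order.

    `pages` maps a page number to the text recognized on that page.
    Returns the spliced string and, for each page number, the offset
    in the spliced string at which that page's text begins.
    """
    ordered = sorted(pages)                # page numbers, ascending
    texts = [pages[n] for n in ordered]
    # Running totals of page lengths give the end offset of each page;
    # shifting them by one position gives the start offsets.
    starts = [0, *accumulate(len(t) for t in texts)][:-1]
    return "".join(texts), dict(zip(ordered, starts))
```

Keeping the start offset of each page alongside the spliced string is a design choice of this sketch: it makes it possible to translate a position in the string back to a page later on.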
Fig. 3 is a flowchart illustrating a text comparison method according to yet another exemplary embodiment of the present disclosure. As shown in fig. 3, the method further comprises step 301 and step 302.
In step 301, a target file corresponding to the tag data is determined according to the tag type in the tag data, where the target file includes an original text file and/or a comparison text file.
In step 302, the tag data are respectively displayed in the target files corresponding to the tag data.
As shown in table 1 above, the tag type of the tag data may be equal (same), replace (replacement), delete (deletion) or insert (insertion). Tag data of the insert type represents text content that is newly inserted in the comparison text file and does not exist in the original text file. In the original text file, the tag position of such tag data corresponds only to an insertion position with no corresponding text, so this type of tag data need not be displayed in the original text file; the target file corresponding to tag data of the insert type can therefore be preset so as not to include the original text file. Likewise, tag data of the delete type represents text content that exists in the original text file but has been deleted from the comparison text file. In the comparison text file, the tag position of such tag data corresponds only to an insertion position with no corresponding text, so this type of tag data need not be displayed in the comparison text file; the target file corresponding to tag data of the delete type can therefore be preset so as not to include the comparison text file. Tag data of the equal and replace types can be displayed in both files, so the target file corresponding to tag data of these types can include both the original text file and the comparison text file.
Fig. 4a is a schematic diagram illustrating a comparison result displayed in an original text file in a text comparison method according to yet another exemplary embodiment of the present disclosure, and fig. 4b is a schematic diagram illustrating a comparison result displayed in a comparison text file in a text comparison method according to yet another exemplary embodiment of the present disclosure. As shown in fig. 4a, the original text file shows tag data of equal (same) type 1, replace (replacement) type 2 and delete (deletion) type 3; as shown in fig. 4b, the comparison text file shows tag data of equal (same) type 1, replace (replacement) type 2 and insert (insertion) type 4. Tag data of equal (same) type 1 is displayed in the text as a straight underline, tag data of replace (replacement) type 2 as a wavy underline, tag data of delete (deletion) type 3 as a deletion mark, and tag data of insert (insertion) type 4 as a double straight underline.
Therefore, before the comparison result is displayed in the original text file and the comparison text file, the corresponding relation between each label type and the original text file and the corresponding relation between each label type and the comparison text file can be preset according to the self meaning of each label type, so that the label data are distributed, and the display picture can be clearer and simpler when the comparison result is finally displayed.
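The preset correspondence between tag types and target files can be sketched as a simple lookup table. The names `TARGETS`, `STYLES` and `route_tags` and the string keys "original" and "comparison" are hypothetical; the routing and the display styles follow the description of steps 301 and 302 and of fig. 4a and fig. 4b.

```python
# Which file(s) each tag type is displayed in (steps 301 and 302).
TARGETS = {
    "equal": ("original", "comparison"),
    "replace": ("original", "comparison"),
    "delete": ("original",),    # no corresponding text in the comparison file
    "insert": ("comparison",),  # no corresponding text in the original file
}

# Display style per tag type, as described for fig. 4a and fig. 4b.
STYLES = {
    "equal": "straight underline",
    "replace": "wavy underline",
    "delete": "deletion mark",
    "insert": "double straight underline",
}


def route_tags(tags: list[tuple[int, str]]) -> dict[str, list[tuple[int, str]]]:
    """Distribute (tag_id, tag_type) pairs to their target files."""
    routed: dict[str, list[tuple[int, str]]] = {"original": [], "comparison": []}
    for tag in tags:
        for target in TARGETS[tag[1]]:
            routed[target].append(tag)
    return routed
```

Holding the correspondence in data rather than in branching code means that supporting an additional tag type, which the disclosure allows for, would only require new table entries.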
Fig. 5 is a flowchart illustrating a text comparison method according to yet another exemplary embodiment of the present disclosure. As shown in fig. 5, the method further includes steps 501 to 503.
When the comparison result is mapped back to the original text file and the comparison text file for display, the mapping may be performed according to the correspondence between each character in the first character string and the second character string and the text position in the original text file and the comparison text file, and therefore, the text comparison method may further include step 501 as shown in fig. 5.
In step 501, page information of the original text file and the comparison text file is obtained, where the page information includes a file page number, a total number of characters included in each page of file, a number of text lines included in each page of file, and a number of characters included in each text line. The step 501 may be executed according to an execution sequence of the steps shown in fig. 5, or may be executed before or after the first character string and the second character string are obtained, where the execution sequence of the steps is not limited as long as it is ensured that the tag data can be displayed according to the page information of the original text file and the comparison text file when the tag data is displayed.
Further, when the tag data is displayed in the target files corresponding to the tag data, the display may be performed according to step 502 and step 503 shown in fig. 5.
In step 502, a mapping relationship between the tag data and the corresponding target file is determined according to the page information and the tag position.
In step 503, the label data are mapped to the corresponding target files for display according to the mapping relationship.
In the process of determining the mapping relationship, according to the label position corresponding to each label data, text position information corresponding to the label data can be directly searched in the original text file and/or the comparison text file, so that each label data is mapped to the corresponding text content.
For example, after comparing the original text file and the comparison text file as shown in fig. 4a and 4b, the obtained comparison result may include the tag data as shown in table 2 below.
TABLE 2
[Table 2 is provided as an image in the original publication. It lists the tag data obtained for the files of fig. 4a and fig. 4b.]
The tag position of the tag data with tag ID 0 in table 2 is A[0]-A[72] -> B[0]-B[72]. Since its tag type is equal (same), this tag data can be displayed in both the original text file and the comparison text file. Its mapping relationship to the two files can be determined by counting from the beginning of each file to find the text positions of the 1st character and the 73rd character, that is, from the first character of the first line to the sixth character of the fourth line in both text files. Alternatively, the tag data may be displayed according to the display method described below.
In a possible implementation, the page information of the original text file and the comparison text file is known, where the page information includes the file page numbers, the total number of characters in each page, the number of text lines in each page, and the number of characters in each text line. In that case the tag data can be displayed on the corresponding text of the original text file and/or the comparison text file by narrowing down step by step, first to the page, then to the text line within the page, and then to the character within the line. For example, the file page corresponding to the tag data may first be determined from the file page numbers and the total number of characters in each page, which yields the tag data belonging to each page. The text line corresponding to the tag data is then determined from the number of text lines in each page and the number of characters in each text line, which locates the tag data at a specific text line of the page. Finally, the positions of the first text and the second text corresponding to the tag data in the original text file and the comparison text file are calculated from the file page, the text line and the tag position; that is, once the page and the text line of the tag data are known, the character offset of the tag data within that text line is obtained from the tag position in the tag data.
In this way, compared with direct character-level mapping, the amount of calculation in the mapping process can be greatly reduced and the efficiency of mapping the tag data is improved.
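The page, line and character narrowing can be sketched with cumulative character counts and a binary search. This is one possible reading of the scheme, not a definitive implementation: the function name `locate` and the representation of the page information as a list of per-line character counts for each page are assumptions, and the sketch returns 0-based indices.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(offset: int, pages: list[list[int]]) -> tuple[int, int, int]:
    """Map a 0-based string offset to 0-based (page, line, column).

    `pages[p][l]` is the number of characters in line l of page p, so
    the total character count of a page is the sum of its line lengths.
    """
    # Step 1: find the page from the cumulative per-page totals.
    page_ends = list(accumulate(sum(lines) for lines in pages))
    if not 0 <= offset < (page_ends[-1] if page_ends else 0):
        raise IndexError("offset outside the spliced string")
    page = bisect_right(page_ends, offset)
    in_page = offset - (page_ends[page - 1] if page else 0)

    # Step 2: find the line from the cumulative per-line totals.
    line_ends = list(accumulate(pages[page]))
    line = bisect_right(line_ends, in_page)

    # Step 3: the remainder is the character offset within the line.
    column = in_page - (line_ends[line - 1] if line else 0)
    return page, line, column
```

As a usage example with invented line lengths of 23, 22, 22 and 10 characters on a single page, `locate(72, [[23, 22, 22, 10]])` returns `(0, 3, 5)`, i.e. the sixth character of the fourth line. These lengths are hypothetical and were chosen only to be consistent with the tag ID 0 example; the actual line lengths of the files in fig. 4a and fig. 4b are not given in the text.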
In addition, the mapping method of the tag data may be performed when the corresponding target file is determined for the tag data according to the tag type, or may be performed when the target file corresponding to the tag data is not determined, and when the target file corresponding to the tag data is not determined, all the tag data may be mapped to the original text file and the comparison text file for display directly according to the tag positions included in the tag data.
Fig. 6 is a block diagram illustrating a structure of a text comparison apparatus according to an exemplary embodiment of the present disclosure. As shown in fig. 6, the apparatus includes: a first obtaining module 10, configured to obtain an original text file and a comparison text file; a second obtaining module 20, configured to obtain, according to the original text file and the comparison text file, a first character string formed by the original text and a second character string formed by the comparison text, respectively; a comparison module 30, configured to compare the first character string with the second character string and determine a comparison result; and a processing module 40, configured to display the comparison result in the original text file and/or the comparison text file.
According to this technical scheme, when the original text file and the comparison text file are compared, the original text identified in the original text file and the comparison text identified in the comparison text file form the first character string and the second character string, respectively. The comparison therefore does not need to consider any file information other than the characters in the two character strings, which greatly simplifies the comparison of different text files and improves the text comparison efficiency. In addition, the comparison result obtained by comparing the first character string with the second character string can be displayed in the original text file and/or the comparison text file, so that it is presented more intuitively.
In a possible embodiment, the comparison result is one or more tag data, and the tag data includes a tag type and a tag location.
In one possible implementation, the tag location includes a start-stop character location of a corresponding first text of the tag data in the first character string and a start-stop character location of a corresponding second text of the tag data in the second character string, and the start-stop character location of the first text and the start-stop character location of the second text are start-stop character offset values of the tag location in the first character string and the second character string, respectively.
In one possible embodiment, the tag types include one or more of the following four types: same, replacement, deletion, and insertion.
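The tag data described above can be sketched as a small record holding a tag type and start-stop character offsets in both character strings. This is a hedged illustration: the TagData class, its field names, and the use of Python's difflib (whose opcodes happen to correspond to the four tag types) are assumptions, not the comparison procedure of the disclosure.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List

# Assumed mapping from difflib opcode names to the four tag types.
TAG_TYPES = {
    "equal": "same",
    "replace": "replacement",
    "delete": "deletion",
    "insert": "insertion",
}


@dataclass(frozen=True)
class TagData:
    tag_type: str
    first_start: int   # start offset in the first character string
    first_end: int     # end offset (exclusive) in the first character string
    second_start: int  # start offset in the second character string
    second_end: int    # end offset (exclusive) in the second character string


def compare_strings(first: str, second: str) -> List[TagData]:
    """Compare two strings and return one TagData per differing or equal run."""
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    return [
        TagData(TAG_TYPES[op], i1, i2, j1, j2)
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
    ]
```

With compare_strings("abcd", "abd"), the result contains a "same" tag for "ab", a "deletion" tag covering offsets 2 to 3 of the first character string, and a "same" tag for "d".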
Fig. 7 is a block diagram illustrating the structure of a text comparison apparatus according to another exemplary embodiment of the present disclosure. As shown in fig. 7, the apparatus further includes a determining module 50, configured to determine, before the processing module 40 displays the comparison result in the original text file and/or the comparison text file, a target file corresponding to the tag data according to the tag type in the tag data, where the target file includes the original text file and/or the comparison text file. The processing module 40 is further configured to display each item of tag data in the target file corresponding to that tag data.
In a possible implementation, as shown in fig. 7, the apparatus further includes a third obtaining module 60, configured to obtain page information of the original text file and the comparison text file; the processing module 40 includes: a determining submodule, configured to determine the mapping relationship between the tag data and the corresponding target file according to the page information and the tag location; and a mapping submodule, configured to map the tag data to the corresponding target files according to the mapping relationship for display.
In one possible embodiment, the page information includes the number of pages of the file, the total number of characters included in each page of the file, the number of text lines included in each page of the file, and the number of characters included in each text line; the processing module 40 is further configured to: determine the page corresponding to the tag data according to the number of pages and the total number of characters included in each page; determine the text line corresponding to the tag data according to the number of text lines included in each page and the number of characters included in each text line; and calculate the position of the tag data in the corresponding target file according to the page corresponding to the tag data, the text line corresponding to the tag data, and the tag location, so as to display the tag data.
In a possible implementation, the second obtaining module 20 further includes: a conversion submodule, configured to convert the original text file and the comparison text file into one or more image format files; a recognition submodule, configured to perform character recognition on the image format files to obtain text information corresponding to each image format file; and a splicing submodule, configured to splice the text information corresponding to the image format files according to the page order of the image format files in the original text file or the comparison text file, so as to obtain the first character string and the second character string corresponding to the original text file and the comparison text file, respectively.
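A minimal sketch of determining target files from the tag type follows. The disclosure states only that the target file is determined according to the tag type; the particular routing below (deletions shown in the original text file, insertions in the comparison text file, replacements in both, identical text in neither) and all names are assumptions chosen for illustration.

```python
ORIGINAL = "original"
COMPARISON = "comparison"

# Assumed routing from tag type to target file(s); not specified by the text.
TARGETS_BY_TYPE = {
    "same": (),
    "replacement": (ORIGINAL, COMPARISON),
    "deletion": (ORIGINAL,),
    "insertion": (COMPARISON,),
}


def target_files(tag_type):
    """Return the target file(s) in which a tag of this type is displayed."""
    return TARGETS_BY_TYPE[tag_type]


def group_by_target(tags):
    """Group (tag_type, payload) pairs by the file they are displayed in."""
    grouped = {ORIGINAL: [], COMPARISON: []}
    for tag_type, payload in tags:
        for target in target_files(tag_type):
            grouped[target].append((tag_type, payload))
    return grouped
```

Keeping the routing in a single table makes it straightforward to switch to the alternative described earlier, in which every tag is mapped to both files.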
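The page and line lookup described above can be sketched as follows, assuming the character string is the plain concatenation of all text lines with no separators added between lines or pages. The page information is represented as a list of pages, each a list of per-line character counts; this representation and the function name locate are assumptions of the sketch.

```python
from typing import List, Tuple


def locate(offset: int, pages: List[List[int]]) -> Tuple[int, int, int]:
    """Map a character offset in the spliced string to (page, line, column).

    pages[p][l] is the number of characters in line l of page p.
    Page and line numbers are 1-based; the column is 0-based.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    remaining = offset
    for page_no, line_lengths in enumerate(pages, start=1):
        page_total = sum(line_lengths)  # total characters on this page
        if remaining >= page_total:
            remaining -= page_total     # the offset lies on a later page
            continue
        for line_no, length in enumerate(line_lengths, start=1):
            if remaining < length:
                return page_no, line_no, remaining
            remaining -= length         # the offset lies on a later line
    raise ValueError("offset is beyond the end of the document")
```

For a file whose first page has lines of 5 and 3 characters and whose second page has one line of 4 characters, locate(7, [[5, 3], [4]]) returns (1, 2, 2) and locate(8, [[5, 3], [4]]) returns (2, 1, 0). Applying the function to the start and end offsets of a tag gives the span to highlight in the target file.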
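The splicing step can be sketched as below. The character recognition itself is outside the scope of the sketch: recognize is a hypothetical callable standing in for whatever OCR engine is used, and the mapping from page number to image bytes is an assumed input format.

```python
from typing import Callable, Dict


def build_string(page_images: Dict[int, bytes],
                 recognize: Callable[[bytes], str]) -> str:
    """Recognize each page image and splice the text in page-number order.

    page_images maps a page number to the image data for that page.
    recognize is a stand-in for an OCR function (hypothetical here).
    """
    return "".join(recognize(page_images[page_no])
                   for page_no in sorted(page_images))
```

Calling build_string once for the original text file and once for the comparison text file yields the first and second character strings. Because pages are sorted by page number before splicing, the order in which the images were produced does not affect the result.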
Referring now to FIG. 8, shown is a schematic diagram of an electronic device 800 suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, an electronic device 800 may include a processing means (e.g., central processing unit, graphics processor, etc.) 801 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage means 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the electronic apparatus 800 are also stored. The processing apparatus 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage 808 including, for example, magnetic tape, hard disk, etc.; and a communication device 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 illustrates an electronic device 800 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 809, or installed from the storage means 808, or installed from the ROM 802. The computer program, when executed by the processing apparatus 801, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire an original text file and a comparison text file; acquire, from the original text file and the comparison text file respectively, a first character string formed by the original text and a second character string formed by the comparison text; compare the first character string with the second character string to determine a comparison result; and display the comparison result in the original text file and/or the comparison text file.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of the module does not in some cases form a limitation on the module itself, for example, the first obtaining module may also be described as a "module for obtaining an original text file and a comparison text file".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides, in accordance with one or more embodiments of the present disclosure, a text comparison method, the method comprising: acquiring an original text file and a comparison text file; acquiring, from the original text file and the comparison text file respectively, a first character string formed by the original text and a second character string formed by the comparison text; comparing the first character string with the second character string to determine a comparison result; and displaying the comparison result in the original text file and/or the comparison text file.
Example 2 provides the method of example 1, and the comparison result is one or more tag data, where the tag data includes a tag type and a tag location.
Example 3 provides the method of example 2, the tag location including a start-stop character location of a corresponding first text of the tag data in the first string and a start-stop character location of a corresponding second text of the tag data in the second string, the start-stop character location of the first text and the start-stop character location of the second text being start-stop character offset values of the tag location in the first string and the second string, respectively.
Example 4 provides the method of example 2, the tag type comprising one or more of the following four types: same, replacement, deletion, and insertion.
Example 5 provides the method of example 2, further comprising, prior to displaying the comparison result, in accordance with one or more embodiments of the present disclosure: determining a target file corresponding to the tag data according to the tag type in the tag data, wherein the target file comprises the original text file and/or the comparison text file;
the displaying of the comparison result in the original text file and/or the comparison text file comprises:
displaying each item of tag data in the target file corresponding to that tag data.
Example 6 provides the method of example 5, further comprising, in accordance with one or more embodiments of the present disclosure: acquiring page information of the original text file and the comparison text file; the displaying of the tag data in the target files corresponding to the tag data comprises: determining the mapping relationship between the tag data and the corresponding target file according to the page information and the tag location; and mapping the tag data to the corresponding target files according to the mapping relationship for display.
Example 7 provides the method of example 6, the page information including the number of pages of the file, the total number of characters included in each page of the file, the number of text lines included in each page of the file, and the number of characters included in each text line; the mapping of the tag data to the corresponding target files according to the mapping relationship for display comprises: determining the page corresponding to the tag data according to the number of pages and the total number of characters included in each page; determining the text line corresponding to the tag data according to the number of text lines included in each page and the number of characters included in each text line; and calculating the position of the tag data in the corresponding target file according to the page corresponding to the tag data, the text line corresponding to the tag data, and the tag location, so as to display the tag data.
Example 8 provides the method of example 1, wherein acquiring, from the original text file and the comparison text file respectively, the first character string formed by the original text and the second character string formed by the comparison text comprises: converting the original text file and the comparison text file into one or more image format files; performing character recognition on the image format files to obtain text information corresponding to each image format file; and splicing the text information corresponding to the image format files according to the page order of the image format files in the original text file or the comparison text file, to obtain the first character string and the second character string corresponding to the original text file and the comparison text file, respectively.
Example 9 provides, in accordance with one or more embodiments of the present disclosure, a text comparison apparatus, the apparatus comprising: a first acquisition module, configured to acquire an original text file and a comparison text file; a second acquisition module, configured to acquire, from the original text file and the comparison text file respectively, a first character string formed by the original text and a second character string formed by the comparison text; a comparison module, configured to compare the first character string with the second character string to determine a comparison result; and a processing module, configured to display the comparison result in the original text file and/or the comparison text file.
Example 10 provides a computer-readable medium having stored thereon a computer program that, when executed by a processing device, performs the steps of the method of any of examples 1-8, in accordance with one or more embodiments of the present disclosure.
Example 11 provides, in accordance with one or more embodiments of the present disclosure, an electronic device, comprising: a storage device having a computer program stored thereon; processing means for executing the computer program in the storage means to carry out the steps of the method of any of examples 1-8.
The foregoing description is merely an illustration of the preferred embodiments of the disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions that are disclosed in (but not limited to) this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Claims (11)

1. A text comparison method, comprising:
acquiring an original text file and a comparison text file;
respectively acquiring a first character string formed by the original text and a second character string formed by the comparison text according to the original text file and the comparison text file;
comparing the first character string with the second character string to determine a comparison result;
and displaying the comparison result in the original text file and/or the comparison text file.
2. The method of claim 1, wherein the comparison result is one or more tag data, and the tag data comprises a tag type and a tag location.
3. The method of claim 2, wherein the tag location comprises a start-stop character location of a corresponding first text of the tag data in the first string and a start-stop character location of a corresponding second text of the tag data in the second string, the start-stop character location of the first text and the start-stop character location of the second text being start-stop character offset values of the tag location in the first string and the second string, respectively.
4. The method of claim 2, wherein the tag types include one or more of the following four types: same, replacement, deletion, and insertion.
5. The method of claim 2, wherein prior to displaying the comparison result, the method further comprises:
determining a target file corresponding to the tag data according to the tag type in the tag data, wherein the target file comprises the original text file and/or the comparison text file;
the displaying of the comparison result in the original text file and/or the comparison text file comprises:
displaying each item of tag data in the target file corresponding to that tag data.
6. The method of claim 5, further comprising:
acquiring page information of the original text file and the comparison text file;
the displaying of the tag data in the target files corresponding to the tag data comprises:
determining the mapping relationship between the tag data and the corresponding target file according to the page information and the tag location;
and mapping the tag data to the corresponding target files according to the mapping relationship for display.
7. The method of claim 6, wherein the page information includes the number of pages of the file, the total number of characters included in each page of the file, the number of text lines included in each page of the file, and the number of characters included in each text line; the mapping of the tag data to the corresponding target files according to the mapping relationship for display comprises:
determining the page corresponding to the tag data according to the number of pages and the total number of characters included in each page;
determining the text line corresponding to the tag data according to the number of text lines included in each page and the number of characters included in each text line;
and calculating the position of the tag data in the corresponding target file according to the page corresponding to the tag data, the text line corresponding to the tag data, and the tag location, so as to display the tag data.
8. The method of claim 1, wherein the obtaining a first string of original text and a second string of comparison text from the original text file and the comparison text file respectively comprises:
converting the original text file and the comparison text file into one or more image format files;
performing character recognition on the image format files to obtain text information corresponding to each image format file;
and splicing the text information corresponding to the image format file according to the page number sequence of the image format file in the original text file or the comparison text file to obtain the first character string and the second character string corresponding to the original text file and the comparison text file respectively.
9. A text comparison device, comprising:
the first acquisition module is used for acquiring an original text file and a comparison text file;
the second acquisition module is used for respectively acquiring a first character string formed by the original text and a second character string formed by the comparison text according to the original text file and the comparison text file;
the comparison module is used for comparing the first character string with the second character string to determine a comparison result;
and the processing module is used for displaying the comparison result in the original text file and/or the comparison text file.
10. A computer-readable medium, on which a computer program is stored, characterized in that the program, when being executed by processing means, carries out the steps of the method of any one of claims 1 to 8.
11. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 8.
CN202110571704.4A 2021-05-25 2021-05-25 Text comparison method, device, medium and electronic equipment Pending CN113407665A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110571704.4A CN113407665A (en) 2021-05-25 2021-05-25 Text comparison method, device, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110571704.4A CN113407665A (en) 2021-05-25 2021-05-25 Text comparison method, device, medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN113407665A true CN113407665A (en) 2021-09-17

Family

ID=77674969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110571704.4A Pending CN113407665A (en) 2021-05-25 2021-05-25 Text comparison method, device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113407665A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548778A (en) * 2016-10-13 2017-03-29 北京云知声信息技术有限公司 A kind of generation method and device of character transformational rule
CN109190092A (en) * 2018-08-15 2019-01-11 深圳平安综合金融服务有限公司上海分公司 The consistency checking method of separate sources file
CN111090982A (en) * 2018-10-24 2020-05-01 迈普通信技术股份有限公司 Text comparison method and device, electronic equipment and computer readable storage medium
CN111753505A (en) * 2019-09-30 2020-10-09 北京沃东天骏信息技术有限公司 Document processing method, document processing device, server and storage medium
CN111832264A (en) * 2020-06-02 2020-10-27 深圳价值在线信息科技股份有限公司 PDF file based signature position determination method, device and equipment
CN112149402A (en) * 2020-09-23 2020-12-29 创新奇智(青岛)科技有限公司 Document comparison method and device, electronic equipment and computer-readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113836092A (en) * 2021-09-27 2021-12-24 北京来也网络科技有限公司 File comparison method, device, equipment and storage medium based on RPA and AI
CN113836096A (en) * 2021-09-27 2021-12-24 北京来也网络科技有限公司 File comparison method, device, equipment, medium and system based on RPA and AI
WO2023045053A1 (en) * 2021-09-27 2023-03-30 北京来也网络科技有限公司 File comparison method and apparatus based on rpa and ai, device, and storage medium
CN113836092B (en) * 2021-09-27 2024-06-21 北京来也网络科技有限公司 File comparison method, device, equipment and storage medium based on RPA and AI

Similar Documents

Publication Publication Date Title
CN111445902B (en) Data collection method, device, storage medium and electronic equipment
CN110659639B (en) Chinese character recognition method and device, computer readable medium and electronic equipment
CN109684589B (en) Client comment data processing method and device and computer storage medium
CN113407665A (en) Text comparison method, device, medium and electronic equipment
CN112949430A (en) Video processing method and device, storage medium and electronic equipment
CN110674813B (en) Chinese character recognition method and device, computer readable medium and electronic equipment
CN115937888A (en) Document comparison method, device, equipment and medium
WO2023088378A1 (en) Information processing method and apparatus, terminal and storage medium
CN112084441A (en) Information retrieval method and device and electronic equipment
CN111260445A (en) House resource information display method, device, terminal and storage medium
CN110705536A (en) Chinese character recognition error correction method and device, computer readable medium and electronic equipment
CN110598049A (en) Method, apparatus, electronic device and computer readable medium for retrieving video
CN111782895B (en) Retrieval processing method and device, readable medium and electronic equipment
CN111783440B (en) Intention recognition method and device, readable medium and electronic equipment
CN114239501A (en) Contract generation method, apparatus, device and medium
CN110413603B (en) Method and device for determining repeated data, electronic equipment and computer storage medium
CN114495080A (en) Font identification method and device, readable medium and electronic equipment
CN111898595A (en) Information display method and device, electronic equipment and storage medium
CN112445478A (en) Graphic file processing method, device, equipment and medium
CN111353536A (en) Image annotation method and device, readable medium and electronic equipment
CN111984890B (en) Method, device, medium and electronic equipment for generating display information
CN112307245B (en) Method and apparatus for processing image
CN112948108B (en) Request processing method and device and electronic equipment
CN114647685B (en) Data processing method, device, equipment and medium
CN111026983B (en) Method, device, medium and electronic equipment for realizing hyperlink

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination