CN111090982A - Text comparison method and device, electronic equipment and computer readable storage medium - Google Patents

Publication number
CN111090982A
Authority
CN
China
Prior art keywords
character
comparison result
array
text
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811248252.0A
Other languages
Chinese (zh)
Inventor
邓鹏�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Maipu Communication Technology Co Ltd
Original Assignee
Maipu Communication Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Maipu Communication Technology Co Ltd filed Critical Maipu Communication Technology Co Ltd
Priority to CN201811248252.0A priority Critical patent/CN111090982A/en
Publication of CN111090982A publication Critical patent/CN111090982A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a text comparison method, a text comparison device, electronic equipment and a computer-readable storage medium. When two text files are compared, line data comparison is first performed on the two text files based on a preset algorithm to obtain a line data comparison result; then, when the line data comparison result includes line data that differ between the two text files, character comparison is performed on the differing line data based on the preset algorithm to obtain a character comparison result; finally, the comparison result of the two text files is obtained from the line data comparison result and the character comparison result. This solves the problem that the traditional scheme cannot compare two text files. In addition, the scheme can be ported to any operating system and therefore offers high compatibility.

Description

Text comparison method and device, electronic equipment and computer readable storage medium
Technical Field
The invention relates to the field of data processing, in particular to a text comparison method and device, electronic equipment and a computer readable storage medium.
Background
The traditional Needleman/Wunsch algorithm is a text comparison algorithm based on the longest common substring. It compares two input character strings at the granularity of a single character, computing and comparing the two strings character by character. For example, given the two input strings GGATCGA and GAATTCAGTTA, the algorithm yields a comparison result for the two strings. However, the algorithm cannot compare two text files. In practical application scenarios, customers often need to compare two text files, for example to learn the similarities and differences between corresponding lines of two configuration text files. The conventional Needleman/Wunsch algorithm cannot meet this requirement.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a text comparison method, apparatus, electronic device and computer-readable storage medium to solve the above problems.
In a first aspect, an embodiment of the present invention provides a text comparison method, where the method includes: acquiring two text files to be compared; performing line data comparison on the two text files based on a preset algorithm to obtain a line data comparison result; when the line data comparison result includes line data that differ between the two text files, performing character comparison on the differing line data based on the preset algorithm to obtain a character comparison result; and obtaining a comparison result of the two text files according to the line data comparison result and the character comparison result.
In a second aspect, an embodiment of the present invention provides a text comparison apparatus, where the apparatus includes an acquisition module and a comparison module. The acquisition module is used for acquiring two text files to be compared; the comparison module is used for performing line data comparison on the two text files based on a preset algorithm to obtain a line data comparison result; the comparison module is further used for performing character comparison on the differing line data based on the preset algorithm to obtain a character comparison result when the line data comparison result includes line data that differ between the two text files; the acquisition module is further configured to obtain a comparison result of the two text files according to the line data comparison result and the character comparison result.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the text comparison method according to any one of the embodiments of the first aspect.
In a fourth aspect, an embodiment of the present invention provides an electronic device, which includes a memory and a processor that are connected to each other, where the memory stores a computer program, and when the computer program is executed by the processor, the electronic device is caused to execute the text comparison method according to any one of the embodiments of the first aspect.
Compared with the prior art, the text comparison method, the text comparison device, the electronic equipment and the computer-readable storage medium provided by the embodiments of the invention, when comparing two text files, first perform line data comparison on the two text files based on a preset algorithm to obtain a line data comparison result; then, when the line data comparison result includes line data that differ between the two text files, character comparison is performed on the differing line data based on the preset algorithm to obtain a character comparison result; finally, the comparison result of the two text files is obtained from the line data comparison result and the character comparison result. This solves the problem that the traditional scheme cannot compare two text files. In addition, the scheme can be ported to any operating system and therefore offers high compatibility.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
Fig. 2 is a flowchart of a text comparison method according to a first embodiment of the present invention;
Fig. 3 is a block diagram of a text comparison apparatus according to a second embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
First, the terms to which the present invention relates will be briefly described:
Needleman/Wunsch: a text comparison algorithm that detects the length of the longest common substring of two character strings.
Same row: a row for which the comparison result is two identical character strings.
Different rows: the comparison result is that the two rows are inconsistent. The definition is restricted to the case where both rows involved in the comparison are data rows, not empty rows.
Adding rows: of the two rows involved in the comparison, one is an empty string and the other is a non-empty string; such data is represented as an added row.
In practical application scenarios, it is often encountered that two text files need to be compared due to customer requirements, for example, the difference between two configuration text files for the same line needs to be known. This requirement cannot be met by the conventional Needleman/Wunsch algorithm.
In order to solve the above problems, embodiments of the present invention provide a text comparison method and apparatus, and the text comparison technique can be implemented by using corresponding software, hardware, and a combination of software and hardware. The following describes embodiments of the present invention in detail.
First, an electronic device 100 for implementing the text comparison method and apparatus according to the embodiment of the present invention is described with reference to fig. 1. In the figure, the electronic device 100 may include a memory 110, a processor 120, and a text comparison apparatus.
The components of memory 110, processor 120, and text comparison device may be interconnected by a bus system and/or other form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 1 are exemplary only, and not limiting, and the electronic device 100 may have other components and structures as desired. The text comparison device includes at least one software function module which can be stored in the memory 110 in the form of software or firmware (firmware) or is solidified in an Operating System (OS) of the electronic device 100. The processor 120 is configured to execute an executable module stored in the memory 110, such as a software function module or a computer program included in the text comparison apparatus.
The memory 110 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 120 to implement the functions desired in the embodiments of the invention described below. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The processor 120 may be an integrated circuit chip having signal processing capabilities. The processor 120 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The processor 120 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention.
The text comparison method for comparing two texts is described below.
Referring to Fig. 2, Fig. 2 is a flowchart of a text comparison method according to a first embodiment of the invention, and the method is applied to the electronic device 100. The method comprises the following steps:
Step S110: obtain two text files to be compared.
Each text file comprises at least one line of character strings.
Step S120: perform line data comparison on the two text files based on a preset algorithm to obtain a line data comparison result.
Wherein the preset algorithm may be a Needleman/Wunsch algorithm.
Since the Needleman/Wunsch algorithm can only recognize character strings to obtain a comparison result, before performing line data comparison on the two text files, each text file needs to be read and analyzed, and the text file needs to be converted into a format capable of being recognized by the Needleman/Wunsch algorithm.
As an alternative implementation, each text file may be converted line by line into a character string array, where the number of elements in the character string array equals the number of lines in the corresponding text file.
For example, text file A includes four lines of data:
qwert
asdf
zxcvb
GGATCGA
Text file B includes three lines of data:
asdf
zxcvb
GAATTCAGTTA
After conversion, the character string array corresponding to text file A is A: ["qwert", "asdf", "zxcvb", "GGATCGA"], and the character string array corresponding to text file B is B: ["asdf", "zxcvb", "GAATTCAGTTA"].
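The conversion of a text file into a character string array might be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation; the function names `split_lines` and `read_lines` and the UTF-8 default encoding are assumptions that do not appear in the source.

```python
from pathlib import Path


def split_lines(text):
    """Convert text into a character string array, one element per line."""
    # str.splitlines() drops the line terminators, so the number of elements
    # equals the number of lines in the text.
    return text.splitlines()


def read_lines(path, encoding="utf-8"):
    """Read a text file and convert it into a character string array.

    The encoding is an assumption; the source does not specify one.
    """
    return split_lines(Path(path).read_text(encoding=encoding))


# The contents of text files A and B from the example, given inline.
array_a = split_lines("qwert\nasdf\nzxcvb\nGGATCGA\n")
array_b = split_lines("asdf\nzxcvb\nGAATTCAGTTA\n")
print(array_a)
print(array_b)
```

Reading from disk would use `read_lines("a.txt")`, where the file name is likewise hypothetical.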
After the character string array corresponding to each text file is input, the two text files are compared line by line using the Needleman/Wunsch algorithm to obtain the line data comparison result.
The line data comparison result covers two cases: the two rows of data are the same, or the two rows of data are different. Two rows that are the same are called a same row; two rows that are different fall into two sub-cases, different rows and adding rows.
Different rows means that the comparison result of the two rows is inconsistent and neither row is empty. Adding rows means that one of the two rows participating in the comparison is an empty string, that is, an empty row, and the other is a non-empty string, that is, a non-empty row.
Taking text file A and text file B again as the example, with the character string array A: ["qwert", "asdf", "zxcvb", "GGATCGA"] for text file A and the character string array B: ["asdf", "zxcvb", "GAATTCAGTTA"] for text file B, the resulting line data comparison result is shown in Table 1:
TABLE 1
| Text file A | Text file B | Row data comparison result |
| --- | --- | --- |
| qwert | (none) | Adding rows |
| asdf | asdf | Same row |
| zxcvb | zxcvb | Same row |
| GGATCGA | GAATTCAGTTA | Different rows |
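A line-level comparison of this kind might be sketched as below, using a textbook Needleman/Wunsch global alignment in which each array element (a whole line) is treated as one symbol. The scoring values (match +1, mismatch -1, gap -1), the function names `nw_align` and `classify_rows`, and the use of `None` for a missing row are assumptions for illustration; the source does not state the parameters of its preset algorithm.

```python
def nw_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return a list of (x, y) pairs.

    None on one side of a pair marks a gap (no counterpart on that side).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill the dynamic-programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one best alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def classify_rows(array_a, array_b):
    """Label each aligned pair of rows with the terms used in Table 1."""
    result = []
    for row_a, row_b in nw_align(array_a, array_b):
        if row_a is None or row_b is None:
            label = "Adding rows"
        elif row_a == row_b:
            label = "Same row"
        else:
            label = "Different rows"
        result.append((row_a, row_b, label))
    return result


rows = classify_rows(["qwert", "asdf", "zxcvb", "GGATCGA"],
                     ["asdf", "zxcvb", "GAATTCAGTTA"])
for row in rows:
    print(row)
```

With these assumed scores the example has a single best alignment, so the labels come out in the same order as the rows of Table 1.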
As another embodiment, each element in the character string array corresponding to each text file may be converted into a corresponding preset numerical value according to a first preset rule to form a first numerical array. After the first numerical arrays are formed, they are compared according to the preset algorithm to obtain the line data comparison result.
The conversion of a character string array into a first numerical array is introduced below.
First, each element included in the two character string arrays to be compared is identified, and each element is converted into a single matching preset numerical value according to the first preset rule. The first preset rule may be as follows: repeated character strings in the two character string arrays are filtered out to obtain a new character string array, and each distinct character string in the new array is then identified by a distinct single numerical value, all such values being of the same numerical type. In the embodiment of the present invention, to reduce memory consumption, the character strings are identified by integer values of the short type.
With the character string array A: ["qwert", "asdf", "zxcvb", "GGATCGA"] for text file A and the character string array B: ["asdf", "zxcvb", "GAATTCAGTTA"] for text file B, the new character string array obtained after filtering out repeated strings is: ["qwert", "asdf", "zxcvb", "GGATCGA", "GAATTCAGTTA"]. Then, when short values starting from 1 are used for identification, the identification result is: 1-"qwert", 2-"asdf", 3-"zxcvb", 4-"GGATCGA", 5-"GAATTCAGTTA".
Based on the identification result, the mapping relationship between character strings and numerical identifiers, that is, the first preset rule, can be established as shown in Table 2. The mapping must be bidirectional: the identifier can be obtained from the character string, and the character string can be obtained from the identifier.
TABLE 2
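| Character string | Numerical identifier |
| --- | --- |
| qwert | 1 |
| asdf | 2 |
| zxcvb | 3 |
| GGATCGA | 4 |
| GAATTCAGTTA | 5 |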
After the mapping relationship between the character string and the identifier is obtained, the character string array a corresponding to the text file a and the character string array B corresponding to the text file B may be converted into a first numerical array a and a first numerical array B, and then the first numerical array a and the first numerical array B are input and compared by using a Needleman/Wunsch algorithm to obtain the result shown in table 3. Wherein, the first numerical array a and the first numerical array B are respectively:
First numerical array A: [1, 2, 3, 4]
First numerical array B: [2, 3, 5]
TABLE 3
| First numerical array A | First numerical array B | Result |
| --- | --- | --- |
| First number: 1 | (none) | Adding rows |
| First number: 2 | First number: 2 | Same row |
| First number: 3 | First number: 3 | Same row |
| First number: 4 | First number: 5 | Different rows |
After the row data comparison result shown in table 3 is obtained, the numeric identifier is inversely converted into a character string based on the mapping relationship between the character string and the numeric identifier, and the result shown in table 1 is finally obtained.
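The bidirectional mapping and the conversion in both directions might look like the following sketch. The helper names `build_mapping`, `encode` and `decode` are hypothetical, and using `array("h", ...)` (a signed 16-bit type code) to mirror the "short" identifiers mentioned in the text is an assumption; it also caps the number of distinct strings at 32767.

```python
from array import array


def build_mapping(*string_arrays):
    """Assign identifiers 1, 2, 3, ... to distinct strings in first-seen order.

    Returns two dictionaries so the mapping can be used in both directions.
    """
    to_id = {}
    for strings in string_arrays:
        for s in strings:
            if s not in to_id:  # repeated strings are filtered out here
                to_id[s] = len(to_id) + 1
    to_string = {identifier: s for s, identifier in to_id.items()}
    return to_id, to_string


def encode(strings, to_id):
    """Convert a character string array into a numerical array of shorts."""
    return array("h", (to_id[s] for s in strings))


def decode(ids, to_string):
    """Convert numerical identifiers back into the original strings."""
    return [to_string[i] for i in ids]


array_a = ["qwert", "asdf", "zxcvb", "GGATCGA"]
array_b = ["asdf", "zxcvb", "GAATTCAGTTA"]
to_id, to_string = build_mapping(array_a, array_b)
first_a = encode(array_a, to_id)
first_b = encode(array_b, to_id)
print(list(first_a), list(first_b))
print(decode(first_b, to_string))
```

Comparing small integers instead of whole lines keeps each equality test inside the alignment cheap, which is one plausible reading of the memory and cost motivation given in the text.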
Step S130: when the line data comparison result includes line data that differ between the two text files, perform character comparison on the differing line data based on the preset algorithm to obtain a character comparison result.
Corresponding to the line data comparison result, the character comparison result may cover two cases: the two characters are the same, or the two characters are different. Two characters that are the same are called a same character; two characters that are different fall into two sub-cases, different characters and added characters.
Line data that differ between two rows cover two cases, adding rows and different rows. For an adding row, one of the rows participating in the comparison is empty, so every character in the other row is an added character relative to the empty row. Therefore, optionally, in step S130, character comparison may be performed only on the two rows corresponding to different rows in the line data comparison result to obtain the character comparison result.
Accordingly, when character comparison is performed on the differing line data based on the preset algorithm, the corresponding pairs of differing rows (different rows) can be obtained in sequence from the two text files according to the line data comparison result. Each of the two rows is converted character by character into a character array, where the number of elements in the character array equals the number of characters in the row, and the character arrays corresponding to the two rows are then compared according to the preset algorithm.
Taking text file A and text file B above again as the example, the different rows are character string A from text file A, GGATCGA, and character string B from text file B, GAATTCAGTTA. After conversion, the two character strings form character array A: ['G', 'G', 'A', 'T', 'C', 'G', 'A'] and character array B: ['G', 'A', 'A', 'T', 'T', 'C', 'A', 'G', 'T', 'T', 'A'], respectively.
The character arrays corresponding to each row are input and compared using the Needleman/Wunsch algorithm, so that character comparison is performed on the differing line data in the line data comparison result and the character comparison result is obtained.
For example, with character array A: ['G', 'G', 'A', 'T', 'C', 'G', 'A'] and character array B: ['G', 'A', 'A', 'T', 'T', 'C', 'A', 'G', 'T', 'T', 'A'], the character comparison result obtained is shown in Table 4:
TABLE 4
| Character array A | Character array B | Character comparison result |
| --- | --- | --- |
| G | G | Same |
| G | A | Different |
| A | A | Same |
| (none) | T | Added |
| T | T | Same |
| C | C | Same |
| (none) | A | Added |
| G | G | Same |
| (none) | T | Added |
| (none) | T | Added |
| A | A | Same |
Optionally, when comparing the character arrays corresponding to the two lines of data according to the preset algorithm, each element in the character arrays corresponding to the two lines of data may be converted into a corresponding preset numerical value according to a second preset rule to form a second numerical value array, and then the second numerical value array is compared according to the preset algorithm to obtain a character comparison result.
The step of converting each element in the character array corresponding to the two lines of data into a corresponding preset numerical value according to a second preset rule to form a second numerical value array is similar to the step of converting each element in the character string array corresponding to each text file into a corresponding preset numerical value according to a first preset rule to form a first numerical value array.
With character array A: ['G', 'G', 'A', 'T', 'C', 'G', 'A'] and character array B: ['G', 'A', 'A', 'T', 'T', 'C', 'A', 'G', 'T', 'T', 'A'], the new character array obtained after filtering out repeated characters is: ['G', 'A', 'T', 'C']. Then, when short values starting from 1 are used for identification, the identification result is: 1-"G", 2-"A", 3-"T", 4-"C".
Based on the identification result, a mapping relationship between the characters and the identification as shown in table 5 can be established.
TABLE 5
| Character | Identifier |
| --- | --- |
| G | 1 |
| A | 2 |
| T | 3 |
| C | 4 |
After the mapping relationship between the characters and the identifiers is obtained, the character array a and the character array B may be converted into a second numerical array a and a second numerical array B, and then the second numerical array a and the second numerical array B are input and compared by using a Needleman/Wunsch algorithm to obtain the results shown in table 6. Wherein, the second numerical array a and the second numerical array B are respectively:
Second numerical array A: [1, 1, 2, 3, 4, 1, 2]
Second numerical array B: [1, 2, 2, 3, 3, 4, 2, 1, 3, 3, 2]
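The character-level encoding can be illustrated with a short sketch. The function name `encode_characters` is hypothetical, and deduplicating with `dict.fromkeys` (which preserves first-seen order) is one possible way to realize the filtering step, not necessarily the one the source uses.

```python
def encode_characters(row_a, row_b):
    """Map each distinct character of two rows to an identifier starting at 1.

    Returns the two numerical arrays and the reverse (identifier -> character)
    mapping needed to convert a comparison result back into characters.
    """
    # dict.fromkeys keeps the first occurrence of each character, in order.
    unique = list(dict.fromkeys(row_a + row_b))
    char_to_id = {ch: i for i, ch in enumerate(unique, start=1)}
    id_to_char = {i: ch for ch, i in char_to_id.items()}
    second_a = [char_to_id[ch] for ch in row_a]
    second_b = [char_to_id[ch] for ch in row_b]
    return second_a, second_b, id_to_char


second_a, second_b, id_to_char = encode_characters("GGATCGA", "GAATTCAGTTA")
print(second_a)
print(second_b)
print("".join(id_to_char[i] for i in second_b))
```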
TABLE 6
| Second numerical array A | Second numerical array B | Result |
| --- | --- | --- |
| First number: 1 | First number: 1 | Same |
| Second number: 1 | Second number: 2 | Different |
| Third number: 2 | Third number: 2 | Same |
| (none) | Fourth number: 3 | Added |
| Fourth number: 3 | Fifth number: 3 | Same |
| Fifth number: 4 | Sixth number: 4 | Same |
| (none) | Seventh number: 2 | Added |
| Sixth number: 1 | Eighth number: 1 | Same |
| (none) | Ninth number: 3 | Added |
| (none) | Tenth number: 3 | Added |
| Seventh number: 2 | Eleventh number: 2 | Same |
After the character comparison result shown in table 6 is obtained, the identifier is inversely converted into a character based on the mapping relationship between the character and the identifier, and the result shown in table 4 is finally obtained.
Step S140: obtain a comparison result of the two text files according to the line data comparison result and the character comparison result.
Of course, the result of the comparison of the two text files may also be output.
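Steps S110 to S140 taken together might be sketched as a two-level comparison: align the lines first, then align the characters of every pair labelled as different rows. The names `nw_align` and `compare_texts`, the dictionary layout of the result, and the scoring values (match +1, mismatch -1, gap -1) are all assumptions for illustration. When several alignments tie for the best score, the sketch returns one of them, which need not match Table 4 row for row.

```python
def nw_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman/Wunsch global alignment; None marks a gap in a pair."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j - 1] + sub,
                          s[i - 1][j] + gap, s[i][j - 1] + gap)
    pairs, i, j = [], n, m
    while i > 0 or j > 0:  # trace back one optimal path
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    return pairs[::-1]


def compare_texts(text_a, text_b):
    """Return one entry per aligned row; differing rows carry a char alignment."""
    result = []
    for row_a, row_b in nw_align(text_a.splitlines(), text_b.splitlines()):
        entry = {"a": row_a, "b": row_b}
        if row_a is None or row_b is None:
            entry["kind"] = "Adding rows"
        elif row_a == row_b:
            entry["kind"] = "Same row"
        else:
            entry["kind"] = "Different rows"
            # Second pass: character comparison only for different rows.
            entry["chars"] = nw_align(list(row_a), list(row_b))
        result.append(entry)
    return result


result = compare_texts("qwert\nasdf\nzxcvb\nGGATCGA\n",
                       "asdf\nzxcvb\nGAATTCAGTTA\n")
for entry in result:
    print(entry["kind"], entry["a"], entry["b"])
```

Running the character pass only on different rows follows the optional restriction described for step S130; adding rows need no second pass because every character of the non-empty row is an added character.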
Optionally, the character comparison result may be presented to the user in HTML, or in any other form capable of outputting the character comparison result, such as a picture on a web page or a Word document.
When the character comparison result is output: for two rows that are a same row, the line numbers and data are output; for two rows that are an adding row, the line numbers and data are output, with the data of the empty row replaced by blanks; for two rows that are different rows, the pre-stored character string comparison template is called to output the two character strings, and the differing characters are marked. Optionally, the differing characters may be marked in a distinct color, such as red.
The following takes the TABLE element in HTML as an example to describe outputting the comparison result shown in Table 1:
For the first three rows, the TABLE for text file A is:
<tr><td>1</td><td style="color:red;">qwert</td></tr>
<tr><td>2</td><td>asdf</td></tr>
<tr><td>3</td><td>zxcvb</td></tr>
For the first three rows, the TABLE for text file B is:
<tr><td>&nbsp;</td><td>&nbsp;</td></tr>
<tr><td>1</td><td>asdf</td></tr>
<tr><td>2</td><td>zxcvb</td></tr>
Here, <tr> denotes a row, <td> denotes a column, <td style="color:red;"> denotes red, and &nbsp; denotes a space.
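Generating such table rows from aligned row pairs might be sketched as follows. The function name `render_side` and the pair representation (a tuple with `None` for a missing row) are assumptions; the sketch covers only same rows and adding rows, since different rows go through the character string comparison template.

```python
from html import escape

BLANK_ROW = "<tr><td>&nbsp;</td><td>&nbsp;</td></tr>"


def render_side(pairs, side):
    """Render the <tr> rows for one file; side 0 is file A, side 1 is file B."""
    rows = []
    line_no = 0
    for pair in pairs:
        text, other = pair[side], pair[1 - side]
        if text is None:
            # This file has no row here: output blanks, keep numbering intact.
            rows.append(BLANK_ROW)
            continue
        line_no += 1
        # A row with no counterpart in the other file is an added row: red.
        style = ' style="color:red;"' if other is None else ""
        rows.append(
            f"<tr><td>{line_no}</td><td{style}>{escape(text)}</td></tr>"
        )
    return rows


pairs = [("qwert", None), ("asdf", "asdf"), ("zxcvb", "zxcvb")]
print("\n".join(render_side(pairs, 0)))
print("\n".join(render_side(pairs, 1)))
```

`html.escape` is applied so that configuration lines containing `<`, `>` or `&` do not break the generated markup, a detail the source does not discuss.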
In the pre-stored character string comparison template, if the two characters are the same, the two characters are output respectively; if a character is added, the existing character is output and the nonexistent character is output as a null character; if the two characters are different, the two characters are output respectively and marked in red (other colors are of course also possible).
Characters that need to be marked in red are output using <span style="color:red;"></span>; for example, G is output as <span style="color:red;">G</span>.
Here, _ denotes a null character, which is output as &nbsp; in HTML.
For example, for the last row of Table 1, text file A is output as GGA_TC_G__A, where the second character G is red; text file B is output as GAATTCAGTTA, where the second character A, the fourth character T, the seventh character A, the ninth character T and the tenth character T are all red.
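The character string comparison template might be approximated by the sketch below, which takes an already aligned list of character pairs and emits the marked-up string for one side. The function name `mark_line` and the pair representation are assumptions; the alignment itself is entered by hand from the character comparison result for GGATCGA and GAATTCAGTTA.

```python
from html import escape

RED = '<span style="color:red;">{}</span>'


def mark_line(pairs, side):
    """Render one side of an aligned pair of rows (0 = file A, 1 = file B)."""
    out = []
    for pair in pairs:
        ch, other = pair[side], pair[1 - side]
        if ch is None:
            out.append("&nbsp;")               # nonexistent character
        elif ch == other:
            out.append(escape(ch))             # same character, unmarked
        else:
            out.append(RED.format(escape(ch)))  # different or added character
    return "".join(out)


# Aligned characters of GGATCGA and GAATTCAGTTA (None marks a gap).
aligned = [("G", "G"), ("G", "A"), ("A", "A"), (None, "T"), ("T", "T"),
           ("C", "C"), (None, "A"), ("G", "G"), (None, "T"), (None, "T"),
           ("A", "A")]
html_a = mark_line(aligned, 0)
html_b = mark_line(aligned, 1)
print(html_a)
print(html_b)
```

For file A only the second G is wrapped in a red span and the four gaps become `&nbsp;`; for file B the five characters at positions 2, 4, 7, 9 and 10 are wrapped.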
In the text comparison method provided by the first embodiment of the present invention, when two text files are compared, line data comparison is first performed on the two text files based on a preset algorithm to obtain a line data comparison result; then, when the line data comparison result includes line data that differ between the two text files, character comparison is performed on the differing line data based on the preset algorithm to obtain a character comparison result; finally, the comparison result of the two text files is obtained from the line data comparison result and the character comparison result. This solves the problem that the traditional scheme cannot compare two text files. In addition, the scheme can be ported to any operating system and therefore offers high compatibility.
In addition, referring to fig. 3, a text comparison apparatus 400 is provided according to a second embodiment of the present invention. As will be explained below with respect to the block diagram shown in fig. 3, the apparatus may include an obtaining module 410 and a comparing module 420.
An obtaining module 410, configured to obtain two text files to be compared;
the comparison module 420 is configured to perform line data comparison on the two text files based on a preset algorithm to obtain a line data comparison result;
the comparison module 420 is further configured to, when the line data comparison result includes that the two text files have different line data, perform character comparison on the different line data based on the preset algorithm to obtain a character comparison result;
the obtaining module 410 is further configured to obtain a comparison result of the two text files according to the line data comparison result and the character comparison result.
Optionally, the comparing module 420 is configured to convert each text file into a character string array in a unit of a line, where the number of elements included in the character string array corresponds to the number of lines included in the text file corresponding to the character string array; converting each element in the character string array corresponding to each text file into a corresponding preset numerical value according to a first preset rule to form a first numerical value array; and comparing the first numerical value array according to the preset algorithm.
Optionally, the comparing module 420 is configured to obtain two corresponding lines of different data from the two text files in sequence according to the line data comparison result, and convert each line of the two lines of data into a character array with a character as a unit, where the number of elements included in the character array corresponds to the number of characters included in each line; and comparing the character arrays corresponding to the two lines of data according to the preset algorithm.
Optionally, the comparing module 420 is configured to convert each element in the character array corresponding to the two lines of data into a corresponding preset numerical value according to a second preset rule, so as to form a second numerical value array; and comparing the second numerical value array according to the preset algorithm.
In this embodiment, for the process by which each functional module of the text comparison apparatus 400 implements its functions, refer to the embodiments shown in fig. 1 to fig. 2; the details are not repeated here.
In addition, corresponding to the text comparison method in the first embodiment, an embodiment of the present application further provides an electronic device that includes a memory and a processor connected to each other, where the memory stores a computer program, and when the computer program is executed by the processor, the electronic device is caused to perform the method described in any implementation of the first embodiment.
In addition, corresponding to the text comparison method in the first embodiment, an embodiment of the present application further provides a computer-readable storage medium in which a computer program is stored, and when the computer program is read and executed by a processor, it causes the processor to execute the method described in any implementation of the first embodiment.
In summary, in the text comparison method, the text comparison device, the electronic device, and the computer-readable storage medium according to the embodiments of the present invention, when two text files are compared, line data comparison is first performed on the two text files based on a preset algorithm to obtain a line data comparison result. When the line data comparison result includes line data that differs between the two text files, character comparison is then performed on the differing line data based on the preset algorithm to obtain a character comparison result. Finally, the comparison result of the two text files is obtained from the line data comparison result and the character comparison result, which solves the problem that the two text files cannot be compared in the traditional scheme. In addition, the scheme can be ported to any operating system and is highly compatible.
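To illustrate how the two results might be combined into one comparison result, here is a compact sketch under stated assumptions. It uses `difflib.SequenceMatcher` as a stand-in for the unnamed preset algorithm, and it skips the numerical conversion steps because `SequenceMatcher` accepts any hashable elements directly. The report layout and the names `diff_ops` and `compare_texts` are hypothetical, and pairing differing lines in order with `zip` is an assumption about what "corresponding lines" means.

```python
from difflib import SequenceMatcher


def diff_ops(seq_a, seq_b):
    """Non-equal opcodes between two sequences (assumed preset algorithm)."""
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


def compare_texts(text_a, text_b):
    """Stage 1 compares lines; stage 2 compares characters of changed lines."""
    lines_a, lines_b = text_a.splitlines(), text_b.splitlines()
    report = []
    for tag, a0, a1, b0, b1 in diff_ops(lines_a, lines_b):
        entry = {
            "tag": tag,
            "a_lines": lines_a[a0:a1],
            "b_lines": lines_b[b0:b1],
            "chars": [],  # one list of character differences per line pair
        }
        if tag == "replace":
            # Pair the differing lines in order and compare them by character.
            for la, lb in zip(lines_a[a0:a1], lines_b[b0:b1]):
                entry["chars"].append(
                    [(t, la[i0:i1], lb[j0:j1]) for t, i0, i1, j0, j1 in diff_ops(la, lb)]
                )
        report.append(entry)
    return report


for entry in compare_texts("alpha\nbeta\ngamma\n", "alpha\nbetta\ngamma\ndelta\n"):
    print(entry)
```

Only blocks tagged `replace` get a character comparison in this sketch; pure insertions and deletions have no counterpart line to compare against.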
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of comparing text, the method comprising:
acquiring two text files to be compared;
performing line data comparison on the two text files based on a preset algorithm to obtain a line data comparison result;
when the line data comparison result includes line data that differs between the two text files, performing character comparison on the differing line data based on the preset algorithm to obtain a character comparison result;
and obtaining a comparison result of the two text files according to the line data comparison result and the character comparison result.
2. The method of claim 1, wherein the performing line data comparison on the two text files based on a preset algorithm comprises:
converting each text file into a character string array line by line, wherein the number of elements included in the character string array corresponds to the number of lines included in the text file corresponding to the character string array;
converting each element in the character string array corresponding to each text file into a corresponding preset numerical value according to a first preset rule to form a first numerical value array;
and comparing the first numerical value array according to the preset algorithm.
3. The method according to claim 1 or 2, wherein the performing character comparison on the differing line data in the line data comparison result based on the preset algorithm comprises:
according to the line data comparison result, acquiring, in sequence, two corresponding lines of data that differ between the two text files, and converting each of the two lines into a character array character by character, wherein the number of elements included in the character array corresponds to the number of characters included in that line;
and comparing the character arrays corresponding to the two lines of data according to the preset algorithm.
4. The method according to claim 3, wherein comparing the character arrays corresponding to the two lines of data according to the preset algorithm comprises:
converting each element in the character array corresponding to the two lines of data into a corresponding preset numerical value according to a second preset rule to form a second numerical value array;
and comparing the second numerical value array according to the preset algorithm.
5. A text comparison apparatus, the apparatus comprising:
the acquisition module is used for acquiring two text files to be compared;
the comparison module is used for comparing the two text files according to the line data based on a preset algorithm to obtain a line data comparison result;
the comparison module is further used for comparing characters of the different line data based on the preset algorithm to obtain a character comparison result when the line data comparison result comprises the different line data of the two text files;
the acquisition module is further configured to acquire a comparison result of the two text files according to the line data comparison result and the character comparison result.
6. The apparatus according to claim 5, wherein the comparing module is configured to convert each text file into a string array in units of lines, and the number of elements included in the string array corresponds to the number of lines included in the text file corresponding to the string array; converting each element in the character string array corresponding to each text file into a corresponding preset numerical value according to a first preset rule to form a first numerical value array; and comparing the first numerical value array according to the preset algorithm.
7. The apparatus according to claim 5 or 6, wherein the comparing module is configured to obtain two corresponding lines of data that are different from each other from the two text files in sequence according to the line data comparison result, and convert each line of the two lines of data into a character array in a character unit, where the number of elements included in the character array corresponds to the number of characters included in each line; and comparing the character arrays corresponding to the two lines of data according to the preset algorithm.
8. The apparatus according to claim 7, wherein the comparing module is configured to convert each element in the character array corresponding to the two lines of data into a corresponding preset numerical value according to a second preset rule, so as to form a second numerical value array; and comparing the second numerical value array according to the preset algorithm.
9. An electronic device, comprising a memory and a processor connected to each other, the memory storing a computer program which, when executed by the processor, causes the electronic device to perform the method of any of claims 1-4.
10. A computer-readable storage medium, in which a computer program is stored which, when run on a computer, causes the computer to carry out a text comparison method according to any one of claims 1 to 4.
CN201811248252.0A 2018-10-24 2018-10-24 Text comparison method and device, electronic equipment and computer readable storage medium Pending CN111090982A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811248252.0A CN111090982A (en) 2018-10-24 2018-10-24 Text comparison method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111090982A (en) 2020-05-01

Family

ID=70392201

Country Status (1)

Country Link
CN (1) CN111090982A (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1392497A (en) * 2002-07-24 2003-01-22 彭泉 Matching method for large character string
CN101807207A (en) * 2010-03-22 2010-08-18 北京大用科技有限责任公司 Method for sharing document based on content difference comparison
CN102063510A (en) * 2011-01-17 2011-05-18 珠海全志科技有限公司 Method for searching matched character string
CN103309893A (en) * 2012-03-15 2013-09-18 阿里巴巴集团控股有限公司 Character string comparing method and device
CN103440231A (en) * 2013-09-02 2013-12-11 北京网秦天下科技有限公司 Equipment and method for comparing texts
CN103605694A (en) * 2013-11-04 2014-02-26 北京奇虎科技有限公司 Device and method for detecting similar texts
CN105956064A (en) * 2016-04-28 2016-09-21 焦点科技股份有限公司 Sequence optimization method of custom element based on LCS (Longest Common Subsequence)
CN106033543A (en) * 2015-03-11 2016-10-19 株式会社理光 Document modification detecting method, original document manuscript providing device, duplicated document detecting device, and document modification detection system
CN106202007A (en) * 2016-06-28 2016-12-07 电子科技大学 A kind of appraisal procedure of MATLAB program file similarity
CN106326197A (en) * 2016-08-23 2017-01-11 达而观信息科技(上海)有限公司 Method for fast detecting repeated copying texts
CN106372040A (en) * 2016-08-24 2017-02-01 长园深瑞继保自动化有限公司 Difference comparison system of intelligent substation configuration file
CN106469219A (en) * 2016-09-09 2017-03-01 武汉长光科技有限公司 A kind of method that embedded device configuration file synchronously compares
CN106469186A (en) * 2016-08-29 2017-03-01 北京像素软件科技股份有限公司 A kind of method and device of character string comparison
CN106484771A (en) * 2016-09-09 2017-03-08 腾讯科技(深圳)有限公司 Different information file generated and application process, device
CN106484730A (en) * 2015-08-31 2017-03-08 北京国双科技有限公司 Character string matching method and device
CN108170805A (en) * 2017-12-28 2018-06-15 福建中金在线信息科技有限公司 A kind of tables of data comparative approach, device, electronic equipment and readable storage medium storing program for executing
CN108256587A (en) * 2018-02-05 2018-07-06 武汉斗鱼网络科技有限公司 Determining method, apparatus, computer and the storage medium of a kind of similarity of character string

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149402A (en) * 2020-09-23 2020-12-29 创新奇智(青岛)科技有限公司 Document comparison method and device, electronic equipment and computer-readable storage medium
CN112149402B (en) * 2020-09-23 2023-05-23 创新奇智(青岛)科技有限公司 Document matching method, device, electronic equipment and computer readable storage medium
CN112632952A (en) * 2020-12-08 2021-04-09 中国建设银行股份有限公司 Method and device for comparing files
CN113407665A (en) * 2021-05-25 2021-09-17 北京有竹居网络技术有限公司 Text comparison method, device, medium and electronic equipment
CN114492369A (en) * 2022-01-26 2022-05-13 奇安信科技集团股份有限公司 Text comparison method and device, electronic equipment and computer readable storage medium

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication. Application publication date: 20200501