CN113836092B - File comparison method, device, equipment and storage medium based on RPA and AI - Google Patents

Info

Publication number
CN113836092B
Authority
CN
China
Prior art keywords
file
difference
text
comparison
position information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111138084.1A
Other languages
Chinese (zh)
Other versions
CN113836092A (en)
Inventor
赵鹏
汪冠春
胡一川
褚瑞
李玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Laiye Network Technology Co Ltd
Laiye Technology Beijing Co Ltd
Original Assignee
Beijing Laiye Network Technology Co Ltd
Laiye Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Laiye Network Technology Co Ltd and Laiye Technology Beijing Co Ltd
Priority to CN202111138084.1A (patent CN113836092B)
Priority to PCT/CN2021/131627 (patent WO2023045053A1)
Publication of CN113836092A
Application granted
Publication of CN113836092B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/168 Details of user interfaces specifically adapted to file systems, e.g. browsing and visualisation, 2d or 3d GUIs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The embodiments of the invention disclose a file comparison method, device, equipment and storage medium based on RPA and AI. The method comprises the following steps: S1, receiving a reference file and a comparison file uploaded by a robotic process automation (RPA) robot; S2, sending the reference file and the comparison file to a server; S3, receiving, from the server, a difference comparison result of the comparison file relative to the reference file; S4, highlighting, according to the difference comparison result, a difference text in the comparison file and/or the reference file, wherein the difference text highlighted in the comparison file is text in the comparison file that differs from the reference file, and the difference text highlighted in the reference file is text in the reference file that differs from the comparison file. With this scheme, file comparison can be automated and the differences between the two files can be highlighted.

Description

File comparison method, device, equipment and storage medium based on RPA and AI
Technical Field
The embodiment of the invention relates to the technical field of process automation, in particular to a file comparison method, device, equipment and storage medium based on RPA and AI.
Background
RPA (Robotic Process Automation) uses dedicated "robot software" to simulate human operation of a computer and to execute process tasks automatically according to rules.
AI (Artificial Intelligence) is a technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence.
RPA has distinctive advantages: it is low-code and non-intrusive. Low-code means that RPA can be operated without a high level of IT skill, so business personnel without programming experience can develop processes; non-intrusive means that RPA can simulate human operation without requiring the software system to expose an interface. However, conventional RPA has limitations: it can only act on fixed rules, and its application scenarios are limited. With the continuous development of AI technology, the deep fusion of RPA and AI overcomes the limitations of conventional RPA: RPA + AI = hand work + head work, which is greatly changing the value of labor.
In daily work, it is often necessary to compare two versions of a file, such as a contract or a legal document, to determine what has changed in the new file relative to the original. At present, however, the two files to be compared must be obtained manually, and the differences must then be found and marked by hand. When many files need to be compared, or the files have many pages, staff must perform repetitive, low-value comparison work, which takes up a large amount of working time and results in low efficiency.
Disclosure of Invention
The embodiment of the invention provides a file comparison method, device, equipment and storage medium based on RPA and AI, which not only can realize automation of file comparison, but also can highlight the difference between two files, thereby improving the efficiency of searching file difference by a user.
In a first aspect, the present invention provides a file comparison method based on RPA and AI, where the method is applied to a client, and the method includes:
S1, receiving a reference file and a comparison file uploaded by a robotic process automation (RPA) robot;
S2, sending the reference file and the comparison file to a server;
S3, receiving, from the server, a difference comparison result of the comparison file relative to the reference file;
S4, according to the difference comparison result, highlighting a difference text in the comparison file and/or the reference file, wherein the difference text highlighted in the comparison file is a text with a difference between the comparison file and the reference file, and the difference text highlighted in the reference file is a text with a difference between the reference file and the comparison file.
Optionally, the difference comparison result includes at least one piece of difference information, each piece of difference information includes a difference type, a difference text in the reference file, a difference text in the comparison file, difference position information of the difference text in the reference file, and difference position information of the difference text in the comparison file, where the difference position information includes a page identifier of the page to which the difference text belongs and coordinate information of the difference text within that page.
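As an illustration only, the difference comparison result described above could be modeled with the following Python data structures. All class and field names are hypothetical, and the four-number layout of the coordinate information is an assumption; the text specifies only which items each piece of difference information contains.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class DifferencePosition:
    """Difference position information for one file."""
    page_id: int  # page identifier of the page holding the difference text
    # Assumed layout (x, y, width, height); the source does not fix a format.
    coordinates: Tuple[float, float, float, float]


@dataclass(frozen=True)
class DifferenceInfo:
    """One piece of difference information."""
    difference_type: str
    reference_text: str    # difference text in the reference file
    comparison_text: str   # difference text in the comparison file
    reference_position: DifferencePosition
    comparison_position: DifferencePosition


@dataclass
class DifferenceComparisonResult:
    """Result returned by the server: a list of difference information."""
    differences: List[DifferenceInfo]

    def pages_with_differences(self) -> Tuple[Set[int], Set[int]]:
        """Return (reference pages, comparison pages) that contain differences."""
        ref_pages = {d.reference_position.page_id for d in self.differences}
        cmp_pages = {d.comparison_position.page_id for d in self.differences}
        return ref_pages, cmp_pages
```

Keeping the two positions side by side in one record lets a client look up the matching location in the other file directly from a single piece of difference information.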
Optionally, the S4 includes:
S41, converting the coordinate information into division (DIV) element position information;
S42, when the DIV element position information enters the display area of the file to which it belongs, highlighting the difference text at the DIV element position information, in the page indicated by the page identifier, according to the difference type and the page identifier corresponding to the DIV element position information.
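Steps S41 and S42 above can be sketched roughly as follows. The conversion rule (a uniform scale factor from page coordinates to pixels), the `DivPosition` type and the way the display area is described are all assumptions made for illustration; the text does not specify how the conversion is performed.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DivPosition:
    """Hypothetical DIV element position, in pixels relative to its page."""
    page_id: int
    left: float
    top: float
    width: float
    height: float


def to_div_position(page_id: int,
                    coords: Tuple[float, float, float, float],
                    scale: float) -> DivPosition:
    """S41 (sketch): scale assumed (x, y, width, height) page coordinates to pixels."""
    x, y, w, h = coords
    return DivPosition(page_id, x * scale, y * scale, w * scale, h * scale)


def is_in_display_area(div: DivPosition, page_top: float,
                       viewport_top: float, viewport_height: float) -> bool:
    """S42 (sketch): True when the DIV overlaps the visible part of the file.

    page_top is the pixel offset of the DIV's page from the top of the file.
    """
    top = page_top + div.top
    bottom = top + div.height
    return bottom > viewport_top and top < viewport_top + viewport_height
```

A client following this sketch would apply the highlight style for the difference type only to the DIV positions for which `is_in_display_area` returns true.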
Optionally, the S4 further includes:
S43, for the same piece of difference information, generating an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type, and binding the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file, respectively;
S44, when a first synchronous positioning instruction triggered from the reference file or the comparison file is received, synchronously highlighting the difference text at all DIV element position information bound to the ID corresponding to the first synchronous positioning instruction.
Optionally, after the step S3, the method further includes:
S5, displaying a difference detail in a preset display area according to the difference comparison result, wherein the preset display area is an area except a reference file display area and a comparison file display area, and the difference detail comprises a difference type in each piece of difference information, a difference text in the reference file and a difference text in the comparison file.
Optionally, after the step S43, the method further includes:
S45, binding the ID to the corresponding difference information in the difference detail;
S46, when a second synchronous positioning instruction triggered from the difference detail is received, acquiring the ID bound to the difference information in the difference detail that corresponds to the second synchronous positioning instruction;
S47, synchronously highlighting the difference text at all DIV element position information bound to the acquired ID.
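The ID generation and binding in steps S43 to S47 above might look like the following sketch. Hashing the inputs with SHA-256 is an assumption (the text says only that the ID is generated according to the two DIV element positions and the difference type), and `make_difference_id` and `BindingTable` are hypothetical names.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Hashable, List


def make_difference_id(reference_div: Hashable, comparison_div: Hashable,
                       difference_type: str) -> str:
    """Derive a stable ID from both DIV positions and the difference type."""
    key = "|".join([repr(reference_div), repr(comparison_div), difference_type])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


class BindingTable:
    """Two-way mapping between an ID and everything bound to it."""

    def __init__(self) -> None:
        self._targets_by_id: Dict[str, List[Hashable]] = defaultdict(list)
        self._id_by_target: Dict[Hashable, str] = {}

    def bind(self, diff_id: str, target: Hashable) -> None:
        """Bind a DIV position or a difference-detail entry to the ID."""
        self._targets_by_id[diff_id].append(target)
        self._id_by_target[target] = diff_id

    def targets_for(self, trigger: Hashable) -> List[Hashable]:
        """Return every target sharing the trigger's ID, to be highlighted together."""
        return list(self._targets_by_id[self._id_by_target[trigger]])
```

Because the lookup goes through the shared ID, the same call serves an instruction triggered from the reference file, from the comparison file, or from the difference detail.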
Optionally, before the step S4, the method further includes:
S6, receiving a scroll instruction for a first scroll bar, wherein the first scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area;
S7, determining, according to the scroll instruction, the ratio of the length currently scrolled by the first scroll bar to the total length of its scroll area;
S8, scrolling a second scroll bar according to the ratio so that the first scroll bar and the second scroll bar scroll synchronously, wherein the second scroll bar is whichever of the scroll bar of the reference file display area and the scroll bar of the comparison file display area is not the first scroll bar.
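The proportional scrolling in steps S6 to S8 above reduces to a small calculation, sketched here. The function names and the clamping of the ratio to the range 0 to 1 are assumptions added for robustness.

```python
def scroll_ratio(scrolled: float, total_length: float) -> float:
    """S7 (sketch): ratio of the scrolled length to the total scroll-area length."""
    if total_length <= 0:
        return 0.0  # nothing to scroll
    return min(max(scrolled / total_length, 0.0), 1.0)


def synced_offset(first_scrolled: float, first_total: float,
                  second_total: float) -> float:
    """S8 (sketch): offset for the second scroll bar that matches the first bar's ratio."""
    return scroll_ratio(first_scrolled, first_total) * second_total
```

For example, if the first scroll bar has scrolled 250 units of a 1000-unit area, a second area 400 units long is scrolled to offset 100, so both display areas show the same relative position even when the two files differ in length.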
Optionally, the S4 includes:
according to the difference comparison result, highlighting the difference text currently scrolled into the display area of the comparison file and/or the reference file.
Optionally, the S2 includes:
S21, recognizing the reference file and the comparison file by optical character recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file;
S22, when a target file contains multiple pages of text, splicing the pages of text of the target file into a single page of contextually continuous text to obtain a target text, and when the target file contains a single page of text, taking that single page of text as the target text, wherein the target text is the reference text when the target file is the reference file, and the comparison text when the target file is the comparison file;
S23, sending the reference text and the comparison text to the server.
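The splicing in step S22 above can be sketched as follows, taking the per-page OCR output as a list of strings. Recording the start offset of each page is an assumption added so that a position in the spliced text can be traced back to a page; the text does not describe how that mapping is kept. `splice_pages` and `page_of` are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Tuple


def splice_pages(pages: List[str]) -> Tuple[str, List[int]]:
    """Join page texts in their original order without altering their content.

    Returns the spliced text and the start offset of each page within it.
    A single-page file simply yields that page's text.
    """
    if not pages:
        raise ValueError("a file must contain at least one page of text")
    offsets: List[int] = []
    cursor = 0
    for page in pages:
        offsets.append(cursor)
        cursor += len(page)
    return "".join(pages), offsets


def page_of(offset: int, page_offsets: List[int]) -> int:
    """Return the 1-based page number containing a character offset."""
    return bisect_right(page_offsets, offset)
```

With the pages joined into one continuous text, a sentence that runs across a page break is compared as a whole rather than as two unrelated fragments.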
In a second aspect, an embodiment of the present invention provides a file comparison apparatus based on RPA and AI, where the apparatus is applied to a client, and the apparatus includes:
a receiving unit, configured to receive a reference file and a comparison file uploaded by a robotic process automation (RPA) robot;
a sending unit, configured to send the reference file and the comparison file to a server;
the receiving unit being further configured to receive, from the server, a difference comparison result of the comparison file relative to the reference file;
and a display unit, configured to highlight, according to the difference comparison result, a difference text in the comparison file and/or the reference file, wherein the difference text highlighted in the comparison file is text in the comparison file that differs from the reference file, and the difference text highlighted in the reference file is text in the reference file that differs from the comparison file.
Optionally, the difference comparison result includes at least one piece of difference information, each piece of difference information includes a difference type, a difference text in the reference file, a difference text in the comparison file, difference position information of the difference text in the reference file, and difference position information of the difference text in the comparison file, where the difference position information includes a page identifier of the page to which the difference text belongs and coordinate information of the difference text within that page.
Optionally, the display unit includes:
a conversion module, configured to convert the coordinate information into division (DIV) element position information;
and a display module, configured to, when the DIV element position information enters the display area of the file to which it belongs, highlight the difference text at the DIV element position information, in the page indicated by the page identifier, according to the difference type and the page identifier corresponding to the DIV element position information.
Optionally, the display unit further includes:
a generation module, configured to generate, for the same piece of difference information, an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type;
a binding module, configured to bind the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file, respectively;
and a first synchronization module, configured to, when a first synchronous positioning instruction triggered from the reference file or the comparison file is received, synchronously highlight the difference text at all DIV element position information bound to the ID corresponding to the first synchronous positioning instruction.
Optionally, the display unit is further configured to display, after receiving a difference comparison result of the comparison file sent by the server with respect to the reference file, a difference detail in a preset display area according to the difference comparison result, where the preset display area is an area other than the reference file display area and the comparison file display area, and the difference detail includes a difference type in each piece of difference information, a difference text in the reference file, and a difference text in the comparison file.
Optionally, the binding module is further configured to bind the ID with corresponding difference information in the difference detail after binding the ID with the DIV element position information of the difference text in the reference file and the DIV element position information of the difference text in the comparison file respectively;
the display unit further includes:
an acquisition module, configured to, when a second synchronous positioning instruction triggered from the difference detail is received, acquire the ID bound to the difference information in the difference detail that corresponds to the second synchronous positioning instruction;
and a second synchronization module, configured to synchronously highlight the difference text at all DIV element position information bound to the acquired ID.
Optionally, the receiving unit is further configured to receive, before the difference text is highlighted in the comparison file and/or the reference file according to the difference comparison result, a scroll instruction for a first scroll bar, where the first scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area;
a determining unit, configured to determine, according to the scroll instruction, the ratio of the length currently scrolled by the first scroll bar to the total length of its scroll area;
and a synchronous scrolling unit, configured to scroll a second scroll bar according to the ratio so that the first scroll bar and the second scroll bar scroll synchronously, where the second scroll bar is whichever of the scroll bar of the reference file display area and the scroll bar of the comparison file display area is not the first scroll bar.
Optionally, the display unit is configured to highlight, according to the difference comparison result, a difference text currently scrolled to a display area in the comparison file and/or the reference file.
Optionally, the sending unit includes:
a recognition module, configured to recognize the reference file and the comparison file by optical character recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file;
a splicing module, configured to, when a target file contains multiple pages of text, splice the pages of text of the target file into a single page of contextually continuous text to obtain a target text, and, when the target file contains a single page of text, take that single page of text as the target text, where the target text is the reference text when the target file is the reference file, and the comparison text when the target file is the comparison file;
and a sending module, configured to send the reference text and the comparison text to the server.
In a third aspect, embodiments of the present invention provide a computing device, the computing device comprising:
one or more processors;
a storage device for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of the first aspect.
In a fourth aspect, embodiments of the present invention also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in the first aspect.
With the file comparison method, device, equipment and storage medium based on RPA and AI provided by the embodiments of the invention, the RPA robot can automatically upload the reference file and the comparison file to the client, the client sends the two files to the server for difference comparison, and the difference text can finally be highlighted in the comparison file and/or the reference file according to the difference comparison result returned by the server. Compared with the prior art, which requires files to be compared manually, the embodiments of the invention use the RPA robot to trigger the client automatically to send the two files to the server for automatic comparison. This saves manpower, frees the staff who previously compared files for more valuable work, and improves the efficiency of file comparison. Compared with the prior art, which requires differences to be marked manually, the embodiments of the invention highlight the difference text directly in the reference file and/or the comparison file, which improves the readability of the difference text and the efficiency with which a user finds the differences between the two files.
Before the client sends the reference file and the comparison file to the server, it may recognize both files by OCR (Optical Character Recognition), splice the text of any file that contains multiple pages into a single continuous page to obtain the reference text and the comparison text, and then send the reference text and the comparison text to the server for difference comparison. The server can then compare the two texts directly in their full context without further processing, which improves both the efficiency and the accuracy of the comparison.
In addition, the technical effects that can be realized by the embodiment of the invention include:
1. The user can trigger the synchronous positioning instruction through the reference file display area, the comparison file display area or the difference detail display area, so that the client side can synchronously highlight the same piece of difference information, and the efficiency of viewing the difference text by the user is improved.
2. The user can synchronously scroll the two display areas by dragging the scroll bar of the reference file display area or the comparison file display area, so that the efficiency of the user for viewing the text is improved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a file comparison method based on RPA and AI provided by an embodiment of the invention;
FIG. 2 is an exemplary diagram showing a differential comparison result provided by an embodiment of the present invention;
FIG. 3 is another exemplary diagram showing a differential comparison result provided by an embodiment of the present invention;
FIG. 4 is a block diagram of an RPA and AI-based file comparison apparatus according to an embodiment of the present invention;
FIG. 5 is an architecture diagram of a file comparison system based on RPA and AI according to an embodiment of the present invention;
FIG. 6 is an architecture diagram of another file comparison system based on RPA and AI according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "comprising" and "having" and any variations thereof in the embodiments of the present invention and the accompanying drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
In daily work, different versions of files often have to be compared manually. The work is highly repetitive, low in difficulty and time-consuming, so companies urgently need automatic file comparison. RPA (Robotic Process Automation) technology can intelligently understand the existing applications of an electronic device through the user interface and automate repetitive, rule-based, high-volume routine operations, such as automatically reading mail, reading Office components, and operating databases, web pages and client software; it can collect data, perform complicated calculations, and generate the required files and reports in batches. RPA technology can therefore greatly reduce labor costs and effectively improve office efficiency. AI (Artificial Intelligence) technology can break through fixed rules and simulate human thinking and awareness to handle more complex application scenarios automatically. On this basis, the embodiments of the invention provide an automatic file comparison method that combines RPA technology and AI technology, which not only saves manpower and improves the efficiency of file comparison, but also highlights the differences between the two files and improves the efficiency with which a user finds them.
The following describes embodiments of the present invention in detail.
In the description of the embodiments of the present invention, the term "reference file" refers to the file used as the reference during difference comparison, and the term "comparison file" refers to the file, of the two files being compared, that is not used as the reference. In practical applications, the version of the reference file is usually earlier than that of the comparison file, and the reference file and the comparison file may be files in any field, such as contract files, financial files or program files.
In the description of the embodiments of the present invention, the term "multi-page file" refers to a file of text content of greater than or equal to two pages, and the term "multi-page text" refers to text of greater than or equal to two pages.
In the description of the embodiments of the present invention, the term "OCR" refers to optical character recognition (Optical Character Recognition), a process in which an electronic device examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer characters by a character recognition method. In other words, the characters in a paper document are optically converted into a black-and-white dot-matrix image file, and recognition software converts the characters in the image into text for further editing and processing by word processing software. In the embodiments of the invention, the RPA robot may use OCR technology to convert the characters in a paper document into a black-and-white dot-matrix image file, after which the client uses OCR technology to recognize the text content contained in the image file; alternatively, the RPA robot may use OCR technology to obtain the text content from the paper document and generate a text file (that is, an editable file) containing it, after which the client extracts the text content directly from the text file.
In the description of the embodiments of the present invention, the term "client" refers to the front end of a service system that has a file comparison requirement, and the term "server" refers to the back end of that service system. The client may be application software corresponding to the service system, or a browser through which the RPA robot accesses the website of the service system. The "RPA robot" may be integrated into the client, embedded in the client as a plug-in, or independent of the client, as long as it can automatically access the client; the embodiments of the invention do not limit its specific form.
In the description of the embodiments of the present invention, the term "NLP" refers to natural language processing (Natural Language Processing), a discipline that uses computer technology to analyze, understand and process natural language; that is, the computer serves as a powerful tool for language research, language information is studied quantitatively with the computer's support, and language descriptions usable by both humans and computers are provided.
In the description of embodiments of the present invention, the term "stitching" refers to connecting together the content to be stitched without changing the original content. By splicing the multi-page text, the multi-page text content can be seamlessly connected on the basis of keeping the original text content arrangement sequence.
In the description of the embodiments of the present invention, the term "preset comparison algorithm" refers to the specific comparison method used to determine the differences between the comparison text and the reference text. The reference text and the comparison text may be compared in batches, one preset comparison unit at a time, until the comparison is complete; the specific comparison process is described in detail in S120. The term "preset comparison unit" refers to the amount of text compared each time, which may be a phrase, a sentence, a paragraph or the like, depending on the actual situation.
In the description of the embodiments of the present invention, the term "difference comparison" refers to determining which differences exist between the reference text and the comparison text. The term "difference comparison result" refers to the result obtained after difference comparison of the reference text and the comparison text. The result comprises at least one piece of difference information, and each piece of difference information comprises a difference type, a difference text in the reference file, a difference text in the comparison file, difference position information of the difference text in the reference file, and difference position information of the difference text in the comparison file, where the difference position information comprises a page identifier of the page to which the difference text belongs and coordinate information of the difference text within that page. The term "difference type" characterizes the category of a difference, mainly content deletion, content addition and content modification. The term "page identifier" indicates which page of the whole file the current page is. For the term "coordinate information", a coordinate system may be established for each page, with the position of the first character of the page as the origin and a horizontal axis and a vertical axis extending from it, so that corresponding coordinates can be generated for each character in the page. The term "difference text" refers to text content in the current file that differs from the other file.
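To make the three difference types above concrete, the following sketch classifies differences between two short texts using `difflib.SequenceMatcher` from the Python standard library. This is a stand-in chosen for illustration, not the preset comparison algorithm of the invention, and the mapping from `difflib` opcodes to the difference types is an assumption.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Assumed mapping from difflib opcodes to the difference types named in the text.
TYPE_BY_OPCODE = {
    "delete": "content deletion",
    "insert": "content addition",
    "replace": "content modification",
}


def classify_differences(reference_text: str,
                         comparison_text: str) -> List[Tuple[str, str, str]]:
    """Return (difference type, reference difference text, comparison difference text)."""
    matcher = SequenceMatcher(None, reference_text, comparison_text, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical spans carry no difference information
        differences.append(
            (TYPE_BY_OPCODE[tag], reference_text[i1:i2], comparison_text[j1:j2])
        )
    return differences
```

Text present only in the reference text is reported as a deletion, text present only in the comparison text as an addition, and a span that changed between the two as a modification.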
In the description of the embodiments of the present invention, the term "highlighting" refers to a manner of display that clearly distinguishes the difference text from other text, including, without limitation, one or more of the following in combination: bolding the font, changing the font color, adding a font background color, highlighting the font, enlarging the font, changing to italics, adding underlining, adding strikethrough, and the like.
In the description of the implementation of the present invention, the term "authentication" refers to verifying whether the client that transmits the reference file and the comparison file has permission to perform file comparison; specifically, authentication may be implemented by verifying whether the user information of the client satisfies the permission requirements.
In the description of the implementation of the present invention, the term "DIV element position information" refers to the position information of a DIV (division) element on a web interface, and the term "DIV element" refers to an element that provides structure and context for block-level content within an HTML (HyperText Markup Language) document.
In the description of the implementation of the present invention, the term "binding" refers to establishing a mapping relationship of at least two parameters to be bound, such that one parameter can be found by another parameter.
In the description of the implementation of the present invention, the term "difference detail" is specific description information for each difference, and the difference detail includes a difference type in each piece of difference information, a difference text in a reference file, and a difference text in an alignment file.
In the description of the implementation of the present invention, the term "synchronous positioning instruction" is an instruction for indicating that the difference text related to the same piece of difference information is displayed synchronously.
In the description of the implementation of the present invention, the term "synchronous scrolling" refers to a scrolling manner in which the progress of scrolling of at least two scroll bars is kept consistent.
Fig. 1 shows a file comparison method based on RPA and AI provided by an embodiment of the present invention. The method is mainly applied to a client and specifically includes:
S100, receiving the reference file and the comparison file uploaded by the RPA robot.
Specifically, in the embodiment of the invention, an RPA program (which may be integrated with or embedded in the client, or may be independent of the client) can be configured in an electronic device capable of logging in to the client. Following the rules set in the RPA program, the electronic device simulates a user's mouse and keyboard operations to log in to the client automatically, triggers the client to generate a file comparison request containing the reference file and the comparison file, and has the client send the request to the server, so that the server compares the differences between the reference file and the comparison file. During login, the client may pop up a login interface containing a verification code image; in this case, the RPA robot can perform OCR (optical character recognition) on the verification code image to obtain the verification code content and enter it into the corresponding edit box, thereby logging in to the client successfully.
The reference file and the comparison file can be stored in the client, can be stored in other storage spaces of the electronic equipment, and can be paper files. When the file is stored in other storage spaces of the electronic device, the RPA robot can search the reference file and the comparison file from the other storage spaces and upload the reference file and the comparison file to the client, for example, the two files are uploaded to the client by clicking an upload button, or the two files are dragged to a designated area by a dragging mode to realize file uploading, or other uploading modes are adopted. When the reference file and/or the comparison file is a paper file, the RPA robot may first convert the paper file into an image file or into a text file (i.e., an editable file composed of text contents in the paper file) by using the OCR technology, and then upload the image file or the text file to the client by using the above method.
When the client receives the reference file and the comparison file uploaded by the RPA robot, it can render them so as to show the uploaded files to the user. Specifically, when the reference file and/or the comparison file is a Word file, the Word file can be converted into a PDF file and then rendered with the client's rendering library; when the PDF file has multiple pages, multi-page rendering is performed. When the reference file and/or the comparison file is a picture file in a format other than TIFF, it can be rendered with the rendering library built into the client; when it is a picture file in TIFF format, a rendering library dedicated to the TIFF format can be used. To convert a Word file into a PDF file, the client can transmit the Word file to the server, the server performs the conversion, and the resulting PDF file is fed back to the client for rendering.
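The choice of rendering path described above can be sketched as a simple dispatch on file type. This is a minimal illustration only: the function name `choose_render_steps`, the step labels, and the list of recognized extensions are assumptions made for the example and are not named in the embodiment.

```python
from pathlib import Path

# Hypothetical step labels; the embodiment does not name concrete libraries.
CONVERT_TO_PDF = "convert Word file to PDF (may be done on the server)"
RENDER_PDF = "render PDF pages with the client's rendering library"
RENDER_IMAGE = "render with the client's built-in rendering library"
RENDER_TIFF = "render with a TIFF-specific rendering library"


def choose_render_steps(filename: str) -> list[str]:
    """Return the ordered rendering steps for an uploaded file."""
    suffix = Path(filename).suffix.lower()
    if suffix in {".doc", ".docx"}:
        return [CONVERT_TO_PDF, RENDER_PDF]
    if suffix == ".pdf":
        return [RENDER_PDF]
    if suffix in {".tif", ".tiff"}:
        return [RENDER_TIFF]
    if suffix in {".png", ".jpg", ".jpeg", ".bmp", ".gif"}:
        return [RENDER_IMAGE]
    raise ValueError(f"unsupported file type: {filename!r}")
```

For instance, `choose_render_steps("contract.docx")` yields the conversion step followed by the PDF rendering step.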
S110, the reference file and the comparison file are sent to a server.
After receiving the reference file and the comparison file uploaded by the RPA robot, the client can receive a file comparison instruction triggered by the RPA robot, directly generate a file comparison request containing the reference file and the comparison file according to that instruction, and send the request to the server so that the server can perform difference comparison on the two files. However, after receiving the reference file and the comparison file, the server must first recognize the text in both files before it can perform the difference comparison, and if many clients send file comparison requests to the server, the efficiency of file comparison on the server drops. In order to reduce the burden on the server and improve file comparison efficiency, in the embodiment of the invention the client can first recognize the reference file and the comparison file by OCR to obtain at least one page of text of the reference file and at least one page of text of the comparison file, and then send the recognized text to the server for difference comparison.
In practical applications, if the page texts of the reference file are compared with the page texts of the comparison file page by page, that is, the nth page of the reference file is compared only with the nth page of the comparison file and the relationship between pages is ignored, the comparison result is likely to be inaccurate. For example, suppose the reference file contains two pages of text and the comparison file contains three. With page-by-page comparison, the second page of the reference file is found to differ from the second page of the comparison file, and because the reference file has no third page, the third page of the comparison file is treated as absent from the reference file. The overall result is therefore that the two files differ everywhere except on the first page.
In order to avoid the problem of inaccurate comparison results, the embodiment of the invention identifies the reference file and the comparison file by utilizing OCR at the client, and after at least one page text of the reference file and at least one page text of the comparison file are obtained, firstly performs text splicing, and then sends the spliced text to the server. Specifically, when a target file is a file containing multiple pages of texts, splicing the multiple pages of texts of the target file into a continuous context page text to obtain a target text, and when the target file is a file containing a single page of text, acquiring the single page of text from the target file as the target text, wherein the target file comprises a reference file or a comparison file, when the target file is the reference file, the target text is the reference text, and when the target file is the comparison file, the target text is the comparison text; and sending the reference text and the comparison text to the server.
Wherein, the context continuous means that the sequence of the original characters is kept. The specific method for splicing the multi-page text of the reference file or the comparison file into the context continuous one-page text can be to splice the multi-page text in sequence according to the paging sequence of the reference file or the comparison file, so as to obtain the context continuous one-page text.
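The splicing step can be sketched as follows. The sketch concatenates the per-page texts in page order; recording the start offset of each page, so that a character position in the spliced text can later be mapped back to a page, is an assumption added for illustration (the names `splice_pages` and `page_of` are hypothetical).

```python
from bisect import bisect_right


def splice_pages(pages: list[str]) -> tuple[str, list[int]]:
    """Join page texts in page order; also return each page's start offset."""
    if not pages:
        raise ValueError("a file must contain at least one page of text")
    offsets = []
    position = 0
    for page_text in pages:
        offsets.append(position)
        position += len(page_text)
    return "".join(pages), offsets


def page_of(char_offset: int, offsets: list[int]) -> int:
    """Return the 1-based page number that contains the given character offset."""
    return bisect_right(offsets, char_offset)
```

A single-page file passes through unchanged, which matches the case where the single page of text is taken directly as the target text.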
It should be added that, in order to improve the communication security between the client and the server, the server may authenticate the user information of the client to verify whether the user has the file comparison authority. Specifically, when the client sends the reference file and the comparison file to the server, the client can also carry the user information of the client, so that the server authenticates the client according to the user information, and when the authentication is determined to pass, the reference file and the comparison file are subjected to differential comparison. The user information can be a client account number, a mobile phone number bound with the client account number, a user grade or other information, and the specific content of the user information is not limited and can be determined according to specific conditions. There are a variety of methods for authenticating user information, including but not limited to the following: (1) Matching the user information with a user list with authority, if the matching is successful, determining that the user corresponding to the user information has authority, namely authentication is passed, and if the matching is failed, determining that the user corresponding to the user information has no authority, namely authentication is failed; (2) Judging whether the user grade in the user information exceeds a preset grade, if so, passing authentication, and if not, failing authentication.
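The two authentication methods listed above can be sketched as follows. The `UserInfo` fields and function names are hypothetical; "exceeds a preset grade" is read here as strictly greater than the preset level, which is an interpretation rather than something the embodiment spells out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserInfo:
    account: str  # e.g. a client account number
    level: int    # user grade


def authenticate_by_list(user: UserInfo, authorized_accounts: set[str]) -> bool:
    """Method (1): pass only if the user appears in the list of authorized users."""
    return user.account in authorized_accounts


def authenticate_by_level(user: UserInfo, preset_level: int) -> bool:
    """Method (2): pass only if the user grade exceeds the preset grade."""
    return user.level > preset_level
```

The server would run one of these checks before starting the difference comparison and reject the request when the check fails.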
S120, receiving a difference comparison result of the comparison file sent by the server relative to the reference file.
After receiving the reference file and the comparison file, the server can perform difference comparison on them according to a preset comparison algorithm. Specifically, the reference text and the comparison text may be compared in preset comparison units, yielding a comparison sub-result for each preset comparison unit. During this comparison, if the reference sub-text being compared (one preset comparison unit of the reference text) has the same content as the comparison sub-text (one preset comparison unit of the comparison text), the corresponding comparison sub-result is determined to be content identical; if the reference sub-text being compared does not exist in the comparison text, the corresponding comparison sub-result is determined to be content deletion; if the comparison sub-text being compared does not exist in the reference text, the corresponding comparison sub-result is determined to be content addition. In practical applications, the differences between two texts should include content modification in addition to content identity, content deletion and content addition. Therefore, so that the user can see the differences of the comparison text relative to the reference text more intuitively, the following merging can be applied: if a first comparison sub-result and a second comparison sub-result are both content identical, and the comparison sub-results between them include both content deletion and content addition but no content identical result, those intermediate comparison sub-results are merged into a single comparison sub-result whose type is content modification.
The size of the preset comparison unit can be determined according to practical situations, and can be a phrase, a sentence, a paragraph and the like.
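The merging of deletions and additions into a content modification can be sketched as follows. The sketch assumes the comparison sub-results have already been produced as `(kind, reference_sub_text, comparison_sub_text)` tuples; that representation and the name `merge_modifications` are assumptions for illustration. A run of non-identical sub-results is merged only when it lies between two content-identical sub-results and contains both a deletion and an addition.

```python
from itertools import groupby

SAME, DELETE, ADD, MODIFY = "same", "delete", "add", "modify"


def merge_modifications(results: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Merge delete+add runs bounded by identical results into one modification."""
    # groupby yields alternating runs of identical and non-identical sub-results.
    runs = [(is_same, list(group))
            for is_same, group in groupby(results, key=lambda r: r[0] == SAME)]
    merged = []
    for index, (is_same, run) in enumerate(runs):
        kinds = {kind for kind, _, _ in run}
        # Bounded on both sides by an identical run (runs alternate by construction).
        bounded = 0 < index < len(runs) - 1
        if not is_same and bounded and {DELETE, ADD} <= kinds:
            reference_text = "".join(ref for _, ref, _ in run)
            comparison_text = "".join(cmp_ for _, _, cmp_ in run)
            merged.append((MODIFY, reference_text, comparison_text))
        else:
            merged.extend(run)
    return merged
```

A run that contains only deletions or only additions, or that is not enclosed by identical sub-results on both sides, is left unchanged.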
When comparing the two texts, the server may simply judge whether the characters or words used in the text content are the same, or may combine NLP technology to perform semantic analysis on the reference sub-text and the comparison sub-text; when the two sub-texts have the same meaning but use different characters or words, the corresponding comparison sub-result can be determined to be content identical. In addition, the embodiment of the invention can support user-defined filtering rules that ignore meaningless differences: when some of the differences between the reference sub-text and the comparison sub-text satisfy a preset filtering rule, those differences are ignored. For example, it may be set that the presence or absence of a sentence-ending period does not affect the comparison result. When the server sends the difference comparison result to the client, it can also send the ignored differences, so that the client can display them to the user.
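A user-defined filtering rule can be sketched as follows, assuming, purely for illustration, a rule that ignores a trailing sentence-ending period. The function reports both whether the sub-texts count as identical and whether a difference was ignored, so that ignored differences can still be sent to the client. The names and the rule itself are hypothetical.

```python
# Hypothetical filter rule: trailing full stops (Chinese or Western) are ignored.
IGNORED_TRAILING_CHARS = "。."


def compare_with_filter(reference_sub_text: str, comparison_sub_text: str) -> tuple[bool, bool]:
    """Return (content_identical, difference_was_ignored)."""
    if reference_sub_text == comparison_sub_text:
        return True, False
    stripped_reference = reference_sub_text.rstrip(IGNORED_TRAILING_CHARS)
    stripped_comparison = comparison_sub_text.rstrip(IGNORED_TRAILING_CHARS)
    if stripped_reference == stripped_comparison:
        # The only difference matches the filter rule, so it is ignored but reported.
        return True, True
    return False, False
```

Stripping only trailing characters keeps the rule from hiding meaningful differences inside a sentence, such as a decimal point in a number.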
When a comparison sub-result is content addition, content deletion or content modification, one piece of difference information may be generated for that comparison sub-result, so that once all the difference information has been obtained it is fed back to the client together. Each piece of difference information comprises a difference type, a difference text in the reference file, a difference text in the comparison file, difference position information of the difference text in the reference file, and difference position information of the difference text in the comparison file, where the difference position information comprises a page identification of the page to which the difference text belongs and coordinate information of the difference text within that page. One comparison sub-result corresponds to one piece of difference information, and the difference types comprise content addition, content deletion and content modification. The page identification indicates which page of the entire file the page is. For the coordinate information, a coordinate system with a horizontal axis and a vertical axis may be established for each page, with the position of the first character of the page as the origin, so that corresponding coordinates can be generated for each character in the page.
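One possible in-memory shape for a piece of difference information is sketched below. The class and field names are assumptions chosen to mirror the fields listed above; the embodiment does not prescribe a concrete data format.

```python
from dataclasses import dataclass
from enum import Enum


class DifferenceType(Enum):
    CONTENT_ADDITION = "content addition"
    CONTENT_DELETION = "content deletion"
    CONTENT_MODIFICATION = "content modification"


@dataclass(frozen=True)
class DifferencePosition:
    page_id: int  # which page of the whole file the text is on
    x: float      # coordinates within that page, relative to the page's origin
    y: float


@dataclass(frozen=True)
class DifferenceInfo:
    difference_type: DifferenceType
    reference_text: str                     # difference text in the reference file
    comparison_text: str                    # difference text in the comparison file
    reference_position: DifferencePosition  # where it sits in the reference file
    comparison_position: DifferencePosition # where it sits in the comparison file
```

A difference comparison result is then simply a list of `DifferenceInfo` objects, one per comparison sub-result that is not content identical.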
In one embodiment, after the client sends the reference file and the comparison file to the server, the server may add the comparison task for the reference file and the comparison file to a task queue, store the comparison task and its task state in a task database, and update the task state in the task database promptly whenever the task state of the comparison task changes. The client can receive a comparison task state query instruction triggered by the RPA robot and send it to the server, so that the server queries the task database for the task state of the corresponding comparison task and feeds the queried task state back to the client. When the comparison task has not yet been executed, the task state may be "unprocessed"; when the comparison task is being executed, the task state may be "in process"; and when execution of the comparison task has finished, the task state may be "completed".
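The task queue and task state handling can be sketched as follows. An in-memory dictionary stands in for the task database, and the class `ComparisonTaskStore` and its methods are hypothetical names; a real server would persist the states and run tasks in the background.

```python
from collections import deque
from enum import Enum
from typing import Callable


class TaskState(Enum):
    UNPROCESSED = "unprocessed"
    IN_PROCESS = "in process"
    COMPLETED = "completed"


class ComparisonTaskStore:
    """Task queue plus an in-memory stand-in for the task database."""

    def __init__(self) -> None:
        self._queue: deque[tuple[int, str, str]] = deque()
        self._states: dict[int, TaskState] = {}
        self._results: dict[int, list] = {}
        self._next_id = 1

    def submit(self, reference_text: str, comparison_text: str) -> int:
        """Enqueue a comparison task and record it as unprocessed."""
        task_id = self._next_id
        self._next_id += 1
        self._queue.append((task_id, reference_text, comparison_text))
        self._states[task_id] = TaskState.UNPROCESSED
        return task_id

    def run_next(self, compare: Callable[[str, str], list]) -> None:
        """Execute the oldest queued task, updating its state as it progresses."""
        task_id, reference_text, comparison_text = self._queue.popleft()
        self._states[task_id] = TaskState.IN_PROCESS
        self._results[task_id] = compare(reference_text, comparison_text)
        self._states[task_id] = TaskState.COMPLETED

    def query_state(self, task_id: int) -> TaskState:
        return self._states[task_id]

    def query_result(self, task_id: int):
        """Return the stored comparison result, or None if not yet available."""
        return self._results.get(task_id)
```

The `query_state` and `query_result` methods correspond to the state query and result query instructions that the RPA robot can trigger from the client.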
In addition, the server can feed back the difference comparison result to the client either actively or passively. Passive feedback may be implemented as follows: the client receives a comparison result query instruction triggered by the RPA robot and sends it to the server, and the server sends the corresponding difference comparison result to the client according to the comparison result query instruction.
The specific implementation mode that the RPA robot triggers the client to send the comparison result query instruction or the comparison task state query instruction includes, but is not limited to, that the RPA robot triggers the client to generate and send a corresponding instruction by clicking a comparison result query button or a comparison task state query button on the client.
S130, according to the difference comparison result, highlighting a difference text in the comparison file and/or the reference file.
The difference text highlighted in the comparison file is text of the comparison file that differs from the reference file, and the difference text highlighted in the reference file is text of the reference file that differs from the comparison file. The manner of highlighting includes, but is not limited to, one or a combination of the following: bold font, changed font color, added background color, highlighted font, enlarged font, italics, added underline, added strikethrough, and the like. When the difference comparison result includes multiple difference types, the different difference types may be highlighted in the same manner or in different manners.
This step may be implemented as follows: converting the coordinate information into DIV element position information; and, when the DIV element position information enters the display area of the file to which it belongs, highlighting the difference text at the DIV element position information in the page indicated by the page identification, according to the difference type and the page identification corresponding to the DIV element position information. Here, DIV refers to the DIV-based positioning technique used with cascading style sheets (CSS); a DIV element is an element that provides structure and context for block-level content within an HTML document. A display area is encapsulated for each of the reference file and the comparison file, and each display area is a component. For example, two display areas arranged side by side from left to right can be encapsulated on the interface to display the reference file and the comparison file respectively, and when the reference file and/or the comparison file contains too much text to be displayed in full at once, a scroll bar can be added to provide scrolling display.
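The embodiment does not specify how page coordinates are converted into DIV element position information. The sketch below shows one plausible conversion under stated assumptions: pages are rendered one below another, each page has a known rendered height in pixels, and page coordinates are multiplied by a rendering scale. All names (`DivPosition`, `to_div_position`, `in_display_area`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DivPosition:
    page_id: int
    left: float  # pixels from the left edge of the page container
    top: float   # pixels from the top of the whole rendered document


def to_div_position(page_id: int, x: float, y: float,
                    page_heights: list[float], scale: float = 1.0) -> DivPosition:
    """Convert per-page coordinates into an absolute position in the rendered file."""
    if not 1 <= page_id <= len(page_heights):
        raise ValueError(f"page_id {page_id} is out of range")
    # Pages are assumed to be stacked vertically, so earlier pages push this one down.
    top_of_page = sum(page_heights[:page_id - 1])
    return DivPosition(page_id, x * scale, top_of_page + y * scale)


def in_display_area(position: DivPosition, scroll_top: float, viewport_height: float) -> bool:
    """True when the position currently falls inside the visible display area."""
    return scroll_top <= position.top < scroll_top + viewport_height
```

Highlighting would then be applied only to positions for which `in_display_area` returns true, which corresponds to the DIV element position information entering the display area.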
When the difference text is highlighted in the comparison file and/or the reference file according to the difference comparison result, if the difference type in the current difference information is content deletion, either only the deleted text is highlighted in the reference file, or the text before the deletion and the text remaining after the deletion are both highlighted, that is, both the difference text in the reference file and the difference text in the comparison file contained in the difference information are highlighted. If the difference type in the current difference information is content addition, either only the added content is highlighted in the comparison file, or the text before the addition and the text after the addition are both highlighted, that is, both the difference text in the reference file and the difference text in the comparison file contained in the difference information are highlighted. If the difference type in the current difference information is content modification, the text before the modification and the text after the modification can both be highlighted, that is, both the difference text in the reference file and the difference text in the comparison file contained in the difference information are highlighted.
By way of example, FIG. 2 shows part of the text content of a reference file and a comparison file, in which the difference text can be highlighted directly; the user can browse by dragging the scroll bars of the reference file and the comparison file. Bold, underlined text is text that has been modified; italic, enlarged text is text that has been added in the comparison file; and text with a strikethrough is text that has been deleted in the comparison file.
In one embodiment, when the user wants to check a particular difference, the user has to drag the scroll bars of both files to find it, which is cumbersome. In order to improve the efficiency with which users check differences, the embodiment of the invention can, for each piece of difference information, generate an ID (identifier) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type; bind the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file; and, when a first synchronous positioning instruction triggered on the reference file or the comparison file is received, synchronously highlight the difference text at all DIV element position information bound to the ID corresponding to the first synchronous positioning instruction.
Specific implementations of generating an ID include, but are not limited to, splicing the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type in a preset order to obtain a character string. Different difference types may be represented by different characters; for example, "content deletion", "content addition" and "content modification" may be represented by "1", "2" and "3" respectively. The first synchronous positioning instruction is an instruction generated when the user clicks in the display area of the reference file or the comparison file. When the client receives the first synchronous positioning instruction, the corresponding ID is activated. The component corresponding to the reference file and the component corresponding to the comparison file each determine whether the activated ID is the same as an ID they contain; if so, the difference text at the DIV element position information corresponding to that ID is highlighted, and when that DIV element position information is not within the display area, it is scrolled into the display area for display. For example, for one piece of difference information, suppose the difference text "electronic device" is located on page 2 of the reference file and the difference text "computing terminal" is located on page 3 of the comparison file. When the user clicks the text "electronic device" on page 2 of the reference file, the client synchronizes automatically, so that the comparison file scrolls to page 3 and highlights the text "computing terminal".
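ID generation and binding can be sketched as follows. The sketch represents DIV element position information as plain strings and joins the parts with a `|` separator; the separator, the string form of the positions, and the names `make_id` and `SyncLocator` are assumptions for illustration. The type codes "1", "2" and "3" follow the example given above.

```python
TYPE_CODES = {
    "content deletion": "1",
    "content addition": "2",
    "content modification": "3",
}


def make_id(reference_div_position: str, comparison_div_position: str,
            difference_type: str) -> str:
    """Splice the two DIV positions and the type code in a fixed, preset order."""
    return "|".join([reference_div_position, comparison_div_position,
                     TYPE_CODES[difference_type]])


class SyncLocator:
    """Keeps the ID-to-position bindings used for synchronous positioning."""

    def __init__(self) -> None:
        self._targets_by_id: dict[str, list[tuple[str, str]]] = {}
        self._id_by_target: dict[tuple[str, str], str] = {}

    def bind(self, diff_id: str, pane: str, div_position: str) -> None:
        """Bind an ID to a DIV position in one pane ('reference' or 'comparison')."""
        self._targets_by_id.setdefault(diff_id, []).append((pane, div_position))
        self._id_by_target[(pane, div_position)] = diff_id

    def locate(self, pane: str, div_position: str) -> list[tuple[str, str]]:
        """Return every (pane, position) bound to the ID of the clicked position."""
        diff_id = self._id_by_target.get((pane, div_position))
        if diff_id is None:
            return []
        return list(self._targets_by_id[diff_id])
```

On a click, each pane would highlight the returned position that belongs to it and scroll it into view if it is outside the display area.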
In one embodiment, in order to give the user more ways to view differences, the embodiment of the invention can display difference details in a preset display area according to the difference comparison result, where the preset display area is an area other than the reference file display area and the comparison file display area, and the difference details comprise the difference type, the difference text in the reference file and the difference text in the comparison file of each piece of difference information. In addition, the difference comparison result can be summarized, and the summarized result displayed in a display area other than the reference file display area, the comparison file display area and the preset display area. The summarized result comprises the total number of pieces of difference information and the page identifications where the difference information is located. For example, the summarized result may be "After comparison, differences were found on pages 1, 3, 5 and 8 of the reference file, and there are 20 differences between the two files in total."
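Producing the summarized result can be sketched as follows. The sketch assumes each piece of difference information is available as a dictionary carrying the page of the difference text in the reference file under the key `reference_page`; that key, the function name, and the exact wording of the sentence are illustrative assumptions.

```python
def summarize(differences: list[dict]) -> str:
    """Build a one-sentence summary: affected reference pages and total count."""
    if not differences:
        return "After comparison, no differences were found between the two files."
    # Collect distinct page numbers and list them in ascending order.
    pages = sorted({d["reference_page"] for d in differences})
    page_list = ", ".join(str(page) for page in pages)
    return (f"After comparison, differences were found on pages {page_list} "
            f"of the reference file, and there are {len(differences)} "
            f"differences between the two files in total.")
```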
For example, as shown in fig. 3, when the client displays the difference, the client may not only highlight the difference text in the reference file and/or the comparison file, but also display the comparison result on the right side. The upper half of the comparison result is the overall comparison result (namely the summarized result), and the lower half is the detailed comparison result (namely the difference detail). The user can browse the difference details by dragging the scroll bar of the detailed comparison result display area.
Because the preset display area is independent of the reference file display area and the comparison file display area, the contents displayed in the reference file display area and the comparison file display area do not change while the user views the preset display area. In this case, if the user wants to view specific content of the reference file and the comparison file alongside the difference details, the user has to drag the scroll bars of the reference file display area and the comparison file display area separately, which is cumbersome. In order to improve the efficiency with which a user views differences based on the difference details, the embodiment of the invention can bind the ID to the corresponding difference information in the difference details; when a second synchronous positioning instruction triggered on the difference details is received, acquire the ID bound to the difference information in the difference details that corresponds to the second synchronous positioning instruction; and synchronously highlight the difference text at all DIV element position information bound to the acquired ID.
The second synchronous positioning instruction is an instruction generated when the user clicks in the preset display area. When the client receives the second synchronous positioning instruction, the ID bound to the difference information in the difference details that corresponds to the instruction is activated. The component corresponding to the reference file and the component corresponding to the comparison file each determine whether the activated ID is the same as an ID they contain; if so, the difference text at the DIV element position information corresponding to that ID is highlighted, and when that DIV element position information is not within the display area, it is scrolled into the display area for display.
In one embodiment, before or after comparing two files, when a user views the two files, the user needs to drag the scroll bars of the two file display areas respectively to realize synchronous viewing of the two files, which is complex in operation. In order to improve the efficiency of a user to review two files, the embodiment of the invention can receive a scrolling instruction aiming at a first scroll bar; determining the proportion of the current rolled length of the first rolling bar to the total length of the rolling area according to the rolling instruction; and scrolling a second scroll bar according to the proportion, so that the first scroll bar and the second scroll bar synchronously scroll. That is, for the first scroll bar, only the drag of the user is followed, and the synchronous scroll is not performed, and for the second scroll bar, the scroll is performed along with the scroll of the first scroll bar.
The first scroll bar is the scroll bar of either the reference file display area or the comparison file display area, and the second scroll bar is the scroll bar of the other display area. That is, when the first scroll bar is the scroll bar of the reference file display area, the second scroll bar is the scroll bar of the comparison file display area; when the first scroll bar is the scroll bar of the comparison file display area, the second scroll bar is the scroll bar of the reference file display area.
For example, when the user scrolls the scroll bar of the reference file display area, the client calculates in real time the ratio of the currently scrolled length of that scroll bar to the total length of its scroll area (for example, a scrolled length of 2 cm over a total length of 10 cm gives a ratio of 0.2) and scrolls the scroll bar of the comparison file display area to the same ratio (for example, if that scroll bar is currently at 3 cm and the total length of its scroll area is 12 cm, it scrolls to the 2.4 cm position).
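The proportional scrolling calculation can be sketched as follows. The function name and parameters are hypothetical; lengths may be in any unit as long as each scroll bar uses its own unit consistently.

```python
def synced_scroll_position(first_scrolled: float, first_total: float,
                           second_total: float) -> float:
    """Return where the second scroll bar should be so both share the same ratio."""
    if first_total <= 0 or second_total <= 0:
        raise ValueError("scroll area lengths must be positive")
    # Clamp the ratio so over-scroll on the first bar cannot push the second out of range.
    ratio = min(max(first_scrolled / first_total, 0.0), 1.0)
    return ratio * second_total
```

With the figures from the example above, `synced_scroll_position(2, 10, 12)` gives approximately 2.4, regardless of where the second scroll bar was before.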
Before the difference comparison is carried out, if the user triggers synchronous scrolling, the text after synchronous scrolling can be directly displayed. After the difference comparison is performed, if the user triggers synchronous scrolling, the difference text currently scrolled to the display area can be highlighted in the comparison file and/or the reference file according to the difference comparison result, and for other texts currently scrolled to the display area, conventional display is performed without highlighting.
With the file comparison method based on RPA and AI provided by the embodiment of the invention, the RPA robot can automatically upload the reference file and the comparison file to be compared to the client, the client transmits the reference file and the comparison file to the server for difference comparison, and finally the difference text can be highlighted in the comparison file and/or the reference file according to the difference comparison result returned by the server. Compared with the prior art, which requires files to be compared manually, the embodiment of the invention uses the RPA robot to trigger the client automatically to send the two files to the server for automatic comparison; this saves manpower, frees the personnel who would otherwise compare files to do more valuable work, and improves file comparison efficiency. Compared with the prior art, which requires differences to be marked manually, the embodiment of the invention highlights the difference text directly in the reference file and/or the comparison file, which improves the readability of the difference text and thus the efficiency with which a user finds the differences between the two files. When the client sends the reference file and the comparison file to the server, it can first recognize the reference file and the comparison file by OCR (Optical Character Recognition), then splice the text of any file that contains multiple pages so as to obtain a single-page continuous reference text and a single-page continuous comparison text, and finally send the reference text and the comparison text to the server for difference comparison. The server can then compare the two texts directly with their context connected, without any further processing, which improves both the efficiency and the accuracy of file comparison on the server.
Based on the above method embodiment, another embodiment of the present invention further provides a file comparison apparatus based on RPA and AI. The apparatus is applied to a client and, as shown in FIG. 4, includes:
a receiving unit 20, configured to receive a reference file and a comparison file uploaded by a robotic process automation (RPA) robot;
a transmitting unit 22, configured to send the reference file and the comparison file to a server;
the receiving unit 20 being further configured to receive a difference comparison result of the comparison file relative to the reference file, sent by the server; and
a display unit 24, configured to highlight, according to the difference comparison result, a difference text in the comparison file and/or the reference file, where the difference text highlighted in the comparison file is the text in which the comparison file differs from the reference file, and the difference text highlighted in the reference file is the text in which the reference file differs from the comparison file.
Optionally, the difference comparison result includes at least one piece of difference information, each piece of difference information includes a difference type, a difference text in the reference file, a difference text in the comparison file, difference position information of the difference text in the reference file, and difference position information of the difference text in the comparison file, where the difference position information includes a page identifier of a page to which the difference text belongs and coordinate information of the page to which the difference text belongs.
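As a non-limiting illustration, one piece of difference information could be modeled as shown below. This is only a sketch: the class names, the field names, the difference type label "replace", and the (x, y, width, height) form of the coordinate information are assumptions made for the example and are not specified by the embodiments.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DiffPosition:
    """Difference position information (field layout is assumed)."""
    page_id: int                            # page identifier of the page holding the text
    box: Tuple[float, float, float, float]  # assumed (x, y, width, height) on that page


@dataclass(frozen=True)
class DiffInfo:
    """One piece of difference information in the comparison result."""
    diff_type: str                          # assumed labels such as "insert" or "replace"
    reference_text: str                     # difference text in the reference file
    comparison_text: str                    # difference text in the comparison file
    reference_position: DiffPosition
    comparison_position: DiffPosition


# Hypothetical example values, for illustration only.
example = DiffInfo(
    diff_type="replace",
    reference_text="30 days",
    comparison_text="60 days",
    reference_position=DiffPosition(page_id=2, box=(120.0, 340.0, 58.0, 14.0)),
    comparison_position=DiffPosition(page_id=2, box=(120.0, 352.0, 58.0, 14.0)),
)
```

A difference comparison result would then simply be a list of such records, one per difference.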
Optionally, the display unit 24 includes:
a conversion module, configured to convert the coordinate information into division (DIV) element position information; and
a display module, configured to, when the DIV element position information enters the display area of the file to which it belongs, highlight the difference text at the DIV element position information in the page indicated by the page identifier, according to the difference type and the page identifier corresponding to the DIV element position information.
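The conversion and the display-area check described above can be sketched as follows. The embodiments do not state how the coordinates are converted; the sketch assumes a simple proportional scaling from page units to rendered pixels, and the names `to_div_position` and `in_display_area` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DivPosition:
    """Position of an overlay DIV inside a rendered page, in pixels."""
    left: float
    top: float
    width: float
    height: float


def to_div_position(box, page_size, rendered_size):
    """Scale an (x, y, w, h) box from page units to rendered pixel units."""
    x, y, w, h = box
    sx = rendered_size[0] / page_size[0]   # horizontal scale factor
    sy = rendered_size[1] / page_size[1]   # vertical scale factor
    return DivPosition(x * sx, y * sy, w * sx, h * sy)


def in_display_area(div, scroll_top, viewport_height):
    """Return True when the DIV vertically overlaps the visible window."""
    visible_bottom = scroll_top + viewport_height
    return div.top < visible_bottom and div.top + div.height > scroll_top


# A box on a 1000 x 2000 page rendered at 500 x 1000 pixels.
div = to_div_position((100, 200, 50, 10), page_size=(1000, 2000), rendered_size=(500, 1000))
```

Under this assumption, highlighting is deferred until `in_display_area` reports that the DIV has entered the visible window, so off-screen differences need not be rendered in advance.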
Optionally, the display unit 24 further includes:
a generation module, configured to generate, for the same piece of difference information, an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file, and the difference type;
a binding module, configured to bind the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file, respectively; and
a first synchronization module, configured to, when a first synchronous positioning instruction triggered based on the reference file or the comparison file is received, synchronously highlight the difference text at all DIV element position information bound to the ID corresponding to the first synchronous positioning instruction.
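One possible way to realize the ID generation, the binding, and the first synchronous positioning is sketched below. The embodiments only state that the ID is generated according to the two pieces of DIV element position information and the difference type; hashing them with SHA-1 is an assumption of this sketch, as are the function names and the pane labels.

```python
import hashlib
from collections import defaultdict


def make_diff_id(ref_div, cmp_div, diff_type):
    """Derive a stable ID from both DIV positions and the difference type."""
    key = f"{ref_div}|{cmp_div}|{diff_type}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()[:12]


# Maps each ID to every (pane, DIV position) bound to it.
bindings = defaultdict(list)


def bind(diff_id, pane, div):
    """Bind an ID to a DIV position in the given pane."""
    bindings[diff_id].append((pane, div))


def on_sync_locate(diff_id):
    """Return all DIV positions that must be highlighted together."""
    return list(bindings[diff_id])


# Hypothetical DIV positions as (left, top, width, height) tuples.
ref_div = (50.0, 100.0, 25.0, 5.0)
cmp_div = (50.0, 112.0, 25.0, 5.0)
diff_id = make_diff_id(ref_div, cmp_div, "replace")
bind(diff_id, "reference", ref_div)
bind(diff_id, "comparison", cmp_div)
```

Because both DIV positions share one ID, an instruction triggered in either file can look up the ID and highlight the matching position in the other file at the same time.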
Optionally, the display unit 24 is further configured to, after the difference comparison result of the comparison file relative to the reference file is received from the server, display a difference detail in a preset display area according to the difference comparison result, where the preset display area is an area other than the reference file display area and the comparison file display area, and the difference detail includes the difference type, the difference text in the reference file, and the difference text in the comparison file of each piece of difference information.
Optionally, the binding module is further configured to bind the ID with corresponding difference information in the difference detail after binding the ID with the DIV element position information of the difference text in the reference file and the DIV element position information of the difference text in the comparison file respectively;
the display unit 24 further includes:
an acquisition module, configured to, when a second synchronous positioning instruction triggered based on the difference detail is received, acquire the ID bound to the difference information in the difference detail that corresponds to the second synchronous positioning instruction; and
a second synchronization module, configured to synchronously highlight the difference text at all DIV element position information bound to the acquired ID.
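The second synchronous positioning, triggered from the difference detail, might look like the following sketch. The row layout, the ID value "d1", and the function name `on_detail_selected` are hypothetical and serve only to show the lookup from a detail entry to its bound DIV positions.

```python
# Each row of the difference detail carries the ID bound to it (assumed layout).
detail_rows = [
    {"diff_type": "replace", "reference_text": "30 days",
     "comparison_text": "60 days", "diff_id": "d1"},
]

# ID -> every (pane, DIV position) bound to that ID.
div_bindings = {
    "d1": [("reference", (50.0, 100.0, 25.0, 5.0)),
           ("comparison", (50.0, 112.0, 25.0, 5.0))],
}


def on_detail_selected(row_index):
    """Acquire the row's bound ID, then return all DIV positions to highlight."""
    diff_id = detail_rows[row_index]["diff_id"]
    return div_bindings.get(diff_id, [])


targets = on_detail_selected(0)
```

Selecting a row in the detail area thus resolves to the same set of DIV positions that a selection in either file would reach.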
Optionally, the receiving unit 20 is further configured to, before the difference text is highlighted in the comparison file and/or the reference file according to the difference comparison result, receive a scroll instruction for a first scroll bar, where the first scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area. The apparatus further includes:
a determining unit, configured to determine, according to the scroll instruction, the proportion of the currently scrolled length of the first scroll bar to the total length of the scroll area; and
a synchronous scrolling unit, configured to scroll a second scroll bar according to the proportion so that the first scroll bar and the second scroll bar scroll synchronously, where the second scroll bar is the scroll bar of the reference file display area or the scroll bar of the comparison file display area, and is different from the first scroll bar.
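The proportional scrolling can be illustrated with a short sketch. It assumes that "the total length of the scroll area" means the scrollable distance (content height minus visible height); the embodiments do not define this precisely, and the function names are hypothetical.

```python
def scroll_ratio(scroll_top, scroll_height, client_height):
    """Proportion of the scrollable distance that has already been scrolled."""
    scrollable = scroll_height - client_height
    if scrollable <= 0:
        return 0.0                      # content fits in the view, nothing to scroll
    return min(max(scroll_top / scrollable, 0.0), 1.0)


def synced_scroll_top(ratio, scroll_height, client_height):
    """Scroll offset that places the second scroll bar at the same proportion."""
    return ratio * max(scroll_height - client_height, 0)


# First pane: 1600 px of content in a 400 px view, scrolled by 300 px.
ratio = scroll_ratio(300, 1600, 400)
# Second pane: 2400 px of content in a 400 px view.
second_top = synced_scroll_top(ratio, 2400, 400)
```

Using a proportion rather than an absolute offset keeps the two display areas aligned even when the reference file and the comparison file have different lengths.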
Optionally, the display unit is configured to highlight, according to the difference comparison result, a difference text currently scrolled to a display area in the comparison file and/or the reference file.
Optionally, the transmitting unit 22 includes:
a recognition module, configured to recognize the reference file and the comparison file using optical character recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file;
a splicing module, configured to, when a target file contains multiple pages of text, splice the multiple pages of text of the target file into a single page of contextually continuous text to obtain a target text, and, when the target file contains a single page of text, take that single page of text as the target text, where the target text is a reference text when the target file is the reference file, and a comparison text when the target file is the comparison file; and
a sending module, configured to send the reference text and the comparison text to the server.
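The splicing step can be sketched as below, taking the per-page OCR output as given. Recording the start offset of each page is an addition assumed by this sketch (it allows a character offset in the spliced text to be mapped back to a page); the embodiments do not describe it, and the function names are hypothetical.

```python
from bisect import bisect_right


def splice_pages(pages):
    """Join per-page text into one continuous string and record page start offsets."""
    offsets = []
    position = 0
    for page in pages:
        offsets.append(position)        # character offset where this page starts
        position += len(page)
    return "".join(pages), offsets


def page_of(offset, offsets):
    """Index of the page that contains the given character offset."""
    return bisect_right(offsets, offset) - 1


# A sentence that is split across two pages becomes continuous after splicing.
text, offsets = splice_pages(["The term is ", "thirty days."])
```

A single-page file passes through the same function unchanged, since joining one page returns that page's text.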
Based on the above embodiments, another embodiment of the present invention further provides a computing device, including:
one or more processors;
a storage device, configured to store one or more programs,
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the embodiments of the present invention. The processor is coupled to the storage device.
Based on the above method embodiments, the present invention further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the method according to any of the embodiments of the present invention.
Based on the above embodiments, an embodiment of the present invention further provides a file comparison system based on RPA and AI, which includes an RPA robot 30, a client 32, and a server 34. As shown in FIG. 5, the RPA robot 30 may be independent of the client 32; as shown in FIG. 6, the RPA robot 30 may be part of the client 32.
The RPA robot 30 is configured to log in to the client 32, upload a reference file and a comparison file to the client 32, and trigger the client 32 to send the reference file and the comparison file to the server 34 for difference comparison;
The client 32 is configured to receive the reference file and the comparison file uploaded by the RPA robot 30, and send the reference file and the comparison file to the server 34;
The server 34 is configured to perform difference comparison on the reference file and the comparison file according to a preset comparison algorithm, so as to obtain a difference comparison result of the comparison file relative to the reference file;
The client 32 is further configured to receive the difference comparison result sent by the server 34 and, according to the difference comparison result, highlight a difference text in the comparison file and/or the reference file, where the difference text highlighted in the comparison file is the text in which the comparison file differs from the reference file, and the difference text highlighted in the reference file is the text in which the reference file differs from the comparison file.
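The preset comparison algorithm of the server 34 is not specified in the embodiments. Purely as a stand-in, the sketch below uses `difflib.SequenceMatcher` from the Python standard library to derive difference types and difference texts from a reference text and a comparison text; the function name and the dictionary keys are assumptions.

```python
from difflib import SequenceMatcher


def compare_texts(reference, comparison):
    """Return one record per non-equal region between the two texts."""
    matcher = SequenceMatcher(None, reference, comparison, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue                     # unchanged regions are not differences
        diffs.append({
            "diff_type": tag,            # "replace", "delete", or "insert"
            "reference_text": reference[i1:i2],
            "comparison_text": comparison[j1:j2],
            "reference_span": (i1, i2),  # character offsets in the reference text
            "comparison_span": (j1, j2), # character offsets in the comparison text
        })
    return diffs


diffs = compare_texts("pay in 30 days", "pay in 60 days")
```

The character spans returned here would still have to be mapped to page identifiers and coordinates before the client could highlight them; that mapping is outside this sketch.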
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the various embodiments of the present invention, it should be understood that the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and the sequence numbers should not be construed as limiting the implementation of the embodiments of the present invention.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A and that B can be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-accessible memory. Based on this understanding, the technical solution of the present invention, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like, and in particular may be a processor in a computer device) to execute some or all of the steps of the methods of the various embodiments of the present invention.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the above embodiments may be implemented by a program that instructs associated hardware. The program may be stored in a computer-readable storage medium, including read-only memory (ROM), random access memory (RAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk memory, magnetic disk memory, tape memory, or any other medium that can be used for carrying or storing data.
Those of ordinary skill in the art will appreciate that: the drawing is a schematic diagram of one embodiment and the modules or flows in the drawing are not necessarily required to practice the invention.
Those of ordinary skill in the art will appreciate that: the modules in the apparatus of the embodiments may be distributed in the apparatus of the embodiments according to the description of the embodiments, or may be located in one or more apparatuses different from the present embodiments with corresponding changes. The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A file comparison method based on RPA and AI, the method being applied to a client, the method comprising:
S1, receiving a reference file and a comparison file uploaded by a robotic process automation (RPA) robot;
S2, the reference file and the comparison file are sent to a server;
S3, receiving a difference comparison result of the comparison file sent by the server relative to the reference file;
S4, highlighting a difference text in the comparison file and/or the reference file according to the difference comparison result, wherein the difference text highlighted in the comparison file is a text with a difference between the comparison file and the reference file, and the difference text highlighted in the reference file is a text with a difference between the reference file and the comparison file;
The difference comparison result comprises at least one piece of difference information, wherein each piece of difference information comprises a difference type, a difference text in the reference file, a difference text in the comparison file, difference position information of the difference text in the reference file and difference position information of the difference text in the comparison file, and the difference position information comprises a page mark of a page to which the difference text belongs and coordinate information of the page to which the difference text belongs;
The step S4 comprises the following steps:
S41, converting the coordinate information into division (DIV) element position information;
S43, for the same piece of difference information, generating an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file and the difference type, and binding the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file, respectively;
S44, when a first synchronous positioning instruction triggered based on the reference file or the comparison file is received, synchronously highlighting the difference text at all DIV element position information bound to the ID corresponding to the first synchronous positioning instruction.
2. The method of claim 1, wherein S4 further comprises:
S42, when the DIV element position information enters the display area of the file to which it belongs, highlighting the difference text at the DIV element position information in the page indicated by the page identifier, according to the difference type corresponding to the DIV element position information and the page identifier corresponding to the DIV element position information.
3. The method according to claim 2, characterized in that after said S3, the method further comprises:
S5, displaying a difference detail in a preset display area according to the difference comparison result, wherein the preset display area is an area except a reference file display area and a comparison file display area, and the difference detail comprises a difference type in each piece of difference information, a difference text in the reference file and a difference text in the comparison file.
4. A method according to claim 3, characterized in that after said S43, the method further comprises:
S45, binding the ID with the corresponding difference information in the difference detail;
S46, when a second synchronous positioning instruction triggered based on the difference detail is received, acquiring the ID bound to the difference information in the difference detail corresponding to the second synchronous positioning instruction;
S47, synchronously highlighting the difference text at all DIV element position information bound to the acquired ID.
5. The method of claim 1, wherein prior to S4, the method further comprises:
S6, receiving a scroll instruction for a first scroll bar, wherein the first scroll bar comprises a scroll bar of a reference file display area or a scroll bar of a comparison file display area;
S7, determining, according to the scroll instruction, the proportion of the currently scrolled length of the first scroll bar to the total length of the scroll area;
S8, scrolling a second scroll bar according to the proportion so that the first scroll bar and the second scroll bar scroll synchronously, wherein the second scroll bar comprises a scroll bar of a reference file display area or a scroll bar of a comparison file display area, but is different from the first scroll bar.
6. The method of claim 5, wherein S4 comprises:
and according to the difference comparison result, highlighting the difference text currently scrolled to the display area in the comparison file and/or the reference file.
7. The method according to any one of claims 1-6, wherein S2 comprises:
S21, recognizing the reference file and the comparison file by using optical character recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file;
S22, when a target file is a file containing multiple pages of text, splicing the multiple pages of text of the target file into a single page of contextually continuous text to obtain a target text, and when the target file is a file containing a single page of text, acquiring the single page of text from the target file as the target text, wherein when the target file is the reference file, the target text is the reference text, and when the target file is the comparison file, the target text is the comparison text;
S23, the reference text and the comparison text are sent to the server.
8. A file comparison apparatus based on RPA and AI, the apparatus being applied to a client, the apparatus comprising:
the receiving unit is used for receiving the reference file and the comparison file uploaded by a robotic process automation (RPA) robot;
the sending unit is used for sending the reference file and the comparison file to a server;
the receiving unit is further configured to receive a difference comparison result of the comparison file sent by the server with respect to the reference file;
the display unit is used for highlighting the difference text in the comparison file and/or the reference file according to the difference comparison result, wherein the difference text highlighted in the comparison file is the text in which the comparison file differs from the reference file, and the difference text highlighted in the reference file is the text in which the reference file differs from the comparison file;
The difference comparison result comprises at least one piece of difference information, wherein each piece of difference information comprises a difference type, a difference text in the reference file, a difference text in the comparison file, difference position information of the difference text in the reference file and difference position information of the difference text in the comparison file, and the difference position information comprises a page mark of a page to which the difference text belongs and coordinate information of the page to which the difference text belongs;
the display unit includes:
the conversion module is used for converting the coordinate information into division (DIV) element position information;
the generation module is used for generating, for the same piece of difference information, an identifier (ID) according to the DIV element position information of the difference text in the reference file, the DIV element position information of the difference text in the comparison file and the difference type;
the binding module is used for binding the ID to the DIV element position information of the difference text in the reference file and to the DIV element position information of the difference text in the comparison file, respectively;
and the first synchronization module is used for synchronously highlighting the difference text at all DIV element position information bound to the ID corresponding to a first synchronous positioning instruction when the first synchronous positioning instruction triggered based on the reference file or the comparison file is received.
9. The apparatus of claim 8, wherein the display unit further comprises:
the display module is used for highlighting, when the DIV element position information enters the display area of the file to which it belongs, the difference text at the DIV element position information in the page indicated by the page identifier, according to the difference type corresponding to the DIV element position information and the page identifier corresponding to the DIV element position information.
10. The apparatus according to any of claims 8-9, wherein the transmitting unit comprises:
the recognition module is used for recognizing the reference file and the comparison file by using optical character recognition (OCR) to obtain at least one page of text of the reference file and at least one page of text of the comparison file;
the splicing module is used for splicing the multiple pages of text of a target file into a single page of contextually continuous text to obtain a target text when the target file is a file containing multiple pages of text, and acquiring the single page of text from the target file as the target text when the target file is a file containing a single page of text, wherein the target text is a reference text when the target file is the reference file, and the target text is a comparison text when the target file is the comparison file;
and the sending module is used for sending the reference text and the comparison text to the server.
11. A computing device, the computing device comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
12. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-7.
CN202111138084.1A 2021-09-27 2021-09-27 File comparison method, device, equipment and storage medium based on RPA and AI Active CN113836092B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111138084.1A CN113836092B (en) 2021-09-27 2021-09-27 File comparison method, device, equipment and storage medium based on RPA and AI
PCT/CN2021/131627 WO2023045053A1 (en) 2021-09-27 2021-11-19 File comparison method and apparatus based on rpa and ai, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111138084.1A CN113836092B (en) 2021-09-27 2021-09-27 File comparison method, device, equipment and storage medium based on RPA and AI

Publications (2)

Publication Number Publication Date
CN113836092A CN113836092A (en) 2021-12-24
CN113836092B true CN113836092B (en) 2024-06-21

Family

ID=78970974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111138084.1A Active CN113836092B (en) 2021-09-27 2021-09-27 File comparison method, device, equipment and storage medium based on RPA and AI

Country Status (2)

Country Link
CN (1) CN113836092B (en)
WO (1) WO2023045053A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115450B (en) * 2022-08-30 2022-11-29 平安银行股份有限公司 Method and device for establishing case for dispute of Unionpay
CN118194843A (en) * 2024-05-17 2024-06-14 山东浪潮科学研究院有限公司 Method and system for remarkably displaying document text content difference

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106528587A (en) * 2016-09-12 2017-03-22 腾讯科技(深圳)有限公司 Page display method and apparatus in composite webpage system
CN109543614A (en) * 2018-11-22 2019-03-29 厦门商集网络科技有限责任公司 A kind of this difference of full text comparison method and equipment
CN113407665A (en) * 2021-05-25 2021-09-17 北京有竹居网络技术有限公司 Text comparison method, device, medium and electronic equipment

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101470572B (en) * 2007-12-29 2010-09-01 英业达股份有限公司 Optimum display system and method for context progress
IL235565B (en) * 2014-11-06 2019-06-30 Kolton Achiav Location based optical character recognition (ocr)
CN110162509A (en) * 2019-04-26 2019-08-23 平安普惠企业管理有限公司 File comparison method, device, computer equipment and storage medium
CN111914597B (en) * 2019-05-09 2024-03-15 杭州睿琪软件有限公司 Document comparison identification method and device, electronic equipment and readable storage medium
US11249729B2 (en) * 2019-10-14 2022-02-15 UiPath Inc. Providing image and text data for automatic target selection in robotic process automation
CN111460763A (en) * 2020-03-02 2020-07-28 南京南瑞继保电气有限公司 Method, device and equipment for marking file differences and computer-readable storage medium
CN111753517A (en) * 2020-06-30 2020-10-09 北京来也网络科技有限公司 Document comparison method, device, equipment and medium based on RPA and AI
CN112084748A (en) * 2020-09-19 2020-12-15 神思电子技术股份有限公司 Text comparison method
CN112882947B (en) * 2021-03-15 2024-06-11 深圳市腾讯信息技术有限公司 Interface testing method, device, equipment and storage medium
CN113031887A (en) * 2021-04-08 2021-06-25 成都微视联软件技术有限公司 Method for supporting various headers and subsection printing in html file printing
CN113836096A (en) * 2021-09-27 2021-12-24 北京来也网络科技有限公司 File comparison method, device, equipment, medium and system based on RPA and AI

Also Published As

Publication number Publication date
CN113836092A (en) 2021-12-24
WO2023045053A1 (en) 2023-03-30

Similar Documents

Publication Publication Date Title
US7715625B2 (en) Image processing device, image processing method, and storage medium storing program therefor
US20210149842A1 (en) System and method for display of document comparisons on a remote device
EP2891992A1 (en) Systems and methods for visual definition of data associations
CN113836092B (en) File comparison method, device, equipment and storage medium based on RPA and AI
US9710440B2 (en) Presenting fixed format documents in reflowed format
US20060156221A1 (en) Embedded ad hoc browser web to spreadsheet conversion control
US10198406B2 (en) Modifying native document comments in a preview
US10178248B2 (en) Computing device for generating a document by combining content data with form data
WO2023155712A1 (en) Page generation method and apparatus, page display method and apparatus, and electronic device and storage medium
US20120143842A1 (en) Image element searching
US10643022B2 (en) PDF extraction with text-based key
JP6840597B2 (en) Search result summarizing device, program and method
CN111797297B (en) Page data processing method and device, computer equipment and storage medium
JP5766438B2 (en) Method and system for click-through function in electronic media
US20120072492A1 (en) Browsing information gathering system, browsing information gathering method, server, and recording medium
CN103034990A (en) Method and device for checking publications
WO2023045056A1 (en) Document comparison method, apparatus and system based on rpa and ai, and device and medium
CN112364270B (en) Webpage element storage method, electronic equipment and storage medium
CN110515618B (en) Page information input optimization method, equipment, storage medium and device
JP2009110506A (en) Information processing apparatus and information processing program
JP5236449B2 (en) Form display system, information processing apparatus, form display method, information processing method, program
CN118194883A (en) Literature plate type reduction method and device based on machine translation
JP6311301B2 (en) Information providing program, information providing method, and information providing apparatus
CN116720021A (en) Page configuration method, device, electronic equipment and medium
CA2571092C (en) Document output processing using content data and form data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant