CN113568897A - Method, device, terminal and medium for deduplication processing of external files in various formats - Google Patents

Method, device, terminal and medium for deduplication processing of external files in various formats

Info

Publication number
CN113568897A
CN113568897A CN202110758043.6A CN202110758043A CN113568897A CN 113568897 A CN113568897 A CN 113568897A CN 202110758043 A CN202110758043 A CN 202110758043A CN 113568897 A CN113568897 A CN 113568897A
Authority
CN
China
Prior art keywords
file
deduplication
instruction
repeated
files
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110758043.6A
Other languages
Chinese (zh)
Inventor
关瑞
姜坤
卫宣安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Zhenyou Communication Technology Co ltd
Original Assignee
Xi'an Zhenyou Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Zhenyou Communication Technology Co ltd filed Critical Xi'an Zhenyou Communication Technology Co ltd
Priority to CN202110758043.6A priority Critical patent/CN113568897A/en
Publication of CN113568897A publication Critical patent/CN113568897A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 - Design, administration or maintenance of databases
    • G06F16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2455 - Query execution
    • G06F16/24553 - Query execution of query operations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 - Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 - Relational databases
    • G06F16/285 - Clustering or classification
    • G06F16/287 - Visualization; Browsing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/10 - Text processing
    • G06F40/166 - Editing, e.g. inserting or deleting
    • G06F40/177 - Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 - Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a device, a terminal and a medium for deduplication processing of external files in various formats, wherein the method comprises the following steps: acquiring a selection instruction, and selecting a file needing deduplication according to the selection instruction; polling the file needing deduplication to find duplicate entry information in the file; judging whether an automatic deduplication instruction is received; when an automatic deduplication instruction is detected, automatically retaining the first entry among the duplicate entries found in the file and deleting the remaining duplicate entries; and polling all rows again, and acquiring an instruction to review and modify partially duplicated entries as needed, so as to complete source file deduplication. The invention can automatically screen out duplicate data and automatically remove it, which is convenient for users.

Description

Method, device, terminal and medium for deduplication processing of external files in various formats
Technical Field
The invention relates to the technical field of data deduplication, in particular to a deduplication processing method and device suitable for external files of various formats, an intelligent terminal and a storage medium.
Background
In a complex Excel data table, duplicate values sometimes need to be removed from a table containing a large amount of data, and doing so manually is cumbersome. In the prior art, the traditional approach is a full-text search. With massive data, the search keywords must be entered manually, which is time-consuming and labor-intensive and consumes considerable resources.
Thus, there is still a need for improvement and development of the prior art.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide, in view of the above defects in the prior art, a method, a device, an intelligent terminal and a storage medium for deduplication processing of external files in various formats.
The technical scheme adopted by the invention for solving the problems is as follows:
a method for processing deduplication of external files in multiple formats, wherein the method comprises the following steps:
acquiring a selection instruction, and selecting a file needing duplicate removal according to the selection instruction;
polling files needing to be deduplicated to find out repeated entry information in the files;
judging whether an automatic duplicate removal instruction is received or not;
when an automatic duplicate removal instruction is detected, automatically keeping the first entry information for the found duplicate entry information in the file, and deleting the remaining duplicate entry information;
and polling all rows again, and acquiring an instruction to review and modify partially duplicated entries as needed, so as to complete source file deduplication.
In the method for deduplication processing of external files in various formats, before the step of acquiring a selection instruction and selecting a file needing deduplication according to the selection instruction, the method further comprises:
presetting a deduplication configuration file, wherein the configuration file comprises: whether duplicate content is deleted, whether duplicate rows are highlighted, the highlight color, the size of the file portion read each time, whether a deduplicated file is generated, and the new file address.
In the method for deduplication processing of external files in various formats, before the step of acquiring a selection instruction and selecting a file needing deduplication according to the selection instruction, the method further comprises:
presetting a deduplication button for deduplicating various files.
In the method for deduplication processing of external files in various formats, the step of polling the file needing deduplication and finding duplicate entry information in the file comprises:
automatically polling the file needing deduplication to obtain each piece of data in the file, and performing polling comparison;
and once identical data content is found, capturing the entire content of the row and highlighting it, thereby finding the duplicate entry information in the file.
In the method for deduplication processing of external files in various formats, the step of automatically retaining the first entry among the duplicate entries found in the file and deleting the remaining duplicate entries when an automatic deduplication instruction is detected further comprises:
updating the deduplication progress through a real-time percentage progress bar and displaying the estimated time to completion, so that the progress and time of the whole process can be observed.
In the method for deduplication processing of external files in various formats, the step of polling all rows again and acquiring an instruction to review and modify partially duplicated entries as needed, so as to complete source file deduplication, comprises:
polling the data in the file needing deduplication to form a list containing the initial content and the duplicate content in a one-to-many relationship;
and receiving an operation instruction to modify the duplicate data one by one or in batch.
In the method for deduplication processing of external files in various formats, the step of polling all rows again and acquiring an instruction to review and modify partially duplicated entries as needed, so as to complete source file deduplication, further comprises:
saving the modified data, that is, mapping all modifications to the source file, to generate a fully deduplicated file.
A device for deduplication processing of external files in various formats, wherein the device comprises:
a deduplication file selection module, used for acquiring a selection instruction and selecting a file needing deduplication according to the selection instruction;
a polling detection module, used for polling the file needing deduplication and finding duplicate entry information in the file;
a judging module, used for judging whether an automatic deduplication instruction is received;
a deduplication processing module, used for automatically retaining the first entry among the duplicate entries found in the file and deleting the remaining duplicate entries when an automatic deduplication instruction is detected;
and a duplication check module, used for polling all rows again and acquiring an instruction to review and modify partially duplicated entries as needed, so as to complete source file deduplication.
An intelligent terminal comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to implement the steps of any of the methods when the one or more programs are executed by one or more processors.
A non-transitory computer readable storage medium having instructions therein which, when executed by a processor of an electronic device, enable the electronic device to perform any of the methods described herein.
The invention has the following beneficial effects: the embodiment of the invention provides a method for deduplication processing of external files in various formats, which comprises the following steps: acquiring a selection instruction, and selecting a file needing deduplication according to the selection instruction; polling the file needing deduplication to find duplicate entry information in the file; judging whether an automatic deduplication instruction is received; when an automatic deduplication instruction is detected, automatically retaining the first entry among the duplicate entries found in the file and deleting the remaining duplicate entries; and polling all rows again, and acquiring an instruction to review and modify partially duplicated entries as needed, so as to complete source file deduplication. The invention can automatically screen out duplicate data and automatically remove it, which is convenient for users.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a deduplication processing method for external files with multiple formats according to an embodiment of the present invention.
Fig. 2 is a schematic flowchart of a deduplication processing method adapted to external files of multiple formats according to a second embodiment of the present invention.
Fig. 3 is a schematic flowchart of a deduplication processing method adapted to external files of multiple formats according to a third embodiment of the present invention.
Fig. 4 is a schematic block diagram of a device for processing external file deduplication in multiple formats according to an embodiment of the present invention.
Fig. 5 is a schematic block diagram of an internal structure of an intelligent terminal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that, if directional indications (such as up, down, left, right, front, and back … …) are involved in the embodiment of the present invention, the directional indications are only used to explain the relative positional relationship between the components, the movement situation, and the like in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indications are changed accordingly.
Research shows that in a complex Excel data table, duplicate values sometimes need to be removed from a table containing a large amount of data, and doing so manually is cumbersome. In the prior art, the traditional approach is a full-text search. With massive data, the search keywords must be entered manually, which is time-consuming and labor-intensive and consumes considerable resources.
The comparison tools currently circulating on the internet only perform comparison. Entries marked with the same color block cannot be told apart by the naked eye, and manually screening and extracting the duplicates is undoubtedly a difficult and tedious task.
The invention provides a method for automatically removing duplicate and quickly screening duplicate content.
Exemplary method
As shown in fig. 1, in an embodiment of the present invention, a method for deduplication processing of external files in multiple formats is provided, where the method includes the following steps:
Step S100, acquiring a selection instruction, and selecting a file needing deduplication according to the selection instruction;
The embodiment of the invention is mainly used for removing duplicate entries in a file. Before implementation, a deduplication configuration file is preset, and the configuration file comprises: whether duplicate content is deleted (for example, delete or do not delete), whether duplicate rows are highlighted (for example, highlight duplicate rows), the highlight color (for example, the highlight color may be set to yellow), the size read from the file each time (for example, a user may set each read to 20M), whether a deduplicated file is generated, the new file address, and the like.
Specifically, before the present invention is embodied, the following steps may be performed: a deduplication button for deduplication of various files is preset. The user can select whether to deduplicate the selected file by operating the deduplication button.
Step S200, polling the file needing deduplication to find duplicate entry information in the file;
In the embodiment of the invention, polling the file needing deduplication means going through the entry data in the file in turn to find the duplicate entry information in the file.
Specifically, for example, the file needing deduplication is automatically traversed in turn, each piece of data in the file is obtained, and polling comparison is performed; once identical data content is found, the entire content of the row is captured and highlighted, and the corresponding duplicate content in other entries is located, so as to find the duplicate entry information in the file. For example, in an Excel file, Zhang San appears 11 times. Then the data of these 11 rows are all captured and highlighted with red as the background color.
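The polling comparison described in step S200 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes rows are lists of cell strings and that duplicates are detected on a single key column (here the name column); `find_duplicate_rows` and `key_column` are hypothetical names, and the sample rows are illustrative data.

```python
from collections import defaultdict


def find_duplicate_rows(rows, key_column=0):
    """Poll every row and group row indices by the value in key_column.

    Only groups with more than one row are returned, i.e. the rows that
    would be captured in full and highlighted.
    """
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        groups[row[key_column]].append(index)
    return {value: indices for value, indices in groups.items() if len(indices) > 1}


# Illustrative data (assumption): name, gender, age, city.
rows = [
    ["Zhang San", "female", "18", "Beijing"],
    ["Li Si", "male", "30", "Xi'an"],
    ["Zhang San", "male", "25", "Shanghai"],
]
duplicates = find_duplicate_rows(rows)
# Row indices to highlight, in file order.
highlighted = sorted(i for indices in duplicates.values() for i in indices)
```

Grouping by a dictionary key keeps the comparison to a single pass over the rows instead of comparing every pair of rows.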
Step S300, judging whether an automatic duplicate removal instruction is received or not;
In the embodiment of the present invention, it is determined whether an automatic deduplication instruction operated by a user is received, and if yes, the process proceeds to step S400.
Step S400, when an automatic deduplication instruction is detected, automatically retaining the first entry among the duplicate entries found in the file, and deleting the remaining duplicate entries;
For example, when an automatic deduplication instruction is detected, the first entry is automatically retained among the duplicate entries found in the file, and the remaining duplicate entries are deleted. In this step, if the duplicates found are identical across the entire row, automatic deduplication starts: the first entry is retained and the remaining duplicate entries are deleted.
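The keep-first rule of step S400 can be sketched as follows. This is a minimal sketch under the assumption that two rows count as complete duplicates only when every cell matches; `keep_first_occurrence` is a hypothetical name, not one used by the patent.

```python
def keep_first_occurrence(rows):
    """Keep the first of each group of fully identical rows, drop the rest."""
    seen = set()
    kept = []
    for row in rows:
        fingerprint = tuple(row)  # the whole row must match to count as a duplicate
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        kept.append(row)
    return kept


rows = [
    ["Zhang San", "female", "18", "Beijing"],
    ["Zhang San", "female", "18", "Beijing"],  # complete duplicate: removed
    ["Zhang San", "male", "25", "Shanghai"],   # partial duplicate: kept for review
]
deduplicated = keep_first_occurrence(rows)
```

Rows that share only the name survive this pass, which matches the later step in which partially duplicated entries are reviewed rather than deleted automatically.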
In addition, in the embodiment of the invention, the deduplication progress is updated through a real-time percentage progress bar, and the estimated time to completion is displayed, so that the progress and time of the whole process can be observed. For example, for a 1G file that is expected to take 15 minutes, the progress bar starts at 0% and gradually advances to 33%, 34%, 35% … 100%. The advantage is that the user knows from the beginning how long the file processing will take and can see the progress in real time. The whole process is controllable and effective.
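A percentage bar with a time estimate can be sketched as follows. The source does not say how the estimate is computed; this sketch assumes a simple linear extrapolation from the bytes processed so far, and `progress_report` and its parameters are hypothetical names.

```python
def progress_report(bytes_done, bytes_total, elapsed_seconds, width=20):
    """Return (bar_text, estimated_seconds_remaining).

    The estimate assumes throughput stays constant (an assumption of this
    sketch); it is None until some data has been processed.
    """
    percent = bytes_done * 100 // bytes_total if bytes_total else 100
    filled = width * percent // 100
    bar = "[" + "#" * filled + "." * (width - filled) + f"] {percent}%"
    if bytes_done == 0:
        return bar, None  # no throughput measured yet
    remaining = elapsed_seconds * (bytes_total - bytes_done) / bytes_done
    return bar, remaining


bar, remaining = progress_report(bytes_done=350, bytes_total=1000, elapsed_seconds=35.0)
```

In an interactive tool the function would be called after each chunk is read and the returned text redrawn in place.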
Step S500, polling all rows again, and acquiring an instruction to review and modify partially duplicated entries as needed, so as to complete source file deduplication.
In the embodiment of the invention, in specific implementation, after the data in the file needing deduplication is polled, a list is formed that contains the initial content and the duplicate content in a one-to-many relationship; an operation instruction is then received to modify the duplicate data one by one or in batch. That is, completely identical entries are deduplicated automatically, and partially duplicated entries are deduplicated as needed. For example, the list shows Zhang San, followed by the 11 rows containing Zhang San and their basic information. According to the actual situation, 9 of the Zhang San rows can be deleted, leaving 2: 1. Zhang San, female, 18 years old, Beijing; 2. Zhang San, male, 25 years old, Shanghai.
Duplicate data can conveniently be modified one by one or in batch. For example, Li Si appears 5 times; in entry 3 only the name is wrong, while the other basic information shows that this person is Li Si. The name can be corrected manually without deleting the entry. The Li Si entries 2, 4 and 5 can be selected and their gender modified in batch. This avoids deleting useful entries during deduplication.
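The one-to-many review list and the batch modification of step S500 can be sketched as follows. This is an illustrative sketch only: it assumes rows are grouped on a key column, and `build_review_list` and `batch_modify` are hypothetical helpers with illustrative sample data.

```python
def build_review_list(rows, key_column=0):
    """Map the index of each first occurrence to the later rows sharing its key."""
    first_seen = {}
    review = {}
    for index, row in enumerate(rows):
        key = row[key_column]
        if key in first_seen:
            review.setdefault(first_seen[key], []).append(index)
        else:
            first_seen[key] = index
    return review


def batch_modify(rows, indices, column, new_value):
    """Apply the same correction to every selected row, in place."""
    for index in indices:
        rows[index][column] = new_value


# Illustrative data (assumption): name, gender, age.
rows = [
    ["Li Si", "female", "30"],
    ["Zhang San", "male", "25"],
    ["Li Si", "female", "30"],
    ["Li Si", "female", "31"],
]
review = build_review_list(rows)            # one initial row -> many repeats
batch_modify(rows, review[0], column=1, new_value="male")
```

A one-by-one modification is the same call with a single index in `indices`.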
Specifically, the present invention saves the modified data, i.e., maps all modifications to the source file; a fully deduplicated file is generated.
Therefore, the method and the device can automatically screen out the repeated data and automatically remove the repeated data, and provide convenience for users.
The invention is described in further detail below by means of a specific application example:
Embodiment 2
The second specific application embodiment of the invention provides a method for deduplication processing of external files in various formats, which comprises the following steps:
Step S10: presetting a deduplication configuration file;
In this step, presetting the deduplication configuration file includes: setting whether duplicate content is deleted, whether duplicate rows are highlighted, the highlight color, the size read from the file each time, whether a deduplicated file is generated, the new file address, and the like. For example, if deleting duplicate content is set as the default, duplicate content is deleted directly once it is found; if the configuration specifies that a deduplicated file is generated, the source file is not affected and all modifications are reflected only in the new file. The advantage is that the preset configuration file can be determined according to the actual scenario, and the operation is simple, quick and convenient.
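The configuration items listed in step S10 can be pictured as a small settings object. The field names below are assumptions chosen for illustration, since the source names the options but not their keys or file format; the defaults follow the examples given in the description (yellow highlight, 20M per read).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DedupConfig:
    """Hypothetical in-memory form of the deduplication configuration file."""
    delete_duplicates: bool = True            # whether duplicate content is deleted
    highlight_duplicates: bool = True         # whether duplicate rows are highlighted
    highlight_color: str = "yellow"           # highlight color
    read_chunk_bytes: int = 20 * 1024 * 1024  # size read from the file each time
    generate_new_file: bool = False           # whether a deduplicated file is generated
    new_file_path: Optional[str] = None       # new file address


def output_path(config, source_path):
    """Pick where results are written: the new file if configured, else the source."""
    if config.generate_new_file and config.new_file_path:
        return config.new_file_path
    return source_path


in_place = output_path(DedupConfig(), "contacts.csv")
to_copy = output_path(
    DedupConfig(generate_new_file=True, new_file_path="contacts_dedup.csv"),
    "contacts.csv",
)
```

Writing to a separate path when `generate_new_file` is set is what leaves the source file unaffected.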
Step S20: when a selection instruction is obtained, selecting a file (supporting multiple formats) needing duplicate removal, and when a command of clicking a duplicate removal button is obtained, controlling to enter a duplicate removal process;
For example, an Excel file is selected; a Word file or a txt Notepad file can also be selected. The advantage is that files in various formats are supported, which meets different types of deduplication requirements.
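Supporting several formats usually comes down to converting each one into a common list of rows before the comparison runs. The sketch below assumes a dispatch on file extension and covers only plain text and CSV with the Python standard library; Excel and Word files would need format-specific readers that are not shown. `READERS`, `read_txt`, `read_csv` and `load_rows` are hypothetical names.

```python
import csv
import io
from pathlib import Path


def read_txt(text):
    """Treat each line of a plain-text file as a one-cell row."""
    return [[line] for line in text.splitlines()]


def read_csv(text):
    """Parse comma-separated text into rows of cells."""
    return [row for row in csv.reader(io.StringIO(text))]


# Extension -> reader. Further formats would be registered here (assumption).
READERS = {".txt": read_txt, ".csv": read_csv}


def load_rows(path):
    """Read a file into rows using the reader registered for its extension."""
    suffix = Path(path).suffix.lower()
    reader = READERS.get(suffix)
    if reader is None:
        raise ValueError(f"unsupported format: {suffix}")
    return reader(Path(path).read_text(encoding="utf-8"))
```

Once every format yields the same row structure, the duplicate detection itself does not need to know which kind of file it came from.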
Step S30: executing automatically according to the deduplication button instruction, acquiring each piece of data in the file needing deduplication, and performing polling comparison; once identical data content is found, capturing the entire content of the row and highlighting it;
For example, if the file needing deduplication is an Excel file, each piece of data in the Excel file is polled and compared, and Zhang San appears 11 times. Then the data of these 11 rows are all captured and highlighted with red as the background color. The advantages of this step are: the program executes very quickly and can basically process a file within milliseconds, and the highlighting makes identical content particularly conspicuous.
Step S40: writing the repeated data captured from the file needing to be deduplicated into a system cache;
In this embodiment, for example, in the Excel file, Zhang San appears 11 times. Then all 11 rows of data are fetched and placed in the cache. In the embodiment of the invention, each row of the file is converted into an object, and operations are performed on the attributes of the object. The program is written in an object-oriented manner, and the data is written back only when the program finally saves, which improves performance.
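The idea of turning each captured row into an object, editing its attributes in the cache, and writing back only at save time can be sketched as follows. `RowRecord` and `RowCache` are hypothetical names, and the in-memory dictionary stands in for the system cache mentioned in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RowRecord:
    """One captured row, held as an object whose attributes can be edited."""
    line_number: int
    cells: List[str]
    deleted: bool = False


class RowCache:
    """Holds captured rows in memory; the file content changes only on save."""

    def __init__(self) -> None:
        self._records: Dict[int, RowRecord] = {}

    def capture(self, line_number, cells):
        record = RowRecord(line_number, list(cells))  # copy, so edits stay in the cache
        self._records[line_number] = record
        return record

    def apply_to(self, rows):
        """Build the final rows: cached edits applied, deleted rows dropped."""
        result = []
        for line_number, cells in enumerate(rows):
            record = self._records.get(line_number)
            if record is None:
                result.append(list(cells))
            elif not record.deleted:
                result.append(record.cells)
        return result


rows = [["Zhang San", "18"], ["Wang Wu", "40"], ["Zhang San", "18"]]
cache = RowCache()
first = cache.capture(0, rows[0])
second = cache.capture(2, rows[2])
second.deleted = True   # mark the repeat for removal
first.cells[1] = "19"   # edit an attribute of the kept row
saved = cache.apply_to(rows)
```

Deferring the write means the source rows are touched once, at the end, however many edits were made during review.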
Step S50: continuously polling the data in the file needing to be deduplicated until the whole text is finished;
For example, in the embodiment of the present invention, the comparison is executed for every row of data and every cell of table content in the file needing deduplication. This ensures rigor and reliability and avoids missing data.
Step S60: updating the deduplication progress through a real-time percentage progress bar and displaying the estimated time to completion, so that the progress and time of the whole process can be observed.
For example, when the file to be deduplicated is a 1G file that is expected to take 15 minutes, the progress starts at 0% and advances to 33%, 34%, 35% … 100%. The advantage is that the user knows from the beginning how long the file processing will take and can see the progress in real time. The whole process is controllable and effective.
Step S70: polling the data in the file needing to be deduplicated to form a list containing initial content and repeated content, wherein the one-to-many relationship is formed.
For example, polling the data in the file needing deduplication forms a list: Zhang San, followed by the 11 rows containing Zhang San and their basic information. According to the actual situation, 9 of the Zhang San rows can be deleted, leaving 2: 1. Zhang San, female, 18 years old, Beijing; 2. Zhang San, male, 25 years old, Shanghai. The benefit is that, if the configuration file specifies manual review, all duplicate content is output and displayed in the form of a table so that it can be reviewed manually.
Step S80: receiving an operation instruction to modify the repeated data one by one or modify the repeated data in batch;
For example, after the data in the file needing deduplication is polled, a list is formed and displayed: Li Si appears 5 times; in entry 3 only the name is wrong, while the other basic information shows that this person is Li Si. On receiving an operation instruction, the name can be corrected manually without deleting the entry. The Li Si entries 2, 4 and 5 can be selected and their gender modified in batch. The embodiment of the invention can also receive an operation instruction to select duplicate entries and delete the selected entries, or to delete all duplicate entries. The advantages of this step are: review and modification effectively guarantee the correctness and reliability of the data, and batch operations reduce the user's workload.
Step S90: the modified data is saved, i.e., all modifications are mapped into the source file. A fully deduplicated file is complete.
For example, if the configuration file specifies fully automatic deduplication, then after the program is executed, a new deduplicated file is generated at the designated disk location and the source file is deleted. The advantage of this step is that storage space is saved: through data deduplication, the number of storage media and the space required can be reduced, which lowers cost.
Step S91: receiving an operation instruction, and selecting a plurality of large files at one time for automatic deduplication processing.
For example, in the embodiment of the present invention, 20 large files may be selected at a time, and the program performs the deduplication method on them sequentially according to the queue. For example, automatic execution begins at 12 pm. The advantages of this step are: the processing runs while the computer is idle and follows the order of the queue, so the user's daily use of the computer is not delayed.
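Queued, scheduled processing of several files can be sketched as follows. The sketch assumes a first-in, first-out queue and a start time given as an hour of the day; `run_queue`, `seconds_until` and the `process` callback are hypothetical names, and the actual waiting (for example with `time.sleep`) is left out.

```python
from collections import deque
from datetime import datetime, timedelta


def seconds_until(hour, now):
    """Seconds from `now` until the next occurrence of `hour`:00."""
    start = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if start <= now:
        start += timedelta(days=1)  # that hour already passed today
    return (start - now).total_seconds()


def run_queue(paths, process):
    """Process files one at a time, in the order they were selected."""
    pending = deque(paths)
    results = []
    while pending:
        path = pending.popleft()
        results.append((path, process(path)))
    return results


# Stand-in for the real deduplication routine (assumption): returns the name length.
results = run_queue(["a.csv", "bb.csv"], process=len)
delay = seconds_until(0, datetime(2021, 7, 5, 23, 0, 0))
```

Processing strictly in queue order keeps only one large file in flight at a time, which limits memory use during a long overnight run.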
Embodiment 3: as shown in fig. 3, a method for deduplication processing of external files in various formats according to a third embodiment of the present invention includes:
step S101, start;
step S102, reading a configuration file, and entering step S103;
step S103, selecting a file, and entering step S104;
step S104, clicking a duplicate removal button, and entering step S105;
step S105, polling the file to find out repeated lines, and entering step S106;
step S106, judging whether to automatically remove the duplicate, if so, entering step S107, and if not, entering step S110;
step S107, reserving the first row, deleting the repeated row, and entering step S108;
step S108, polling all rows, deleting repeated rows, and entering step S109;
step S109, the source file is deduplicated, and step S113 is entered;
step S110, forming a list, highlighting repeated lines, and entering step S111;
step S111, manual review, and entering step S112;
step S112, saving the modifications, generating a new file, and deleting the source file; and entering step S113;
step S113, end.
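The branch at step S106 can be sketched as a single function. This is a simplified sketch under stated assumptions: rows are compared as whole rows, the manual review of steps S111 to S112 is modeled as an optional callback, and file reading and writing are omitted; `run_flow` and `review` are hypothetical names.

```python
def run_flow(rows, auto_dedup, review=None):
    """Follow the S105-S112 flow on in-memory rows and return the rows to save."""
    seen = set()
    duplicate_indices = []
    for index, row in enumerate(rows):        # S105: poll the file, find repeated rows
        key = tuple(row)
        if key in seen:
            duplicate_indices.append(index)
        else:
            seen.add(key)

    if auto_dedup:                            # S106: automatic deduplication?
        drop = set(duplicate_indices)         # S107-S108: keep first, delete repeats
        return [row for i, row in enumerate(rows) if i not in drop]  # S109

    flagged = [(i, rows[i]) for i in duplicate_indices]  # S110: list to highlight
    if review is None:
        return list(rows)                     # nothing reviewed, nothing changed
    return review(list(rows), flagged)        # S111-S112: manual review, then save


rows = [["a", "1"], ["b", "2"], ["a", "1"]]
automatic = run_flow(rows, auto_dedup=True)
untouched = run_flow(rows, auto_dedup=False)
```

Passing a `review` callback lets the same function cover the manual branch without hard-coding how the user's decisions are collected.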
Therefore, the embodiment of the invention provides a method for realizing quick duplicate removal suitable for external files with various formats.
Exemplary device
As shown in fig. 4, an embodiment of the present invention provides a device for processing external file deduplication, which accommodates multiple formats, and the device includes:
a duplicate removal file selection module 410, configured to obtain a selection instruction, and select a file that needs to be deduplicated according to the selection instruction;
the polling detection module 420 is configured to poll a file that needs to be deduplicated, and find out duplicate entry information in the file;
a judging module 430, configured to judge whether an automatic deduplication instruction is received;
the duplicate removal processing module 440 is configured to, when an automatic duplicate removal instruction is detected, automatically retain the first entry information for the found duplicate entry information in the file, and delete the remaining duplicate entry information;
the duplication check module 450 is configured to poll all rows again, and obtain an instruction to review and modify the partially duplicated entry content as needed, so as to complete source file deduplication, as described above.
Based on the above embodiment, the present invention further provides an intelligent terminal, and a schematic block diagram thereof may be as shown in fig. 5. The intelligent terminal comprises a processor, a memory, a network interface and a display screen which are connected through a system bus. Wherein, the processor of the intelligent terminal is used for providing calculation and control capability. The memory of the intelligent terminal comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the intelligent terminal is used for being connected and communicated with an external terminal through a network. The computer program is executed by a processor to implement a method of deduplication processing that accommodates multiple formats of external files. The display screen of the intelligent terminal can be a liquid crystal display screen or an electronic ink display screen, and the temperature sensor of the intelligent terminal is arranged inside the intelligent terminal in advance and used for detecting the operating temperature of internal equipment.
It will be understood by those skilled in the art that the block diagram shown in fig. 5 shows only the part of the structure related to the solution of the present invention and does not limit the intelligent terminal to which the solution is applied. A specific intelligent terminal may include more or fewer components than those shown in the figure, combine some components, or have a different arrangement of components.
In one embodiment, an intelligent terminal is provided that includes a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
acquiring a selection instruction, and selecting a file needing duplicate removal according to the selection instruction;
polling files needing to be deduplicated to find out repeated entry information in the files;
judging whether an automatic duplicate removal instruction is received or not;
when an automatic duplicate removal instruction is detected, automatically keeping the first entry information for the found duplicate entry information in the file, and deleting the remaining duplicate entry information;
and polling all rows again, and obtaining an instruction to review and modify partially duplicated entry content as needed, so as to complete deduplication of the source file.
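By way of illustration only, the automatic step of keeping the first entry and deleting the remaining duplicates may be sketched in Python as follows. The function name dedupe_keep_first, the representation of each entry as a string, and the use of exact string equality as the duplicate test are assumptions made for this sketch; the embodiment does not specify them.

```python
from typing import Iterable, List


def dedupe_keep_first(rows: Iterable[str]) -> List[str]:
    """Return the rows with later exact duplicates removed.

    The first occurrence of each row is kept and the original
    order of the remaining rows is preserved.
    """
    seen = set()   # rows already encountered
    kept = []      # rows retained, in original order
    for row in rows:
        if row in seen:
            continue  # a later duplicate: drop it
        seen.add(row)
        kept.append(row)
    return kept


if __name__ == "__main__":
    sample = ["alice,1", "bob,2", "alice,1", "carol,3", "bob,2"]
    print(dedupe_keep_first(sample))
```

Entries that are only partially identical are not touched by this sketch; under the method they are left for the later review step.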
Wherein, before the step of acquiring a selection instruction and selecting a file needing deduplication according to the selection instruction, the method further comprises:
presetting a deduplication configuration file, wherein the configuration file specifies: whether duplicate content is deleted, whether duplicate rows are highlighted, the highlight color, the size of the portion of the file read each time, and whether a deduplicated file is generated together with the address of the new file.
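As a hedged illustration, the configuration items listed above could be held in a structure such as the following Python dataclass. The class name DedupConfig, all field names, and the default values are hypothetical; the embodiment names only the settings themselves, not their format or defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DedupConfig:
    """Hypothetical container for the preset deduplication settings."""

    delete_duplicates: bool = True        # whether duplicate content is deleted
    highlight_duplicates: bool = True     # whether duplicate rows are highlighted
    highlight_color: str = "yellow"       # highlight color (assumed default)
    read_chunk_bytes: int = 1024 * 1024   # size read from the file each time
    generate_new_file: bool = False       # whether a deduplicated file is generated
    new_file_path: Optional[str] = None   # address of the new file, if generated

    def __post_init__(self) -> None:
        # Basic consistency checks, added for this sketch only.
        if self.read_chunk_bytes <= 0:
            raise ValueError("read_chunk_bytes must be positive")
        if self.generate_new_file and not self.new_file_path:
            raise ValueError("new_file_path is required when generate_new_file is set")


if __name__ == "__main__":
    print(DedupConfig(generate_new_file=True, new_file_path="deduplicated.csv"))
```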
Wherein, before the step of acquiring a selection instruction and selecting a file needing deduplication according to the selection instruction, the method further comprises:
a deduplication button for deduplication of various files is preset.
Wherein, the step of polling the file needing deduplication and finding duplicate entry information in the file comprises:
automatically polling the file needing deduplication to obtain each piece of data in the file, and comparing the pieces of data in turn;
and once identical data content is found, capturing the entire content of the row and highlighting it, thereby finding the duplicate entry information in the file.
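The detection step may be sketched as follows. This is a minimal example under stated assumptions: rows are compared as whole strings, and a dictionary lookup replaces the row-by-row polling comparison described above (same result for exact matches, fewer comparisons). The function name find_duplicate_rows is hypothetical, and the actual highlighting is left to whatever viewer displays the file.

```python
from typing import Dict, List, Sequence, Tuple


def find_duplicate_rows(rows: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (duplicate_index, first_index) pairs for repeated rows.

    Each pair identifies a row that should be highlighted and the
    earlier row whose content it repeats. Indices are zero-based.
    """
    first_seen: Dict[str, int] = {}        # row content -> index of first occurrence
    duplicates: List[Tuple[int, int]] = []
    for index, row in enumerate(rows):
        if row in first_seen:
            duplicates.append((index, first_seen[row]))
        else:
            first_seen[row] = index
    return duplicates


if __name__ == "__main__":
    rows = ["id=1", "id=2", "id=1", "id=1"]
    for dup, first in find_duplicate_rows(rows):
        print(f"row {dup} repeats row {first}: {rows[dup]!r}")
```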
Wherein, the step of, when an automatic deduplication instruction is detected, automatically retaining the first entry information for the found duplicate entry information in the file and deleting the remaining duplicate entry information further comprises:
updating the progress of the deduplication process in real time through a percentage progress bar, and displaying the expected completion time, so that the progress and time of the whole process are visible to the user.
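One simple way to derive the displayed percentage and the expected remaining time is linear extrapolation from the work done so far, sketched below. The function name progress_report and the assumption that every row takes roughly the same time to process are illustrative only; the embodiment does not state how the estimate is computed.

```python
from typing import Optional, Tuple


def progress_report(processed: int, total: int,
                    elapsed_seconds: float) -> Tuple[float, Optional[float]]:
    """Return (percent_complete, estimated_seconds_remaining).

    The estimate assumes a constant processing rate. It is None
    until at least one row has been processed.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= processed <= total:
        raise ValueError("processed must be between 0 and total")
    percent = processed / total * 100
    if processed == 0:
        return percent, None
    remaining = elapsed_seconds * (total - processed) / processed
    return percent, remaining


if __name__ == "__main__":
    percent, remaining = progress_report(25, 100, 10.0)
    print(f"{percent:.0f}% done, about {remaining:.0f} s remaining")
```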
Wherein, the step of polling all rows again, and obtaining an instruction to review and modify partially duplicated entry content as needed, so as to complete deduplication of the source file, comprises:
polling the data in the file needing deduplication to form a list of initial content and duplicate content, in which each piece of initial content is associated with its duplicates in a one-to-many relationship;
and receiving an operation instruction to modify the duplicate data one by one or in batches.
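The one-to-many list of initial content and duplicate content may be pictured with the following sketch, which maps the index of each first occurrence to the indices of its later repeats. The function name group_duplicates and the use of exact string equality are assumptions; how partially duplicated entries are matched is not specified in the embodiment and is not modelled here.

```python
from typing import Dict, List, Sequence


def group_duplicates(rows: Sequence[str]) -> Dict[int, List[int]]:
    """Map the index of each initial row to the indices of its duplicates.

    Rows that occur only once are omitted from the result.
    """
    positions: Dict[str, List[int]] = {}   # row content -> all indices where it occurs
    for index, row in enumerate(rows):
        positions.setdefault(row, []).append(index)
    return {
        indices[0]: indices[1:]
        for indices in positions.values()
        if len(indices) > 1
    }


if __name__ == "__main__":
    rows = ["a", "b", "a", "c", "b", "a"]
    for first, repeats in group_duplicates(rows).items():
        print(f"row {first} ({rows[first]!r}) is repeated at rows {repeats}")
```

A reviewing user could then act on one group at a time (one-by-one modification) or on every group at once (batch modification).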
Wherein, the step of polling all rows again, and obtaining an instruction to review and modify partially duplicated entry content as needed, so as to complete deduplication of the source file, further comprises:
storing the modified data, i.e. mapping all modifications to the source file, so that a fully deduplicated file is generated.
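The storing step may be sketched as below, assuming a line-oriented text file encoded as UTF-8. The function name save_deduplicated and its parameters are hypothetical. When no new file address is given, the sketch overwrites the source file, which is one possible reading of "mapping all modifications to the source file".

```python
from pathlib import Path
from typing import List, Optional


def save_deduplicated(source: Path, rows: List[str],
                      new_path: Optional[Path] = None) -> Path:
    """Write the reviewed rows to new_path, or back to source if none is given.

    Returns the path that was written.
    """
    target = new_path if new_path is not None else source
    # One row per line; the source file is left untouched when new_path is set.
    target.write_text("".join(f"{row}\n" for row in rows), encoding="utf-8")
    return target
```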
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
In summary, the present invention discloses a deduplication processing method, apparatus, intelligent terminal and storage medium adapted to external files in multiple formats, wherein the method comprises: acquiring a selection instruction, and selecting a file needing deduplication according to the selection instruction; polling the file needing deduplication to find duplicate entry information in the file; judging whether an automatic deduplication instruction is received; when an automatic deduplication instruction is detected, automatically retaining the first entry information for the found duplicate entry information in the file, and deleting the remaining duplicate entry information; and polling all rows again, and obtaining an instruction to review and modify partially duplicated entry content as needed, so as to complete deduplication of the source file. The invention can automatically screen out and remove duplicate data, which is convenient for users.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (10)

1. A deduplication processing method adapted to external files in multiple formats, the method comprising:
acquiring a selection instruction, and selecting a file needing duplicate removal according to the selection instruction;
polling files needing to be deduplicated to find out repeated entry information in the files;
judging whether an automatic duplicate removal instruction is received or not;
when an automatic duplicate removal instruction is detected, automatically keeping the first entry information for the found duplicate entry information in the file, and deleting the remaining duplicate entry information;
and polling all rows again, and obtaining an instruction to review and modify partially duplicated entry content as needed, so as to complete deduplication of the source file.
2. The deduplication processing method adapted to external files in multiple formats according to claim 1, wherein before the step of acquiring a selection instruction and selecting a file needing deduplication according to the selection instruction, the method comprises:
presetting a deduplication configuration file, wherein the configuration file specifies: whether duplicate content is deleted, whether duplicate rows are highlighted, the highlight color, the size of the portion of the file read each time, and whether a deduplicated file is generated together with the address of the new file.
3. The deduplication processing method adapted to external files in multiple formats according to claim 1, wherein before the step of acquiring a selection instruction and selecting a file needing deduplication according to the selection instruction, the method comprises:
a deduplication button for deduplication of various files is preset.
4. The deduplication processing method adapted to external files in multiple formats according to claim 1, wherein the step of polling the file needing deduplication to find duplicate entry information in the file comprises:
automatically polling files needing duplicate removal to obtain each piece of data in the files, and performing polling comparison;
and once the same data content is found, capturing the whole content of the line, highlighting the content of the line, and finding out repeated item information in the file.
5. The method according to claim 1, wherein the step of, when an automatic deduplication instruction is detected, automatically retaining the first entry information for the found duplicate entry information in the file and deleting the remaining duplicate entry information further comprises:
updating the progress of the deduplication process in real time through a percentage progress bar, and displaying the expected completion time, so that the progress and time of the whole process are visible to the user.
6. The deduplication processing method adapted to external files in multiple formats according to claim 1, wherein the step of polling all rows again, and obtaining an instruction to review and modify partially duplicated entry content as needed, so as to complete deduplication of the source file, comprises:
polling the data in the file needing deduplication to form a list of initial content and duplicate content, in which each piece of initial content is associated with its duplicates in a one-to-many relationship;
and receiving an operation instruction to modify the repeated data one by one or modify the repeated data in batch.
7. The deduplication processing method adapted to external files in multiple formats according to claim 1, wherein the step of polling all rows again, and obtaining an instruction to review and modify partially duplicated entry content as needed, so as to complete deduplication of the source file, comprises:
storing the modified data, i.e. mapping all modifications to the source file; a fully deduplicated file is generated.
8. A deduplication processing apparatus adapted to external files in multiple formats, the apparatus comprising:
the duplicate removal file selection module is used for acquiring a selection instruction and selecting a file needing duplicate removal according to the selection instruction;
the polling detection module is used for polling the files needing to be deduplicated and finding out repeated entry information in the files;
the judging module is used for judging whether an automatic duplicate removal instruction is received or not;
the duplicate removal processing module is used for automatically keeping the first item information for the found duplicate item information in the file and deleting the residual duplicate item information when an automatic duplicate removal instruction is detected;
and the duplication check module is used for polling all rows again, and obtaining an instruction to review and modify partially duplicated entry content as needed, so as to complete deduplication of the source file.
9. An intelligent terminal comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to implement the steps of the method according to any one of claims 1-7 when the one or more programs are executed by one or more processors.
10. A non-transitory computer readable storage medium having instructions therein, which when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1-7.
CN202110758043.6A 2021-07-05 2021-07-05 Method, device, terminal and medium for deduplication processing of external files in various formats Pending CN113568897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110758043.6A CN113568897A (en) 2021-07-05 2021-07-05 Method, device, terminal and medium for deduplication processing of external files in various formats

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110758043.6A CN113568897A (en) 2021-07-05 2021-07-05 Method, device, terminal and medium for deduplication processing of external files in various formats

Publications (1)

Publication Number Publication Date
CN113568897A true CN113568897A (en) 2021-10-29

Family

ID=78163690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110758043.6A Pending CN113568897A (en) 2021-07-05 2021-07-05 Method, device, terminal and medium for deduplication processing of external files in various formats

Country Status (1)

Country Link
CN (1) CN113568897A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1666196A (en) * 2002-05-10 2005-09-07 甲骨文国际公司 Method and mechanism of storing and accessing data and improving performance of database query language statements
CN101496007A (en) * 2006-07-20 2009-07-29 S.I.Sv.El.有限公司 Automatic management of digital archives, in particular of audio and/or video files
CN102884528A (en) * 2010-05-13 2013-01-16 微软公司 Decreasing duplicates and loops in an activity record
CN102905002A (en) * 2012-10-31 2013-01-30 广东欧珀移动通信有限公司 Method and system for automatically combining contact items
CN104077380A (en) * 2014-06-26 2014-10-01 深圳信息职业技术学院 Method and device for deleting duplicated data and system
CN110942355A (en) * 2019-12-13 2020-03-31 北京搜狐新媒体信息技术有限公司 Advertisement duplicate removal method and device
CN111459886A (en) * 2020-03-12 2020-07-28 苏州浪潮智能科技有限公司 Log content matching retrieval method, device, equipment and storage medium
CN112181964A (en) * 2020-09-28 2021-01-05 中国建设银行股份有限公司 Business notification duplicate removal method, device, server and storage medium
CN112232046A (en) * 2019-07-15 2021-01-15 珠海金山办公软件有限公司 Method and device for displaying repeated items of table

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination