CN117891786B - File path hooking method and system based on Monte Carlo algorithm - Google Patents

File path hooking method and system based on Monte Carlo algorithm

Info

Publication number
CN117891786B
Authority
CN
China
Prior art keywords
level
folder
hooking
folder name
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410295135.9A
Other languages
Chinese (zh)
Other versions
CN117891786A (en)
Inventor
傅健
胡明兴
姜集敏
罗智
黄露
张的
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Yantong Information Technology Co ltd
Original Assignee
Zhejiang Yantong Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Yantong Information Technology Co ltd
Priority to CN202410295135.9A
Publication of CN117891786A
Application granted
Publication of CN117891786B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a file path hooking method and system based on a Monte Carlo algorithm. The method comprises the following steps: acquiring archive history data and sampling folder names level by level from it to obtain a first sampling data set that records, for each current-level folder name, the previous-level folder name it is hooked to; calculating, from the first sampling data set, the probability distribution of each current-level folder name being hooked to each previous-level folder name; determining, from that probability distribution, the folder names and folder levels that fall within the hooking range; and, after the target file to be hooked has been hooked in simulation within that range, calculating the total probability value of every simulated hooking path and taking the path with the highest total probability value as the final target hooking path.

Description

File path hooking method and system based on Monte Carlo algorithm
Technical Field
The invention relates to the technical field of file hooking, and in particular to a file path hooking method and system based on a Monte Carlo algorithm.
Background
In the prior art, file path hooking requires automatically matching the path name of each level and then hooking the file into the archive database under the matched path. Existing schemes mainly rely on a text similarity algorithm: the folder names of the file to be hooked are obtained for each level, their similarity to the path names of the corresponding levels in the archive database is calculated, and the path with the maximum similarity value is taken as the path to hook to. The similarity approach has the following problems: 1. Simple text similarity can produce large errors; for example, files of different objects that share a name, or of objects in different regions, may be hooked to the wrong path. 2. The similarity algorithm only measures how alike two pieces of text are. Actual file hooking also depends on the specific application scene and on the hooking association probability between levels, neither of which text similarity takes into account.
Disclosure of Invention
One object of the present invention is to provide a file path hooking method and system based on a Monte Carlo algorithm, in which the folder names of each level are randomly sampled by the Monte Carlo algorithm, the probability distribution of each level's folder names relative to the folder names of the previous level is calculated, the degree of association between folder names of different levels is derived from that distribution, and the optimal path for hooking the target folder name is derived from the degree of association.
Another object of the present invention is to provide a file path hooking method and system based on a Monte Carlo algorithm, in which a scene association degree is calculated in addition to the probability distribution of the current folder name relative to the previous-level folder name, a total association degree is calculated from the scene association degree and the hooking association degree between folder names of adjacent levels, and the optimal hooking path is solved according to the total association degree.
Another object of the present invention is to provide a file path hooking method and system based on a Monte Carlo algorithm, in which data are sampled from the hooking records of existing history files in different scenes, the probability distribution of the previous-level folder name for each current folder name is obtained from the history files, and the probability of every path under simulated hooking is solved, so as to obtain an optimal full-path hooking scheme and thereby improve the accuracy of file hooking.
In order to achieve at least one of the above objects, the present invention further provides a file path hooking method based on a Monte Carlo algorithm, the method comprising the steps of:
acquiring archive history data, and sampling folder names level by level from the archive history data to obtain a first sampling data set in which each current-level folder name is hooked to its previous-level folder name;
calculating, from the first sampling data set, the probability distribution of each current-level folder name being hooked to each previous-level folder name;
determining the folder names and folder levels within the hooking range according to the probability distribution of the current-level folder names and the previous-level folder names;
and, after the target file to be hooked has been hooked in simulation within the hooking range, calculating the total probability value of every simulated hooking path of the target file, and taking the path with the highest total probability value as the final target hooking path.
According to another preferred embodiment of the present invention, the hierarchical folder name sampling method includes: acquiring the current folder name from the archive history data and labeling the level at which the sampled folder name is located, to obtain a current-level folder name M_n and current level annotation data n, where n ≥ 1; acquiring, according to the level annotation data, the previous-level annotation data n-1 and the corresponding previous-level folder name M_{n-1}; and combining the previous-level annotation n-1, the previous-level folder name M_{n-1}, the current-level folder name M_n and the current level annotation data n into a sample [(M_{n-1}, n-1); (M_n, n)] of the first sampling data set.
According to another preferred embodiment of the present invention, the method for calculating the probability distribution of the current folder name being hooked to the previous-level folder name includes: obtaining the current-level folder name M_n and the previous-level folder name M_{n-1} from the first sampling data set; calculating the frequency with which the same combined folder name, formed from M_n and M_{n-1}, occurs in the first sampling data set; and taking the ratio of that frequency to the total number of samples as the probability value of the sample point (M_n, M_{n-1}) in the first sampling data set.
According to another preferred embodiment of the present invention, the other folder names in the first sampling data set are defined as M_s; the frequency with which the samples of each combined folder name formed by M_s and its previous-level folder name occur in the first sampling data set is calculated; the probability distribution of the folder names over all previous-level folder names is obtained from the ratio of each frequency to the total number of samples; a probability distribution range is set; and the simulated hooking of the target file is performed within that probability distribution range.
According to another preferred embodiment of the present invention, the method for calculating the hooking probability distribution includes: acquiring each current folder name M_n and its level annotation data n from the first sampling data set, the level annotation data n being the target level; acquiring, according to n, the combined folder name samples formed by M_n and every folder name of the previous level n-1 in the first sampling data set; calculating the frequency with which each combined folder name occurs at the target level n; calculating the level probability distribution of the combined folder names at the target level n from the ratio of that frequency to the total number of samples; and obtaining from the level probability distribution the optimal previous-level folder name for M_n at the current target level n, which serves as the target hooking folder for simulated hooking.
According to another preferred embodiment of the present invention, a text similarity algorithm is used to calculate similarity values between file names or folder names of the same level in the simulated hooking paths, and the hooking paths of the target file or folder to be hooked are screened according to the similarity values.
According to another preferred embodiment of the present invention, different sampling scene libraries are configured while the archive history data are sampled; the samples of the first sampling data set are distributed to the corresponding sampling scene library according to their data source; the frequency with which each level's current folder name occurs in the sampling scene library is calculated; the scene probability distribution of each folder name is calculated from that frequency and the total number of samples in the scene library; and the scene association degree of the folder name is judged from its scene probability distribution, so that the hooking scene of the target folder to be hooked is determined and screened automatically.
According to another preferred embodiment of the present invention, when the target folder to be hooked undergoes simulated hooking, the hooking probability value between each level's folder name and the previous-level folder name after simulated hooking is calculated, the scene probability value of the target folder name is obtained, the total hooking probability value of the target folder is obtained by weighting and summing the hooking probability values of all levels and the scene probability value, and the level and folder name with the highest total hooking probability value are taken as the final hooking folder of the target folder to be hooked.
In order to achieve at least one of the above objects, the present invention further provides a file path hooking system based on a Monte Carlo algorithm, which performs the above file path hooking method based on a Monte Carlo algorithm.
The present invention further provides a computer-readable storage medium storing a computer program that is executed by a processor to implement the above file path hooking method based on a Monte Carlo algorithm.
Drawings
Fig. 1 is a flow chart of the file path hooking method based on a Monte Carlo algorithm according to the present invention.
Fig. 2 is a diagram of the implementation steps of the Monte Carlo algorithm according to the present invention.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art. The basic principles of the invention defined in the following description may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
It will be understood that the terms "a" and "an" should be interpreted as referring to "at least one" or "one or more," i.e., in one embodiment, the number of elements may be one, while in another embodiment, the number of elements may be plural, and the term "a" should not be interpreted as limiting the number.
Referring to Figs. 1-2, the invention discloses a file path hooking method and system based on a Monte Carlo algorithm. The method mainly comprises the following steps. First, historical archive data are acquired, including but not limited to archive data from different application scenes. The historical archive data are sampled according to the Monte Carlo algorithm, and the sampled content includes the current folder name, the previous-level folder name under which the current folder is located, the scene in which the current folder name occurs, and so on. During sampling, the folder level at which the current folder name is located is also labeled to obtain the annotated level of the current folder name. A first sampling data set is constructed from the folder name, its annotated level, the previous-level folder name and the previous annotated level, and the Monte Carlo analysis is carried out on this data set to determine the probabilistic association between folder names of different levels and their previous-level folder names. The Monte Carlo algorithm is further used to calculate the scene probability distribution of the sources of each level's folder names, giving the association between folder names and scenes. Finally, the invention combines the association between the current folder name and the previous-level folder name with the scene association of the folder names to perform simulated hooking, calculates the total probability value after simulated hooking, and selects the file path with the highest total probability value for hooking.
The technical scheme of the invention effectively addresses the large hooking deviation caused by a simple text similarity algorithm, and improves the accuracy of hooking the target folder or file by adding the scene probability of the data source to the hooking algorithm.
Specifically, the first sampling data set is constructed as follows. The storage path information of each folder is sampled from the archive history data. The storage path information comprises folder names or file names; a folder name is defined as M_n, where n denotes the level at which the folder name is located. The lowest level of M_n may be a file name, and for convenience of description file names are uniformly treated as folder names M_n. After the folder names are acquired, the storage level of each folder name is determined from the sampled storage path, and the folder levels can be obtained in order from the separators in the path. For example, suppose the sampled file path is: Hangzhou medical archives/Qiantang District/Xiasha Subdistrict/XX hospital/XX department/XXX patient diagnostic report. Splitting the path at the separator "/" yields, in order, six sampled folder names M_n or file names: Hangzhou medical archives; Qiantang District; Xiasha Subdistrict; XX hospital; XX department; and XXX patient diagnostic report. After the six names are obtained, each is further labeled with its level.
In one preferred embodiment of the present invention, the level labeling method is as follows. During sampling, the characters of the file path are read in order and the reading order of the separators is tracked; the folder name M_n sampled before each separator is labeled with a level according to the position of that separator in the reading order. Taking the sampled file path Hangzhou medical archives/Qiantang District/Xiasha Subdistrict/XX hospital/XX department/XXX patient diagnostic report as an example, the text read up to the first separator is "Hangzhou medical archives/", so the sampled "Hangzhou medical archives" is assigned to level 1, giving the folder name "Hangzhou medical archives" and the level data "1". The second, third and subsequent separators are read in the same way, and Qiantang District, Xiasha Subdistrict, XX hospital, XX department and XXX patient diagnostic report are labeled with their levels in turn. This yields the level annotation data of every folder name in the complete sampled file path. It should be noted that the folder name extraction and level division described here are only illustrative; in other preferred embodiments of the present invention, the separator may be replaced by a folder-level delimiter defined by another specification, which is not described in detail here.
After the folder names and the corresponding level data in the sampled file path are obtained, the current-level folder name M_n and its level annotation data n are taken, and the previous-level folder name M_{n-1} is obtained according to n, where n is an integer greater than or equal to 1. The sampled current-level folder name M_n with its level annotation n, together with the previous-level folder name M_{n-1} with its level annotation n-1, forms one sample individual [(M_{n-1}, n-1); (M_n, n)] of the first sampling data set. When n = 1, the current-level folder name M_1 is the top-level folder name and has no previous-level folder, so the sample individual may be configured as [0; (M_n, n)], where the value 0 indicates absence. For the file path above, the sample individuals obtained under these sampling rules include, but are not limited to, [(Hangzhou medical archives, 1); (Qiantang District, 2)].
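The path splitting, level labeling and sample construction described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `label_levels` and `build_samples` are hypothetical, the place names in the example path are placeholders, and `None` is used where the text uses the value 0 to mark a missing previous-level folder.

```python
from typing import List, Optional, Tuple

Labeled = Tuple[str, int]                    # (folder name M_n, level n)
Sample = Tuple[Optional[Labeled], Labeled]   # ((M_{n-1}, n-1) or None, (M_n, n))


def label_levels(path: str, sep: str = "/") -> List[Labeled]:
    """Split a storage path at the separator and label each name with its 1-based level."""
    names = [part for part in path.split(sep) if part]  # skip empty segments
    return [(name, level) for level, name in enumerate(names, start=1)]


def build_samples(path: str, sep: str = "/") -> List[Sample]:
    """Pair every labeled name with its previous-level name; level 1 has no parent."""
    labeled = label_levels(path, sep)
    samples: List[Sample] = []
    for index, current in enumerate(labeled):
        parent = labeled[index - 1] if index > 0 else None  # None stands in for "0"
        samples.append((parent, current))
    return samples


# Hypothetical path shaped like the example in the text (placeholder place names).
example = "Hangzhou medical archives/XX district/XX street/XX hospital/XX department/XXX patient diagnostic report"
for sample in build_samples(example):
    print(sample)
```

Collecting the output of `build_samples` over many sampled paths would give one possible in-memory form of the first sampling data set.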
Further, the present invention also configures sampling scene libraries, where different sampling scene libraries correspond to different sampling scenes. The scene of each folder can be obtained from its historical source data: for instance, the IP addresses of the historical data sources of the folders at each level can be collected, and the scene type of each IP address analyzed to obtain the corresponding sampling scene. The sampling scene probability of each level's folder names is then calculated. Here f_x is defined as the sampling scene probability value of a folder name and F as the corresponding scene probability distribution, where x denotes the type of sampling scene. The frequency with which each level's folder name occurs in a sampling scene is calculated, and the ratio of that frequency to the total number of samples is taken as the scene probability distribution of the folder name. Each scene is a discrete probability point for the folder name, and the scene association degree is calculated from the scene probability distribution. For example, if the IP address source of the folder XX hospital/XX department/XXX patient diagnostic report is a medical system, the ratio of the frequency with which the folder name occurs in the medical system scene to the total number of samples is the medical system scene probability value f_x of that folder name.
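As a rough sketch of the scene probability calculation, the snippet below groups samples into scene libraries and divides each folder name's frequency by the number of samples in its library. The function name `scene_probabilities` and the scene labels are hypothetical. Using the per-library total as the denominator is one reading of "the total number of samples", taken here as an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def scene_probabilities(records: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each scene x to {folder name: f_x}.

    records holds (scene, folder_name) pairs; f_x is the name's frequency in
    the scene library divided by the number of samples in that library.
    """
    libraries: Dict[str, Counter] = defaultdict(Counter)
    for scene, name in records:
        libraries[scene][name] += 1          # distribute the sample to its scene library

    result: Dict[str, Dict[str, float]] = {}
    for scene, counts in libraries.items():
        total = sum(counts.values())         # total samples in this scene library
        result[scene] = {name: count / total for name, count in counts.items()}
    return result


# Hypothetical records: scene labels would come from analysing the data source.
records = [
    ("medical system", "XX hospital"),
    ("medical system", "XX hospital"),
    ("medical system", "XX department"),
    ("education system", "XX school"),
]
f = scene_probabilities(records)
print(f["medical system"]["XX hospital"])
```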
It is worth mentioning that the invention performs the frequency calculation according to the Monte Carlo algorithm on the samples of the first sampling data set, each of which consists of the current-level folder name, its level annotation data, the previous-level folder name and the previous-level annotation data; [(M_{n-1}, n-1); (M_n, n)] ∈ P is defined, where P is the randomly sampled first sampling data set. In one preferred embodiment of the present invention, the frequency of each sample type of the current-level folder name M_n in the first sampling data set is calculated, and the ratio of the frequency of each sample type of M_n to the total number of samples in the first sampling data set is taken as the probability distribution of M_n over its sample types. The sample type is determined as follows: the current-level folder name M_n and the folder name M_{n-1} one level above it are taken to form a combined folder name (M_n, M_{n-1}); combined folder names are compared for both the order and the text of their component names, and sample individuals whose combined folder names are identical are treated as the same sample type. The frequency is counted over the samples of M_n that belong to the same sample type.
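The frequency-to-probability step can be illustrated with a short sketch, assuming the samples have already been reduced to combined folder names (M_n, M_{n-1}). The name `pair_probabilities` is hypothetical, and exact tuple equality is used to decide whether two samples belong to the same sample type.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Pair = Tuple[str, Optional[str]]   # combined folder name (M_n, M_{n-1})


def pair_probabilities(pairs: Iterable[Pair]) -> Dict[Pair, float]:
    """Frequency of each combined folder name divided by the total number of samples."""
    counts = Counter(pairs)              # identical (name, parent) tuples form one sample type
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in counts.items()}


# Hypothetical samples: the same hospital name seen under two different streets.
pairs = [("XX hospital", "Street A")] * 3 + [("XX hospital", "Street B")]
probabilities = pair_probabilities(pairs)
print(probabilities)
```

Because the tuple keeps the order (current name, previous-level name), two samples with the same names in swapped positions count as different sample types, matching the order check described in the text.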
Once the frequencies of the different sample types of the current-level folder name M_n are obtained, each frequency is divided by the total number of samples to give the probability distribution over the sample types. The other folder names are defined as M_s, and the probability distribution of the sample types of every other folder name M_s is calculated in the same way: the combined folder name is formed, and samples with the same combined folder name are assigned to the same sample type. After the probability distributions of all other folder names M_s are obtained, the probability distribution range is set: the probabilities are sorted from largest to smallest, and a chosen proportion of the sample types with the largest probabilities is selected as the probability distribution range.
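Selecting the probability distribution range might look like the sketch below. The function name `probability_range` and the `keep_fraction` parameter are hypothetical; the text only says that "a certain proportion" of the highest-probability sample types is kept, so the default value and the rounding rule are assumptions.

```python
import math
from typing import Dict, Hashable


def probability_range(probabilities: Dict[Hashable, float],
                      keep_fraction: float = 0.5) -> Dict[Hashable, float]:
    """Keep the given fraction of sample types with the largest probabilities."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Sort sample types from largest to smallest probability.
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    keep = max(1, math.ceil(len(ranked) * keep_fraction))  # round up, keep at least one
    return dict(ranked[:keep])


# Hypothetical distribution over three sample types.
distribution = {
    ("XX hospital", "Street A"): 0.5,
    ("XX hospital", "Street B"): 0.3,
    ("XX hospital", "Street C"): 0.2,
}
selected = probability_range(distribution, keep_fraction=0.5)
print(selected)
```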
In another preferred embodiment of the present invention, the current combined folder name is defined as (M_n, M_{n-1}) and the other combined folder names as (M_s, M_{s-1}). Because the text of combined folder names that refer to the same hooking relation may differ slightly, the invention may also calculate, with a text similarity algorithm, the text similarity between the current combined folder name (M_n, M_{n-1}) and each other combined folder name (M_s, M_{s-1}), and set a text similarity threshold. When the text similarity between another combined folder name (M_s, M_{s-1}) and the current combined folder name (M_n, M_{n-1}) is greater than the threshold, the samples of (M_s, M_{s-1}) are classified as the same sample type as (M_n, M_{n-1}), and the frequency of the current combined folder name (M_n, M_{n-1}) is increased by one.
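The similarity-based merging can be sketched with the standard library's `difflib.SequenceMatcher`, used here only as a stand-in because the text does not name a specific similarity algorithm. The function names, the way the two names are joined before comparison, and the threshold value of 0.8 are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, Tuple

Pair = Tuple[str, str]   # combined folder name (M_n, M_{n-1})


def pair_similarity(a: Pair, b: Pair) -> float:
    """Text similarity of two combined folder names, compared as joined strings."""
    return SequenceMatcher(None, "/".join(a), "/".join(b)).ratio()


def merged_frequency(current: Pair, observed: Iterable[Pair], threshold: float = 0.8) -> int:
    """Count samples that equal the current combined name or exceed the similarity threshold."""
    frequency = 0
    for other in observed:
        if other == current or pair_similarity(current, other) > threshold:
            frequency += 1   # treated as the same sample type as the current name
    return frequency


# Hypothetical samples: an exact match, a near match differing in case, and an unrelated pair.
current = ("XX hospital", "Xiasha Street")
observed = [
    ("XX hospital", "Xiasha Street"),
    ("XX Hospital", "Xiasha Street"),
    ("YY school", "ZZ road"),
]
print(merged_frequency(current, observed))
```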
In one preferred embodiment of the present invention, after the sample data of the different hooking types of all folder names M_n have been screened according to the probability distribution range, the probability distribution of each folder name M_n at its hooking level is calculated on the screened data. The specific method is as follows. The first sampling data set is screened according to the probability distribution range to obtain a sub-sampling data set. Each folder name M_n and its level n in the sub-sampling data set are obtained, and the probability distribution of the folder names M_{n-1} of the previous level n-1 among the sample types of that level is determined according to the annotated level data n. The level probability distribution is calculated as follows: the total number y of samples at the target level n is obtained; with the combined folder name of M_n being (M_n, M_{n-1}), the different combined folder names (M_n, M_{n-1}) among the y samples of the target level n are treated as different sample types; the frequency of each sample type is calculated; and the ratio of the frequency of each sample type to the total number y of samples at the target level n is taken as the probability distribution Q of the folder name M_n at the target level n. The value q is defined as the probability value of a specified folder name together with a specified previous-level folder name, with q ∈ Q. The folder or folders with the highest probability at one or more levels are selected for simulated hooking.
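A minimal sketch of the level probability distribution follows, assuming each sample is a flat tuple (previous-level name, current name, current level). The names `level_distribution` and `best_parent` are hypothetical, and picking the single most probable previous-level folder is just one of the selection options the text allows.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

FlatSample = Tuple[str, str, int]   # (M_{n-1}, M_n, n)
Pair = Tuple[str, str]              # combined folder name (M_n, M_{n-1})


def level_distribution(samples: Iterable[FlatSample], level: int) -> Dict[Pair, float]:
    """Probability q of each combined folder name among the y samples at the target level."""
    at_level = [(current, parent) for parent, current, n in samples if n == level]
    y = len(at_level)                       # total number of samples at level n
    if y == 0:
        return {}
    return {pair: count / y for pair, count in Counter(at_level).items()}


def best_parent(distribution: Dict[Pair, float], name: str) -> Optional[Tuple[str, float]]:
    """Most probable previous-level folder for the given current folder name, with its q."""
    candidates = {pair: q for pair, q in distribution.items() if pair[0] == name}
    if not candidates:
        return None
    (_, parent), q = max(candidates.items(), key=lambda item: item[1])
    return parent, q


# Hypothetical sub-sampling data set.
samples = [
    ("Street A", "XX hospital", 4),
    ("Street A", "XX hospital", 4),
    ("Street B", "XX hospital", 4),
    ("Street A", "YY clinic", 4),
    ("XX district", "Street A", 3),
]
Q = level_distribution(samples, level=4)
print(best_parent(Q, "XX hospital"))
```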
Further, the simulated hooking method is as follows. The folder R to be hooked and its folder name are obtained, simulated hooking is performed against the screened target folder names and target-level folders, and the total probability of the folder to be hooked in each corresponding file path is calculated. The total probability is calculated as follows: the hooking probability value q between each level's folder name and the previous-level folder name after simulated hooking is calculated; the scene probability value f_x of the target hooking folder name is obtained; weight values α1 and σ2 are set respectively; the hooking probability values of all levels and the scene probability value are weighted and summed to obtain the total hooking probability value of the target folder, Z = α1*q + σ2*f_x; and the level and folder name with the highest total hooking probability value Z are taken as the final hooking folder of the target folder.
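The weighted sum Z = α1*q + σ2*f_x and the final selection can be sketched as below. The function names, the weight values 0.7 and 0.3, and the candidate paths are hypothetical, since the text gives no numeric weights. Summing the per-level q values before weighting is one reading of "weighting and summing the hooking probability values of all levels", taken here as an assumption.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[List[float], float]   # (per-level hooking probabilities q, scene probability f_x)


def total_hook_probability(level_qs: List[float], scene_f: float,
                           alpha1: float = 0.7, sigma2: float = 0.3) -> float:
    """Z = alpha1 * (sum of per-level q) + sigma2 * f_x, with assumed weights."""
    return alpha1 * sum(level_qs) + sigma2 * scene_f


def pick_path(candidates: Dict[str, Candidate],
              alpha1: float = 0.7, sigma2: float = 0.3) -> str:
    """Return the simulated hooking path with the highest total probability Z.

    Raises ValueError if candidates is empty.
    """
    return max(
        candidates,
        key=lambda path: total_hook_probability(candidates[path][0], candidates[path][1],
                                                alpha1, sigma2),
    )


# Hypothetical simulated hooking paths with their q values and scene probability.
candidates = {
    "Street A/XX hospital": ([0.5, 0.4], 0.6),
    "Street B/XX hospital": ([0.2, 0.3], 0.9),
}
print(pick_path(candidates))
```

Raising `sigma2` relative to `alpha1` makes the data-source scene count for more than the level-to-level hooking history, which is the trade-off the two weights expose.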
The processes described above with reference to flowcharts may be implemented as computer software programs in accordance with the disclosed embodiments of the application. Embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via a communication portion, and/or installed from a removable medium. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU). The computer readable medium of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wire segments, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be understood by those skilled in the art that the embodiments of the present invention described above and shown in the drawings are merely illustrative and do not limit the invention. The functional and structural principles of the invention have been shown and described in the embodiments, and the embodiments may be modified or adapted in any way without departing from those principles.

Claims (7)

1. An archive path hooking method based on a Monte Carlo algorithm, characterized by comprising the following steps:
acquiring archive history data, and performing hierarchical folder name sampling on the archive history data to obtain a first sampling data set in which each current-level folder name is hooked with the corresponding previous-level folder name and the corresponding level annotation data;
calculating, according to the first sampling data set, the probability distribution of each current-level folder name being hooked to a previous-level folder name;
calculating, with a text similarity algorithm, similarity values between the name of the target file or folder to be hooked and the file names or folder names of the same level in the sampled history data, and screening, from the sampled history data according to the similarity values, simulated hooking paths containing the target file or folder to be hooked;
determining the folder names and file levels within the hooking range according to the probability distribution of each current-level folder name and previous-level folder name;
after the target file to be hooked has been hooked within the simulated hooking range, calculating the total probability value of each simulated hooking path of the target file to be hooked, and taking the path with the highest total probability value as the final target hooking path;
in the process of sampling the archive history data, configuring different sampling scene libraries, distributing the archive history data into the corresponding sampling scene libraries according to data source, calculating the frequency of each current-level folder name in each sampling scene library, calculating the scene probability distribution of each folder name according to its frequency in the sampling scene library and the total number of samples in that scene library, and determining the scene association degree of the folder name according to its scene probability distribution, so as to automatically determine, by screening, the hooking scene of the target folder to be hooked;
when simulated hooking is performed for the target folder to be hooked, calculating the hooking probability value between each current-level folder name and the previous-level folder name after the simulated hooking, obtaining the scene probability value of the name of the target folder to be hooked, obtaining the total hooking probability value of the target folder as a weighted sum of the hooking probability values of all levels and the scene probability value, and taking the level and folder name with the highest total hooking probability value as the final hooking folder of the target folder to be hooked.
2. The archive path hooking method based on the Monte Carlo algorithm according to claim 1, wherein the hierarchical folder name sampling method comprises: acquiring the current folder name in the archive history data, and labeling the level at which the sampled current folder name is located, to obtain a current-level folder name M_n and current labeled level data n, where n ≥ 1; acquiring, according to the labeled level data, the labeled level data n-1 of the previous level and the folder name M_{n-1} corresponding to the previous level; and combining the previous labeled level n-1, the previous-level folder name M_{n-1}, each current-level folder name M_n and the current labeled level data n into a sample [(M_{n-1}, n-1); (M_n, n)] of the first sampling data set.
3. The archive path hooking method based on the Monte Carlo algorithm according to claim 1, wherein the method for calculating the probability distribution of hooking each current-level folder name to the previous-level folder name comprises: obtaining the current-level folder name M_n and the previous-level folder name M_{n-1} from the first sampling data set, calculating the frequency with which the same combination of current-level folder name M_n and previous-level folder name M_{n-1} occurs in the first sampling data set, and taking the ratio of that frequency to the total number of samples as the probability value of the sample point formed by the current-level folder name M_n and the previous-level folder name M_{n-1} in the first sampling data set.
4. The archive path hooking method based on the Monte Carlo algorithm according to claim 3, wherein each current-level folder name M_n is defined as M_s; the frequency with which samples combining M_s with each of the previous-level folder names occur in the first sampling data set is calculated; the probability distribution of each current-level folder name M_n over all folder names of the previous level is obtained from the ratio of the frequency to the total number of samples; a probability distribution range is set; and the hooking simulation of the target file is performed within the probability distribution range.
5. The archive path hooking method based on the Monte Carlo algorithm according to claim 4, wherein the hooking probability distribution calculating method comprises: acquiring, from the first sampling data set, a current-level folder name M_n and the corresponding labeled level data n, the labeled level data n being the target level; acquiring, according to the labeled level data n, the samples in the first sampling data set that combine the current-level folder name M_n with any folder name of the previous level n-1; calculating the frequency with which each such combined folder name occurs at the target level n; calculating the level probability distribution of the combined folder names at the target level n from the ratio of that frequency to the total number of samples; and obtaining, according to the level probability distribution, the optimal previous-level folder name for each current-level folder name M_n at the current target level n, which serves as the target hooking folder for the simulated hooking.
6. An archive path hooking system based on a Monte Carlo algorithm, wherein the system performs the archive path hooking method based on the Monte Carlo algorithm according to any one of claims 1 to 5.
7. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the archive path hooking method based on the Monte Carlo algorithm according to any one of claims 1 to 5.
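The sampling, probability and path-selection steps recited in claims 1 to 3 can be illustrated with a minimal sketch. It is not the patented implementation. The following are assumptions made only for illustration: the "<root>" level-0 placeholder, the use of Python's difflib.SequenceMatcher as the text similarity algorithm, the 0.6 similarity threshold, the unit weights, and all function names. The claims do not specify any of them. Scene libraries are reduced to a single optional scene_prob argument.

```python
from collections import Counter
from difflib import SequenceMatcher

ROOT = "<root>"  # hypothetical level-0 placeholder so that level n = 1 has a parent


def split_levels(path):
    """Return [ROOT, M_1, ..., M_k] for a path such as 'finance/2023/invoices'."""
    return [ROOT] + [name for name in path.split("/") if name]


def sample_pairs(history):
    """Claim 2: build samples [(M_{n-1}, n-1); (M_n, n)] from historical paths."""
    samples = []
    for path in history:
        names = split_levels(path)
        for n in range(1, len(names)):
            samples.append(((names[n - 1], n - 1), (names[n], n)))
    return samples


def hooking_probabilities(samples):
    """Claim 3: frequency of each combined pair divided by the total sample count."""
    total = len(samples)
    return {pair: count / total for pair, count in Counter(samples).items()}


def path_score(path, probs, scene_prob=0.0, w_hook=1.0, w_scene=1.0):
    """Claim 1: weighted sum of per-level hooking probabilities and a scene probability.

    The weights are assumptions; the claims do not give their values.
    """
    names = split_levels(path)
    hook = sum(
        probs.get(((names[n - 1], n - 1), (names[n], n)), 0.0)
        for n in range(1, len(names))
    )
    return w_hook * hook + w_scene * scene_prob


def choose_hook_path(target, history, threshold=0.6):
    """Screen similar historical paths, score them, and return the best parent folder."""
    probs = hooking_probabilities(sample_pairs(history))
    candidates = set()
    for path in history:
        leaf = path.rpartition("/")[2]
        # SequenceMatcher stands in for the unspecified text similarity algorithm.
        if SequenceMatcher(None, target, leaf).ratio() >= threshold:
            candidates.add(path)
    if not candidates:
        return None
    # Sort first so that ties are broken deterministically.
    best = max(sorted(candidates), key=lambda p: path_score(p, probs))
    return best.rpartition("/")[0]


history = [
    "finance/2023/invoices",
    "finance/2023/invoices",
    "finance/2024/invoices",
    "hr/2023/contracts",
]
print(choose_hook_path("invoices_q1", history))  # finance/2023
```

In this toy history, "finance/2023" wins because the pair (2023, invoices) occurs twice among the 12 samples while (2024, invoices) occurs once. A target name with no sufficiently similar historical name yields no candidate path, and the sketch returns None in that case.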
CN202410295135.9A 2024-03-15 2024-03-15 File path hooking method and system based on Monte Carlo algorithm Active CN117891786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410295135.9A CN117891786B (en) 2024-03-15 2024-03-15 File path hooking method and system based on Monte Carlo algorithm

Publications (2)

Publication Number Publication Date
CN117891786A (en) 2024-04-16
CN117891786B (en) 2024-05-31

Family

ID=90652069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410295135.9A Active CN117891786B (en) 2024-03-15 2024-03-15 File path hooking method and system based on Monte Carlo algorithm

Country Status (1)

Country Link
CN (1) CN117891786B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108762979A (en) * 2018-04-17 2018-11-06 厦门市美亚柏科信息股份有限公司 A kind of end message backup method and alternate device based on matching tree
CN111414331A (en) * 2020-03-26 2020-07-14 北京字节跳动网络技术有限公司 Document importing method and device of online collaborative knowledge base, storage medium and equipment
CN111444144A (en) * 2020-03-04 2020-07-24 奇安信科技集团股份有限公司 File feature extraction method and device
CN115277677A (en) * 2022-07-29 2022-11-01 招商局金融科技有限公司 Batch archive hooking method and device, computer equipment and storage medium
CN115329168A (en) * 2022-04-24 2022-11-11 永中软件股份有限公司 Java-based batch hooking method for file original texts and file entries
CN115392845A (en) * 2022-06-13 2022-11-25 杭州京胜航星科技有限公司 File hanging management method and system based on file intellectualization
CN115794742A (en) * 2022-12-14 2023-03-14 百度在线网络技术(北京)有限公司 File path data processing method, device, equipment and storage medium
CN116150092A (en) * 2023-03-01 2023-05-23 重庆傲雄在线信息技术有限公司 Method, system, equipment and medium for quick verification of electronic archive file
CN116644035A (en) * 2023-07-21 2023-08-25 中邮消费金融有限公司 File batch warehousing method, device, equipment and storage medium
CN116821050A (en) * 2023-05-16 2023-09-29 深圳市雁联计算***有限公司 Method, system and storage medium for automatically and batched hooking of archive electronic files

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2634223C2 (en) * 2015-06-30 2017-10-24 Общество С Ограниченной Ответственностью "Яндекс" Method (optional) and system (optional) for management of data associated with hierarchical structure

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
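Distributed Hybrid-Storage Partially Mountable File System; Radovici, Alexandru et al.; Proceedings of the 8th WSEAS International Conference on Artificial Intelligence, Knowledge Engineering and Data Bases; 20090101; full text *
档案缩微品扫描图像文件自动分类和自动挂接方法研究 [Research on methods for automatic classification and automatic hooking of scanned image files of archival microforms]; 袁庆华; 王向东; 邵荣平; 马运虎; 韦斌; 数字与缩微影像; 20100615 (No. 02); full text *
高校纸质档案数字图像存储与数据挂接模式探索——以中国矿业大学档案馆为例 [Exploring digital image storage and data hooking modes for paper archives in universities: the case of the China University of Mining and Technology Archives]; 李月娥; 《档案与建设》; 20190520 (No. 5); full text *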

Also Published As

Publication number Publication date
CN117891786A (en) 2024-04-16

Similar Documents

Publication Publication Date Title
CN106919957B (en) Method and device for processing data
CN114697128B (en) Big data denoising method and big data acquisition system through artificial intelligence decision
US9489414B2 (en) Prefix burrows-wheeler transformations for creating and searching a merged lexeme set
Stadler et al. Estimating speciation and extinction rates for phylogenies of higher taxa
CN112365171A (en) Risk prediction method, device and equipment based on knowledge graph and storage medium
Kavak et al. Discovery and genotyping of novel sequence insertions in many sequenced individuals
CN110674360A (en) Method and system for constructing data association graph and tracing data
CN114490404A (en) Test case determination method and device, electronic equipment and storage medium
CN106201857B (en) The choosing method and device of test case
CN109088793B (en) Method and apparatus for detecting network failure
Lemant et al. Robust, universal tree balance indices
CN110891071A (en) Network traffic information acquisition method, device and related equipment
CN117891786B (en) File path hooking method and system based on Monte Carlo algorithm
CN114492590A (en) Boundary channel generation method and device based on track clustering
CN111325255B (en) Specific crowd delineating method and device, electronic equipment and storage medium
CN110688437B (en) Method, device and equipment for dividing geographical area and storage medium
CN116910650A (en) Data identification method, device, storage medium and computer equipment
CN111797772A (en) Automatic invoice image classification method, system and device
CN111582313A (en) Sample data generation method and device and electronic equipment
CN115952150A (en) Multi-source heterogeneous data fusion method and device
CN114816518A (en) Simhash-based open source component screening and identifying method and system in source code
US20180365378A1 (en) Stable genes in comparative transcriptomics
CN111430013A (en) Method, device and equipment for complementing image date and storage medium
Wang et al. DFHiC: a dilated full convolution model to enhance the resolution of Hi-C data
CN112182069B (en) Agent retention prediction method, agent retention prediction device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant