CN117370624A - Electronic bill processing method and system - Google Patents

Info

Publication number
CN117370624A
CN117370624A CN202311638698.5A CN202311638698A CN117370624A CN 117370624 A CN117370624 A CN 117370624A CN 202311638698 A CN202311638698 A CN 202311638698A CN 117370624 A CN117370624 A CN 117370624A
Authority
CN
China
Prior art keywords
filter
processed
document
bit
counting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311638698.5A
Other languages
Chinese (zh)
Inventor
罗宁
吴泽朝
张学锋
王亚南
李克善
邓高明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sifang Qidian Technology Co ltd
Original Assignee
Beijing Sifang Qidian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9035Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of data processing and provides an electronic document processing method and system. The method comprises: constructing an initial filter based on a standard filter and a dynamic counting linked list, wherein the standard filter is stored as a bit array and the dynamic counting linked list is stored as a linked list that records the bit positions of the standard filter's bit array into which multiple elements have been inserted, together with the number of elements inserted at each such position; mapping and inserting the duplicate-checked documents stored by the current processing platform into the initial filter to obtain a target filter; and, in response to a document duplicate-check request sent by a user, using the target filter to perform a duplicate check on the to-be-processed document carried in the request. Using the target filter to check electronic documents for duplicates improves duplicate-check efficiency.

Description

Electronic bill processing method and system
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a method and a system for processing an electronic bill.
Background
With the development of computer technology, electronic documents are increasingly processed on internet platforms that provide online services to many tenant organizations in a cloud-service mode, and repeated processing of the same document must be avoided.
The existing duplicate-check method mainly uses a database system to query whether a newly submitted electronic document already has a processing record. If it does not, processing of the newly submitted document continues; if a processing record is found in the database, the handler is alerted so that repeated processing is prevented.
However, the number of electronic documents submitted online by all users runs into the hundreds of thousands. At this scale, if a traditional database is used for duplicate detection whenever a to-be-processed document is submitted, each check must search the hundreds of millions of records accumulated over a year, so duplicate detection is inefficient.
Disclosure of Invention
To solve at least one of the technical problems described in the background, the invention provides an electronic document processing method and system. For electronic documents such as electronic certificates and travel-subsidy forms, a target filter is constructed to detect duplicates quickly, which greatly reduces the number of database lookups that existing duplicate detection requires and markedly improves duplicate-check efficiency.
According to a first aspect of the present invention, there is provided a method of processing an electronic document, comprising:
constructing an initial filter, wherein the initial filter is constructed based on a standard filter and a dynamic counting linked list; the standard filter is stored in a bit array structure, and the parameters of the standard filter satisfy the following formulas:
$$k = -\frac{\ln p}{\ln 2}, \qquad m = \frac{k \cdot n}{\ln 2}$$
wherein k represents the number of hash functions, p represents the false positive rate of the standard filter, m represents the bit array length, and n represents the data size of the current processing platform;
the dynamic counting linked list satisfies the following condition: when the number x of elements inserted at a bit position of the standard filter's bit array is greater than or equal to 2, the bit value is kept at 1, and a counting node <i, x> is added to the dynamic counting linked list to record that x elements have been inserted at the i-th position of the bit array;
mapping and inserting the currently stored duplicate-checked documents into the initial filter to obtain a target filter;
and, in response to a document duplicate-check request sent by a user, using the target filter to perform a duplicate check on the to-be-processed document carried in the request.
Further, mapping and inserting the currently stored duplicate-checked documents into the initial filter to obtain a target filter comprises:
selecting a number of initial filters corresponding to the number of types of the currently stored duplicate-checked documents;
and, based on the dynamic counting linked list, mapping and inserting duplicate-checked documents of the same type into the same standard filter to obtain the corresponding target filters, wherein each target filter represents the document set of one type.
Further, using the target filter to perform a duplicate check on the to-be-processed document carried in the duplicate-check request, in response to the request sent by the user, comprises:
hashing the to-be-processed document with the target filter to generate the corresponding hash fingerprint;
determining whether the values of the k bit positions of the target filter's bit array to which the hash fingerprint maps are all 1;
if the values of the k bit positions are not all 1, the to-be-processed document has not been processed before, and the document processing flow continues; or,
if the values of the k bit positions are all 1, the to-be-processed document may have been processed before, and a secondary check is performed on it.
Further, continuing the document processing flow when the to-be-processed document has not been processed before comprises:
setting to 1 every one of the k bit positions whose current value is 0; meanwhile, for each of the k bit positions whose value was already 1, if a corresponding counting node exists in the dynamic counting linked list, incrementing the count value of that node by 1, and if no corresponding counting node exists, inserting one with its count value set to 2; and storing the to-be-processed document and its hash fingerprint in the checked database table.
Further, performing the secondary check on the to-be-processed document comprises:
searching the checked database table for the to-be-processed document;
if the to-be-processed document is found in the checked database table, it has already been processed, and its processing information is returned to the user; or, if it is not found in the checked database table, it has not been processed before, and the document processing flow continues.
Further, the electronic document processing method further comprises:
deleting a document processing record from the target filter and the checked database table in response to a processing-record revocation request sent by the user;
wherein deleting the document processing record from the target filter and the checked database table comprises:
hashing, with the target filter, the document data carried in the revocation request to generate the corresponding hash fingerprint;
mapping the hash fingerprint to k bit positions of the target filter's bit array and determining, for each bit position, whether a counting node exists; if a counting node exists for the bit position, further determining whether its count value is greater than or equal to 2;
if the count value of the counting node equals 2, deleting the counting node from the dynamic counting linked list; if the count value is greater than 2, decrementing it by 1;
and finding and deleting the document processing record in the checked database table.
Further, determining whether a counting node exists for each bit position further comprises:
if no counting node exists for the bit position, setting the value of that bit position in the target filter's bit array to 0.
Further, the electronic document processing method comprises:
initializing the target filter after all current electronic documents within a specified time period have been processed.
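The duplicate-check flow of the first aspect (filter test first, database lookup only on a possible hit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `check_and_process`, `fingerprint`, `positions`, `bits` and `checked_table` are hypothetical, SHA-256 with a salt stands in for the k hash functions, a Python dict stands in for the checked database table, and the counting-node bookkeeping is left out.

```python
import hashlib

M, K = 1 << 16, 4      # hypothetical bit-array length and number of hash functions
bits = bytearray(M)    # stand-in for the target filter's bit array (one byte per bit)
checked_table = {}     # stand-in for the checked database table: fingerprint -> document key


def fingerprint(doc_key: str) -> str:
    """Hash fingerprint of a document's key information."""
    return hashlib.sha256(doc_key.encode()).hexdigest()


def positions(fp: str) -> list[int]:
    """Map a fingerprint to K bit positions by salting SHA-256 with the hash number."""
    return [int(hashlib.sha256(f"{j}:{fp}".encode()).hexdigest(), 16) % M for j in range(K)]


def check_and_process(doc_key: str) -> str:
    fp = fingerprint(doc_key)
    pos = positions(fp)
    if all(bits[i] for i in pos):
        # All K bits are 1: possibly a duplicate, so run the secondary check.
        if fp in checked_table:
            return "duplicate"
    # Either some bit was 0 (definitely new) or the filter gave a false positive.
    for i in pos:
        bits[i] = 1    # counting-node bookkeeping is omitted in this sketch
    checked_table[fp] = doc_key
    return "processed"
```

Submitting the same key twice returns "processed" and then "duplicate"; the dictionary lookup is reached only when the filter cannot rule the document out.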
According to a second aspect of the present invention, there is provided an electronic document processing system comprising:
a filter construction module, configured to construct an initial filter, wherein the initial filter is constructed based on a standard filter and a dynamic counting linked list; the standard filter is stored in a bit array structure, and the parameters of the standard filter satisfy the following formulas:
$$k = -\frac{\ln p}{\ln 2}, \qquad m = \frac{k \cdot n}{\ln 2}$$
wherein k represents the number of hash functions, p represents the false positive rate of the standard filter, m represents the bit array length, and n represents the data size of the current processing platform;
the dynamic counting linked list satisfies the following condition: when the number x of elements inserted at a bit position of the standard filter's bit array is greater than or equal to 2, the bit value is kept at 1, and a counting node <i, x> is added to the dynamic counting linked list to record that x elements have been inserted at the i-th position of the bit array;
a data insertion module, configured to map and insert the currently stored duplicate-checked documents into the initial filter to obtain a target filter;
and a data duplicate-check module, configured to respond to a document duplicate-check request sent by a user by using the target filter to perform a duplicate check on the to-be-processed document carried in the request.
By the technical scheme of the invention, the following technical effects can be obtained:
(1) The electronic document processing method and system construct a target filter to detect duplicate electronic documents quickly, which greatly reduces the number of database lookups that existing duplicate detection requires and markedly improves the efficiency of querying and duplicate detection;
(2) On top of the standard filter, and exploiting the sparse-vector character of the processing records, the method and system add a dynamic counting linked list that records how many times each position of the standard filter's bit array has been mapped. This meets the need to delete elements from the standard filter when a processing record is revoked, while keeping the additional space overhead of the standard filter under control.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for processing an electronic document according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for performing a duplicate check, using the target filter, on the to-be-processed document carried in the document duplicate-check request, according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for deleting the document processing record from the target filter and the checked database table, according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an electronic document processing system according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "first," "second," "third," and "fourth," etc. in the description and claims of the present application are used for distinguishing between different objects and not for describing a particular sequential order. The terms "comprising" and "having" and any variations thereof, in embodiments of the present application, are intended to cover non-exclusive inclusions.
Alternative embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
According to an embodiment of the present invention, FIG. 1 is a flowchart of an electronic document processing method. The method is applicable to the business scenario of any electronic document, for example duplicate checking for electronic certificate reimbursement, for travel subsidies, and for everyday electronic ticket reimbursement. To decide whether an electronic document is being processed repeatedly, the method does not need to read the full content of the document; it only needs to decide whether the document belongs to the set of already processed documents. Duplicate detection can therefore be modeled as a set-membership test over a large sample set, and a filter method that is far more efficient than a database lookup can be used.
Specifically, as shown in FIG. 1, an embodiment of the present invention provides an electronic document processing method that can be applied to any terminal electronic device, such as a computer, a tablet computer, or a mobile phone. The electronic document processing method comprises the following steps:
S10, constructing an initial filter, wherein the initial filter is constructed based on a standard filter and a dynamic counting linked list; the standard filter is stored in a bit array structure, and the dynamic counting linked list is stored as a linked list that records the bit positions of the standard filter's bit array into which multiple elements have been inserted and the number of elements inserted at each such position;
The initial filter can be regarded as an improvement on the existing bloom filter. A bloom filter is a data structure for quickly determining whether an element belongs to a set; it handles large-scale data efficiently and offers high space efficiency and fast queries. A bloom filter consists of k random hash functions and one long bit array whose positions are all initialized to 0. When an element is inserted, the k hash functions are evaluated on it, and the k corresponding bit positions in the bit array (also called hash indexes) are set to 1. A query works the same way: the data is hashed and the k corresponding bit positions are checked. If all k are 1, the data is reported as already present; otherwise it is not present.
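The working principle described above can be sketched as a plain bloom filter. This is a minimal illustration, not the patent's implementation: the class name `BloomFilter`, its methods, and the use of salted SHA-256 to emulate k independent hash functions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Plain bloom filter: k hash functions over an m-bit array."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # every bit starts at 0

    def _positions(self, item: str) -> list[int]:
        # Emulate k hash functions by salting SHA-256 with the function number
        # (an assumption; the source does not name its hash functions).
        return [
            int.from_bytes(hashlib.sha256(f"{j}:{item}".encode()).digest()[:8], "big") % self.m
            for j in range(self.k)
        ]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True only means "possibly present".
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))
```

After `add("INV-001")`, `might_contain("INV-001")` is always true, while a `False` result for any other key is a guarantee that the key was never inserted.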
Because hash functions map a large sample space uniformly onto a small one, some collisions are unavoidable. Consequently, if the bloom filter reports an element as "present", the element is not necessarily in the set (it may collide with another element); but if the filter reports "absent", the element is definitely not in the set. This property is unfriendly to business scenarios that require exact lookup, but it suits the task of deciding whether an electronic document has been processed before. In actual business, most submitted electronic documents are unprocessed (more than 99% according to business statistics), and processing can continue as soon as a document is determined to be unprocessed. The bloom filter therefore returns "unprocessed" with high probability, that answer is 100% reliable, and the conclusion that processing may continue is reached quickly. Based on these properties of the bloom filter, it is feasible for the embodiment of the invention to use a filter for duplicate checking of electronic documents.
However, the standard bloom filter cannot be applied directly to duplicate detection in electronic document processing. Each bit of the bloom filter's bit array supports only two states, "0" (absent) and "1" (present), and cannot record how many elements use that bit as an index; the filter therefore supports adding elements but not deleting them. In electronic document processing, a handler may have a document returned, or may withdraw it, after submitting it for reimbursement. In that case the reimbursed state of the document must be revoked, that is, an element must be deleted from the bloom filter, and the standard bloom filter cannot support this deletion.
Among existing schemes that let a bloom filter delete elements, the most common idea is to replace each bit of the filter's bit array with a multi-bit counter, for example turning the bit array into an array of two-byte integers, so that each index position of the bloom filter can count up to $2^{16} - 1 = 65535$. However, this change also has a significant effect on the filter's space usage: the filter grows by a factor of 16. In the electronic document reimbursement duplicate-check scenario, the element occupancy of the bloom filter is kept below 50% by adjusting parameters such as the number of hash functions k, the data size n, the array length m, and the false positive rate p. That is, all positions are "0" when the bloom filter is initialized; as electronic document elements are inserted, positions with value "0" are gradually set to "1" or counted, but more than 50% of positions remain "0". For such a sparse vector, giving every position a counter of the same size (such as 2 bytes) wastes a great deal of space. Therefore, step S10 of this embodiment first constructs an initial filter that combines a bloom filter with a dynamic counting linked list, which supports per-position counting while keeping space usage as low as possible.
Specifically, in step S10, the initial filter construction includes two parts, namely, a standard filter construction and a dynamic counting linked list construction. In the process of constructing the standard filter, the parameters of the standard filter meet the following formula:
$$k = -\frac{\ln p}{\ln 2} \qquad (1)$$
$$m = \frac{k \cdot n}{\ln 2} \qquad (2)$$
wherein k represents the number of hash functions, which map one piece of data to different values; m represents the bit array length: a value computed by a hash function must be reduced modulo m so that it falls inside the bit array; n represents the size of the data processed by the current processing platform; and p represents the false positive rate of the standard filter. Because a hash value has a limited number of bits, collisions between values are unavoidable when the data volume is large, and p is the rate at which the filter reports "present" for data that is in fact absent. It follows that the filter is not suited to confirming that data is definitely present, but is suited to confirming that data is definitely absent. The false positive rate p must not be too high, and the bit array length m must not be too large.
From formulas (1) and (2) it can be seen that the number of hash functions k depends only on the false positive rate p, and the bit array length m depends on k and the data size n. Since k can only be an integer, the relationship among the parameters k, m, n and p is as shown in Table 1:
Table 1
k     p              k/ln 2 (= m/n)
1     0.5            1.442695041
2     0.25           2.885390082
3     0.125          4.328085123
4     0.0625         5.770780164
5     0.03125        7.213475204
6     0.015625       8.656170245
7     0.0078125      10.09886529
8     0.00390625     11.54156033
9     0.001953125    12.98425537
10    0.000976563    14.42695041
From the table above, when k is 4, p falls to about 6.25% and the ratio of m to n is about 5.7. Storing 36 million electronic documents therefore takes about 205.2 million bits, or roughly 25 MB of memory.
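The sizing arithmetic above can be reproduced with a short calculation. The sketch below assumes the parameter relations implied by the table, namely p = 2^(-k) and m/n = k/ln 2; the function name `bloom_parameters` is hypothetical. Using the unrounded ratio 5.77 rather than the rounded 5.7 gives a slightly larger bit count than the figure quoted in the text.

```python
import math


def bloom_parameters(p_target: float, n: int) -> tuple[int, float, int]:
    """Return (k, p, m) for a target false positive rate and data size n."""
    k = math.ceil(-math.log2(p_target))  # smallest integer k with 2**-k <= p_target
    p = 2.0 ** -k                        # false positive rate actually achieved
    m = math.ceil(k * n / math.log(2))   # bit array length from m = k * n / ln 2
    return k, p, m


k, p, m = bloom_parameters(0.0625, 36_000_000)
print(f"k={k}, p={p}, m={m} bits, about {m / 8 / 1e6:.1f} MB")
```

Changing `p_target` to 0.001 selects k = 10 from the same table, which shows how quickly the bit array grows as the tolerated false positive rate shrinks.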
In constructing the dynamic counting linked list, the following condition is satisfied: when the number x of elements inserted at a bit position of the standard filter's bit array is greater than or equal to 2, the bit value is kept at 1, and a counting node <i, x> is added to the dynamic counting linked list to record that x elements have been inserted at the i-th position of the bit array.
Specifically, a linked list is used to record the number of elements inserted at a position of the standard filter's bit array. When an electronic document is hashed and mapped to the i-th position of the bit array, the original value "0" of that position (bitArray[i]) is set to "1". If a second element is inserted at this position, the value stays "1", but a counting node <i, 2> is added to the dynamic counting linked list to record that 2 elements have been inserted at the i-th position. If more elements hash to the i-th position, the count of the node for position i is incremented, so that <i, x> records that x elements have been mapped to the i-th position.
As another alternative, the dynamic counting linked list also supports deleting elements that have been inserted into the bit array. Specifically, when the reimbursement of an electronic document is cancelled or the document is returned, the element is deleted from the standard filter. The document is again hashed and mapped to the i-th position of the bit array, and the dynamic counting linked list is checked for a node <i, x>. If the node exists, x is compared with 2: when x is greater than 2, x is decremented by 1; when x equals 2, the counting node is deleted from the dynamic counting linked list. If no counting node <i, x> exists, position i of the bit array is mapped by only one electronic document, and the value at that position is changed directly from 1 to 0.
Because the bits set to "1" form a sparse vector and the counting linked list grows and shrinks dynamically, the initial filter constructed in this embodiment makes full use of the fact that, in the standard filter's bit array, most positions are "0", some are "1", and only a few hold a count greater than 1. Compared with turning the bloom filter's bit array directly into an integer array of counters, it adds one node record <i, x> to the linked list only for positions where more than one element has been inserted, rather than a counter for every bit position, so it records the number of inserted elements while limiting the growth in storage space.
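The insert and delete rules for the counting nodes can be sketched as below. This is an illustrative sketch under stated assumptions: a Python dict keyed by position stands in for the linked list of <i, x> nodes described in the text, the class name `CountingOverlayFilter` and its methods are hypothetical, and salted SHA-256 emulates the k hash functions.

```python
import hashlib


class CountingOverlayFilter:
    """Bit array plus sparse <i, x> counts for positions hit by two or more elements."""

    def __init__(self, m: int, k: int) -> None:
        self.m, self.k = m, k
        self.bits = bytearray(m)           # one byte per bit, for readability
        self.counts: dict[int, int] = {}   # position i -> x, present only when x >= 2

    def _positions(self, item: str) -> set[int]:
        # A set, so an item whose own hashes coincide touches a position once.
        return {
            int.from_bytes(hashlib.sha256(f"{j}:{item}".encode()).digest()[:8], "big") % self.m
            for j in range(self.k)
        }

    def might_contain(self, item: str) -> bool:
        return all(self.bits[i] for i in self._positions(item))

    def insert(self, item: str) -> None:
        for i in self._positions(item):
            if self.bits[i] == 0:
                self.bits[i] = 1                              # first element at position i
            else:
                self.counts[i] = self.counts.get(i, 1) + 1    # create <i, 2> or increment x

    def remove(self, item: str) -> None:
        # Only call this for items that were actually inserted.
        for i in self._positions(item):
            x = self.counts.get(i)
            if x is None:
                self.bits[i] = 0          # a single element mapped here: clear the bit
            elif x == 2:
                del self.counts[i]        # back to one element: drop the node, keep the bit
            else:
                self.counts[i] = x - 1
```

A dict gives constant-time lookup of a node by position; the patent text describes a linked list, which would serve the same role with different lookup cost.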
S20, mapping and inserting the currently stored duplicate-checked documents into the initial filter to obtain a target filter;
Starting from the initial filter constructed in step S10, step S20 maps and inserts the existing (stock) data into the initial filter according to the specific scenario, so that the target filter holds the hash fingerprints of the sample data. This step builds the sample data set against which subsequent duplicate checks are made.
Example one
As a specific example, consider a system that uses a database to store electronic certificate reimbursement records in the electronic certificate reimbursement duplicate-check scenario. The reimbursed electronic certificates are generally stored in a relational database table, that is, the checked database table, with fields such as unique identifier, amount, reimbursement status, associated reimbursement form number, and handler. Before the target filter is used for duplicate detection, standard filters are constructed for each type of electronic certificate, and the electronic certificates reimbursed in the current accounting year are mapped into the initial filters. Specifically, step S20 comprises:
Step a, selecting a number of initial filters corresponding to the number of types of the duplicate-checked documents;
The electronic certificates commonly used for reimbursement mainly include electronic invoices, railway electronic tickets, air-ticket electronic itineraries, fiscal electronic bills, customs electronic payment forms, and bank electronic receipts. Before reimbursement is submitted, each electronic certificate is assigned a specific type, either automatically by the system or by the handler. One filter is therefore constructed for each type of electronic certificate, so that the number of standard filters equals the number of certificate types, which reduces the probability of hash collisions when certificates are mapped into the filters. The specific parameters of the standard filter in this step are determined by the volume of each type of electronic certificate. For example, suppose the electronic certificates reimbursed on the reimbursement service cloud platform in a year reach 36 million. When a handler submits a new electronic document for reimbursement, reimbursement continues if the document is not among the past reimbursement records, and the document is handled separately if a reimbursement record exists. In this case n is 36 million.
Step b, based on the dynamic counting linked list, mapping and inserting duplicate-checked documents of the same type into the same standard filter to obtain the corresponding target filters, wherein each target filter represents the data set of one type.
To map the electronic certificates into the initial filter as uniformly as possible while reflecting as much of their key information as possible, the key information of each certificate, such as its unique identifier and reimbursement amount, is first hashed to obtain a hash fingerprint, which is stored as a field in the reimbursed electronic certificate database table, that is, the checked database table. All existing reimbursed electronic certificates are submitted by category to the corresponding initial filters for mapping. The mapping and insertion process is as described in step S10, and yields the standard filters and counting linked lists for the existing reimbursed certificates, that is, the target filters. This step can thus produce multiple target filters, each used for duplicate checking of one type of electronic certificate.
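Fingerprinting the key fields and routing each record to a per-type filter can be sketched as follows. The field names, the sample records, the `|` separator, and the choice of SHA-256 are assumptions made for illustration; a Python set stands in for each per-type target filter so that the example stays short.

```python
import hashlib
from collections import defaultdict


def hash_fingerprint(unique_id: str, amount: str) -> str:
    """Hash fingerprint over a certificate's key information (identifier and amount)."""
    return hashlib.sha256(f"{unique_id}|{amount}".encode()).hexdigest()


# Hypothetical stock of already reimbursed certificates.
stock_records = [
    {"type": "electronic invoice", "id": "A001", "amount": "120.00"},
    {"type": "electronic invoice", "id": "A002", "amount": "35.50"},
    {"type": "railway electronic ticket", "id": "R900", "amount": "553.00"},
]

filters_by_type = defaultdict(set)  # certificate type -> stand-in for its target filter
for rec in stock_records:
    fp = hash_fingerprint(rec["id"], rec["amount"])
    rec["fingerprint"] = fp                   # kept as a field of the checked table row
    filters_by_type[rec["type"]].add(fp)
```

Keeping one filter per type means a lookup for a railway ticket never competes for bit positions with invoices, which is the collision-reduction argument made in the text.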
Example two
As another specific example, consider a travel reimbursement duplicate-detection scenario in which a database system stores the travel reimbursement records. The stock reimbursed records are generally stored in a relational database table, that is, the checked database table, with fields such as reimbursement form number, traveler, departure date, return date, and reimbursement amount. The target filter must be generated before it is used for travel reimbursement duplicate detection. Specifically, generating the target filter includes:
Step c: acquire the initial filter. The specific parameters of the initial filter are determined by the volume of business trip records. For example, suppose the travel reimbursement service cloud platform handles on the order of 10 million travel reimbursement forms per year, each covering 2 travelers on average, so that the per-traveler, per-calendar-day travel allowance records reach about 60 million. When a reimbursement manager submits a new travel reimbursement form, reimbursement proceeds if no traveler on the form already has a travel allowance record for any of the travel dates; otherwise the form is handled separately. In this case n is 60 million.
Step d: based on the dynamic counting linked list, map and insert the stock reimbursed business trip records into the standard filter to obtain the target filter. To map each traveler's per-calendar-day business trip records into the standard filter as uniformly as possible, a hash operation is first performed on the key information of the record, such as the traveler's unique identifier ID and the travel date Day (a calendar day), to obtain the hash fingerprint H of the business trip record:
H=hash(ID+Day) (3);
The hash fingerprints of all stock reimbursed business trip records, one per traveler per travel date, are submitted to the standard filter for mapping. The mapping and insertion process is as described in step S10 and yields the standard filter and counting linked list of the stock reimbursed travel allowance records, that is, the target filter.
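A minimal sketch of formula (3), assuming SHA-256 as the hash function, ISO-formatted dates, plain string concatenation for `ID + Day`, and a trip that counts both the departure and the return date. The function name `trip_fingerprints` and the traveler identifiers are hypothetical.

```python
import hashlib
from datetime import date, timedelta


def trip_fingerprints(traveler_ids, depart: date, ret: date):
    """Yield one fingerprint H = hash(ID + Day) per traveler per calendar day."""
    days = (ret - depart).days + 1  # both end dates included (assumption)
    for traveler_id in traveler_ids:
        for offset in range(days):
            day = depart + timedelta(days=offset)
            key = (traveler_id + day.isoformat()).encode("utf-8")
            yield hashlib.sha256(key).hexdigest()


# Two travelers on a three-day trip produce six person-day fingerprints.
fps = list(trip_fingerprints(["E1001", "E1002"], date(2023, 3, 1), date(2023, 3, 3)))
```

Expanding a form into person-day records is what makes overlapping trips detectable: two forms that share a traveler on the same calendar day produce the same fingerprint.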
S30: in response to a document duplicate-check request sent by a user, perform duplicate checking on the document to be processed carried in the request by using the target filter.
After the sample set is established in step S20, newly submitted electronic documents can be duplicate-checked in step S30. Specifically, as shown in Fig. 2, step S30 includes:
S31: perform a hash operation on the document to be processed by using the target filter to generate the corresponding hash fingerprint;
In this step, the key information of the document to be processed is first extracted, and a hash operation is then performed on that key information to obtain the hash fingerprint. Specifically, in embodiment 1 of the above specific example, for an electronic certificate newly submitted for reimbursement, the unique identifier and reimbursement amount of the certificate are first extracted, the hash fingerprint is calculated from them, and the calculated fingerprint is then submitted to the target filter for duplicate checking. Note that in embodiment 1 the target filter used in step S31 must be selected according to the type of the document to be processed, so that duplicate checking is performed against documents of that type.
In embodiment 2 of the above specific example, for a newly submitted travel reimbursement form, each combination of traveler and travel date (calendar day) in the form is first extracted, the hash fingerprint of the combination of the traveler's unique identifier and the travel date is calculated, and the fingerprint is submitted to the target filter for duplicate checking.
S32: determine whether the values at the k bit positions of the target filter bit array to which the hash fingerprint maps are all 1;
S33: if the values at the k bit positions of the bit array are not all 1, the document to be processed has not been processed before, and the document processing flow continues; or, if the values at the k bit positions of the bit array are all 1, the document to be processed may have been processed before, and a secondary check of the document continues.
Further, when the document to be processed has not been processed before, continuing the document processing flow includes:
setting to 1 each of the k bit positions whose current value is 0; meanwhile, for each of the k bit positions whose current value is 1, if a corresponding counting node exists in the dynamic counting linked list, increasing the count value of that node by 1, and if no corresponding counting node exists, inserting one and setting its count value to 2; and storing the document to be processed and its hash fingerprint in the checked database table.
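The check in S32 and the insertion rule above can be sketched as a small class. Several details here are assumptions rather than part of the source: a Python `dict` stands in for the dynamic counting linked list, one byte is used per bit position for readability, and the k positions are derived from a single SHA-256 digest by double hashing. The class name `CountingListFilter` is hypothetical.

```python
import hashlib


class CountingListFilter:
    """Bit array plus sparse per-position counts (illustrative sketch)."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit position, for readability
        self.counts = {}          # i -> x with x >= 2; stands in for nodes <i, x>

    def positions(self, fingerprint: str):
        # Assumed scheme: derive k positions from one SHA-256 digest.
        digest = hashlib.sha256(fingerprint.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + j * h2) % self.m for j in range(self.k)]

    def might_contain(self, fingerprint: str) -> bool:
        # S32: a possible duplicate requires all k mapped bits to be 1.
        return all(self.bits[i] for i in self.positions(fingerprint))

    def insert(self, fingerprint: str) -> None:
        for i in self.positions(fingerprint):
            if self.bits[i] == 0:
                self.bits[i] = 1  # first element mapped to this position
            else:
                # Existing node: add 1. No node yet: create it with count 2.
                self.counts[i] = self.counts.get(i, 1) + 1


flt = CountingListFilter(m=1024, k=4)
first_check = flt.might_contain("doc-A")   # empty filter: definite miss
flt.insert("doc-A")
second_check = flt.might_contain("doc-A")  # now reported as a possible duplicate
```

Positions hit only once never get a counting node, which is why the counts stay sparse when most bit positions are mapped at most once.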
The checked database table records, each time an electronic document is duplicate-checked, the information of documents found not to be duplicates, so that subsequent processes such as reimbursement or verification can proceed. Note that the currently stored duplicate-checked documents inserted into the target filter in step S20 include the data in the checked database table.
Further, because a bloom filter's conclusion that an electronic certificate has already been reimbursed is wrong with a small probability (a false positive), in actual service scenarios an electronic document flagged as possibly duplicated can be checked further. Specifically, continuing the secondary check of the document to be processed includes:
Step e: search for the document to be processed in the checked database table;
Step f: if the document to be processed is retrieved from the checked database table, the document is a duplicate, and its processed information is returned to the user; or, if the document to be processed is not retrieved from the checked database table, the document has not been processed before, and the document processing flow continues.
Specifically, if a duplicate electronic document is retrieved from the checked database table, duplicate processing has occurred, and the complete information of the electronic document, such as its unique identifier, amount, associated reimbursement form number, reimbursement manager, and reimbursement time, is returned to the front-end user, for example the reimbursement manager who newly submitted the reimbursement form, who can then decide manually whether the form needs to be modified. If no duplicate electronic document is retrieved from the checked database table, the target filter produced a false positive, that is, the document to be processed has not been processed (for example, has not been reimbursed), and the document processing flow continues; details are not repeated here. Steps e and f compensate for the target filter's low-probability false positives through the secondary database check, so such false positives do not affect the accuracy of the duplicate-checking service.
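The two-level flow of S32, S33 and steps e and f can be sketched as below. The filter is passed in as a callable and a `dict` stands in for the checked database table; both are assumptions, as are the function name `check_duplicate`, the status strings, and the sample record.

```python
def check_duplicate(fingerprint, might_contain, checked_table):
    """Return ('new', None) or ('duplicate', record).

    might_contain: callable giving the filter's first-level answer.
    checked_table: dict fingerprint -> record, standing in for the database table.
    """
    if not might_contain(fingerprint):
        return "new", None                   # definite miss: no database lookup
    record = checked_table.get(fingerprint)  # second-level check (steps e and f)
    if record is not None:
        return "duplicate", record           # return the processed information
    return "new", None                       # the filter gave a false positive


# Made-up data: "fp-3" simulates a false positive of the filter.
table = {"fp-1": {"form": "BX-0001", "amount": "120.00"}}
flagged = {"fp-1", "fp-3"}
results = {
    fp: check_duplicate(fp, lambda f: f in flagged, table)[0]
    for fp in ("fp-1", "fp-2", "fp-3")
}
```

Only fingerprints the filter flags ever reach the table lookup, which is where the reduction in database searches comes from.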
As an optional embodiment, the electronic document processing method further includes:
S40: according to a document processing record revocation request sent by the user, delete the document processing record from the target filter and the checked database table.
Fig. 3 shows a flowchart of the method for deleting the document processing record from the target filter and the checked database table. As shown in Fig. 3, step S40 further includes:
S41: perform a hash operation, by using the target filter, on the document data carried in the document processing record revocation request to generate the corresponding hash fingerprint;
S42: map the hash fingerprint to k bit positions of the target filter bit array, and determine whether a counting node exists for each bit position; if a counting node exists for a bit position, further determine whether its count value is greater than or equal to 2;
S43: if the count value of the counting node equals 2, delete the counting node from the dynamic counting linked list; if the count value is greater than 2, decrease it by 1;
S44: search for and delete the document processing record in the checked database table.
As an optional embodiment, determining whether a counting node exists for each bit position further includes: if no counting node exists for a bit position, setting the value of that bit position in the target filter bit array to 0.
For example, in practice, when the reimbursement of an electronic certificate is cancelled or returned, the certificate must be deleted from both the target filter and the reimbursed electronic certificate database table. First, the hash fingerprint of the electronic certificate is mapped to k positions in the target filter, and for each position i it is determined whether a counting node < i, x > exists in the counting linked list. If the node exists, x is necessarily greater than or equal to 2: when x is greater than 2, x is decreased by 1; when x equals 2, the counting node is deleted from the counting linked list. If no counting node < i, x > exists, position i of the bit array was mapped only by this electronic certificate, and the value at position i is changed from 1 to 0. After the mapping information of the electronic certificate has been deleted from the target filter, the electronic certificate is then deleted from the reimbursed electronic certificate database table.
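The deletion rule in S42 and S43 can be sketched as follows, again with a `dict` standing in for the counting linked list and a `bytearray` with one byte per bit position; the function name `delete_positions` is hypothetical. The sketch assumes the record being revoked was actually inserted earlier, as is the case for a processed document.

```python
def delete_positions(bits: bytearray, counts: dict, positions) -> None:
    """Undo one insertion at each mapped position of a revoked record."""
    for i in positions:
        x = counts.get(i)
        if x is None:
            bits[i] = 0        # no node: only this record mapped here
        elif x == 2:
            del counts[i]      # back to a single element; the bit stays 1
        else:
            counts[i] = x - 1  # x > 2: just decrement


# Position 1 holds one element, position 3 holds two, position 5 holds three.
bits = bytearray(8)
for i in (1, 3, 5):
    bits[i] = 1
counts = {3: 2, 5: 3}
delete_positions(bits, counts, [1, 3, 5])
```

After the call, position 1 is cleared, position 3 keeps its bit without a node, and position 5 keeps its bit with a count of 2.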
In some optional embodiments, the electronic document processing method further includes:
S50: within a specified time period, after all current electronic documents have been processed, initialize the target filter.
The specified time period may be a quarter, a year, and so on, and is set according to the actual needs of the user. In general, few organizations reimburse electronic certificates or business trips across years, so after an accounting year has been closed the target filter can be reinitialized to save memory. Specifically, initializing the target filter includes: first, setting the values at all positions of the target filter bit array to 0; and second, deleting all counting nodes in the dynamic counting linked list.
When a newly submitted electronic document belongs to the next year, it is processed again according to steps S10 to S50 above. When a newly submitted electronic document belongs to a historical year, its hash fingerprint is searched for directly in the historical checked database table. Because few electronic documents are submitted and reimbursed across years, checking the database table directly in these cases has little effect on the overall efficiency of the system.
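Reinitialization and the routing of documents by year can be sketched as below. The `bytearray` and `dict` representations, the function names `reset_filter` and `lookup_path`, and the returned labels are assumptions for illustration.

```python
def reset_filter(bits: bytearray, counts: dict) -> None:
    """Set every bit to 0 and drop every counting node, in place."""
    bits[:] = bytes(len(bits))
    counts.clear()


def lookup_path(document_year: int, active_year: int) -> str:
    """Pick the check path: the in-memory filter or the historical table."""
    return "target_filter" if document_year == active_year else "history_table"


bits = bytearray([1, 0, 1, 1])
counts = {2: 3}
reset_filter(bits, counts)
```

Resetting in place keeps the allocated bit array, so the filter for the next period can reuse the same parameters without being rebuilt.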
In the electronic document processing method provided by the embodiment of the invention, constructing the target filter enables fast duplicate detection for electronic documents, which greatly reduces the number of database searches needed for duplicate detection and significantly improves the efficiency of duplicate queries. In addition, on top of the standard filter and exploiting the sparse-vector character of electronic document processing records, a counting linked list is added to record how many times each position of the standard filter bit array has been mapped. This supports deleting elements from the standard filter when a processing record is revoked while keeping the additional space overhead of the target filter under control.
Fig. 4 is a schematic structural diagram of the system corresponding to the electronic document processing method of the above embodiment of the present invention. As shown in Fig. 4, the present invention provides an electronic document processing system 400, including:
A filter construction module 410, configured to construct an initial filter from a standard filter combined with a dynamic counting linked list; the standard filter is stored in a bit array structure, and the dynamic counting linked list uses a linked list to store the number of elements inserted at a bit position of the standard filter bit array;
a data insertion module 420, configured to map and insert the currently stored duplicate-checked documents into the initial filter to obtain a target filter;
and a data duplicate-checking module 430, configured to respond to a document duplicate-check request sent by a user and to perform duplicate checking, by using the target filter, on the document to be processed carried in the request.
Further, the filter construction module 410 is further configured to construct a standard filter, where parameters of the standard filter satisfy the following formula:
k=-(lnp)/(ln2);
m=(kn)/(ln2);
wherein k represents the number of hash functions, p represents the false alarm rate of a standard filter, m represents the bit array length, and n represents the data size of the current processing platform;
the filter construction module 410 is further configured to construct a dynamic count linked list, where the dynamic count linked list satisfies: when the number x of the inserted elements at a certain bit position in the standard filter bit array is greater than or equal to 2, the current bit value is kept to be 1, and a counter node < i, x > is added in the current dynamic counting linked list and is used for identifying that the ith position of the bit array has inserted x data.
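As a worked example of the two sizing formulas above, the sketch below evaluates k and m for n = 36 million, the platform size used in the first embodiment. The false alarm rate p = 0.001 and the choice to round both values up to integers are assumptions, since the source does not state them; the function name `filter_parameters` is hypothetical.

```python
import math


def filter_parameters(n: int, p: float):
    """Evaluate k = -ln(p)/ln(2) and m = k*n/ln(2), rounded up to integers."""
    k_real = -math.log(p) / math.log(2)
    m = math.ceil(k_real * n / math.log(2))  # bit array length
    k = math.ceil(k_real)                    # number of hash functions
    return k, m


k, m = filter_parameters(36_000_000, 0.001)
```

With these inputs k comes out as 10 hash functions and m is on the order of 5 × 10^8 bits, that is, tens of megabytes for the bit array before the counting linked list is added.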
Further, the data insertion module 420 is further configured to select a corresponding number of initial filters according to the number of types of the currently stored duplicate-checked documents, and, based on the dynamic counting linked list, to map and insert the duplicate-checked documents of the same type into the same standard filter to obtain corresponding target filters, where each target filter represents the document set of one type.
Further, the data duplication checking module 430 includes a hash operation unit 431, a first duplication checking unit 432 and a second duplication checking unit 433.
Wherein, the hash operation unit 431 is used for: and carrying out hash operation on the document to be processed by using the target filter to generate a corresponding hash fingerprint.
The first duplicate checking unit 432 is configured to: determine whether the values at the k bit positions of the target filter bit array to which the hash fingerprint maps are all 1; if the values at the k bit positions of the bit array are not all 1, the document to be processed has not been processed before, and the document processing flow continues; or, if the values at the k bit positions of the bit array are all 1, the document to be processed may have been processed before, and a secondary check of the document continues.
The first duplicate checking unit 432 is further configured to: setting the value of the bit position with the current value of 0 in the k bit positions to be 1; meanwhile, in the k bit positions, if the corresponding counting node exists in the dynamic counting linked list, the current counting value of the counting node is increased by 1; if the dynamic counting linked list does not have the corresponding counting node, inserting the corresponding counting node and setting the counting value to be 2; and storing the document to be processed and the corresponding hash fingerprint into a checked database table.
The second duplicate checking unit 433 is configured to: searching the document to be processed based on the checked database table; if the document to be processed is retrieved from the checked duplicate database table, the document to be processed is indicated to be repeatedly processed, and the processed information of the document to be processed is returned to the user at the moment; or if the document to be processed is not retrieved in the checked database table, the document to be processed is not repeatedly processed, and the document processing flow in the embodiment is continuously executed.
Further, the electronic document processing system further includes a data deleting module 440, where the data deleting module 440 is configured to delete the document processing record in the target filter and the checked database table according to a document processing record revocation request sent by the user.
Specifically, the data deleting module 440 is further configured to: perform a hash operation, by using the target filter, on the document data carried in the document processing record revocation request to generate the corresponding hash fingerprint, where in this embodiment the Chinese national SM3 hash algorithm is used for the hash operation; map the hash fingerprint to k bit positions of the target filter bit array, and determine whether a counting node exists for each bit position; if a counting node exists for a bit position, further determine whether its count value is greater than or equal to 2; if the count value of the counting node equals 2, delete the counting node from the dynamic counting linked list; if the count value is greater than 2, decrease it by 1; and search for and delete the document processing record in the checked database table.
Further, the data deleting module 440 is further configured to: if no counting node exists for a bit position, set the value of that bit position in the target filter bit array to 0.
Further, the electronic document processing system 400 further includes an initializing module 450, which is configured to initialize the target filter after all current electronic documents have been processed within a specified time period.
In the electronic document processing system provided by the invention, constructing the target filter enables fast duplicate detection for electronic documents, which greatly reduces the number of database searches needed for duplicate detection and significantly improves the efficiency of duplicate queries. In addition, on top of the standard filter and exploiting the sparse-vector character of electronic document processing records, a counting linked list is added to record how many times each position of the standard filter bit array has been mapped. This supports deleting elements from the standard filter when a processing record is revoked while keeping the additional space overhead of the target filter under control.
Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and device described above may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative.
Those skilled in the art will appreciate that the scope of the invention is not limited to technical solutions formed by the specific combinations of the technical features described above; it also covers other technical solutions formed by any combination of those features or their equivalents without departing from the inventive concept, for example, solutions formed by interchanging the above features with technical features having similar functions disclosed in (but not limited to) the present invention.
It should be understood that, the sequence numbers of the steps in the summary and the embodiments of the present invention do not necessarily mean the order of execution, and the execution order of the processes should be determined by the functions and the internal logic, and should not be construed as limiting the implementation process of the embodiments of the present invention.

Claims (10)

1. An electronic document processing method, comprising:
constructing an initial filter, wherein the initial filter is constructed based on a standard filter and a dynamic counting linked list; the standard filter is stored in a bit array structure, and the parameters of the standard filter satisfy the following formulas:
k=-(lnp)/(ln2);
m=(kn)/(ln2);
wherein k represents the number of hash functions, p represents the false alarm rate of a standard filter, m represents the bit array length, and n represents the data size of the current processing platform;
The dynamic counting linked list satisfies the following conditions: when the number x of the inserted elements at a certain bit position in the standard filter bit array is more than or equal to 2, keeping the current bit value as 1, and adding a counter node < i, x > in the current dynamic counting chain table for identifying that the ith position of the bit array has inserted x data;
mapping and inserting the duplicate-checked documents stored by the current processing platform into the initial filter to obtain a target filter;
in response to a document duplicate-check request sent by a user, performing duplicate checking, by using the target filter, on the document to be processed carried in the request.
2. The electronic document processing method according to claim 1, wherein mapping and inserting the duplicate-checked documents stored by the current processing platform into the initial filter to obtain a target filter comprises:
selecting a corresponding number of initial filters according to the number of types of the duplicate-checked documents stored by the current processing platform;
and, based on the dynamic counting linked list, mapping and inserting the duplicate-checked documents of the same type into the same standard filter to obtain corresponding target filters, wherein each target filter represents the document set of one type.
3. The electronic document processing method according to claim 1, wherein performing duplicate checking, by using the target filter and in response to the document duplicate-check request sent by the user, on the document to be processed carried in the request comprises:
performing a hash operation on the document to be processed by using the target filter to generate a corresponding hash fingerprint;
determining whether the values at the k bit positions of the target filter bit array to which the hash fingerprint maps are all 1;
and, if the values at the k bit positions of the bit array are not all 1, which indicates that the document to be processed has not been processed before, continuing the document processing flow.
4. The electronic document processing method according to claim 3, wherein, when the document to be processed has not been processed before, continuing the document processing flow comprises:
setting to 1 each of the k bit positions whose current value is 0; meanwhile, for each of the k bit positions whose current value is 1, if a corresponding counting node exists in the dynamic counting linked list, increasing the count value of that node by 1, and if no corresponding counting node exists, inserting one and setting its count value to 2; and
storing the document to be processed and the corresponding hash fingerprint in a checked database table.
5. The electronic document processing method according to claim 3, wherein the determining whether the values of k bit positions in the hash fingerprint mapped to the target filter bit array are all 1 further comprises:
and if the values of the k bit positions of the bit array are all 1, which indicates that the document to be processed is possibly processed repeatedly, continuing to perform secondary check on the document to be processed.
6. The electronic document processing method according to claim 5, wherein the continuing the secondary check of the document to be processed includes:
searching the document to be processed based on the checked database table;
if the document to be processed is retrieved from the checked duplicate database table, the document to be processed is indicated to be repeatedly processed, and the processed information of the document to be processed is returned to the user at the moment; or if the document to be processed is not retrieved in the checked database table, the document to be processed is not repeatedly processed, and the document processing flow is continuously executed at the moment.
7. The electronic document processing method according to any one of claims 1 to 6, characterized in that the electronic document processing method further comprises:
deleting the document processing record from the target filter and the checked database table according to a document processing record revocation request sent by the user;
wherein deleting the document processing record from the target filter and the checked database table comprises:
performing a hash operation, by using the target filter, on the document data carried in the document processing record revocation request to generate a corresponding hash fingerprint;
mapping the hash fingerprint to k bit positions of the target filter bit array, and judging whether a counting node exists in each bit position; if the bit position has a counting node, continuously judging whether the counting value of the counting node is more than or equal to 2;
if the count value of the counting node is equal to 2, deleting the counting node in a dynamic counting chain table; if the count value of the count node is greater than 2, the count value is reduced by 1;
and searching and deleting the document processing record in the checked database table.
8. The electronic document processing method of claim 7, wherein the determining whether the count node exists for each bit further comprises:
if no counting node exists for a bit position, setting the value of that bit position in the target filter bit array to 0.
9. The electronic document processing method according to claim 1, further comprising:
and initializing the target filter after the processing of all the current electronic documents is finished within a specified time period.
10. An electronic document processing system, comprising:
a filter construction module, configured to construct an initial filter, wherein the initial filter is constructed based on a standard filter and a dynamic counting linked list; the standard filter is stored in a bit array structure, and the parameters of the standard filter satisfy the following formulas:
k=-(lnp)/(ln2);
m=(kn)/(ln2);
wherein k represents the number of hash functions, p represents the false alarm rate of a standard filter, m represents the bit array length, and n represents the data size of the current processing platform;
the dynamic counting linked list satisfies the following conditions: when the number x of the inserted elements at a certain bit position in the standard filter bit array is more than or equal to 2, keeping the current bit value as 1, and adding a counter node < i, x > in the current dynamic counting chain table for identifying that the ith position of the bit array has inserted x data;
a data insertion module, configured to map and insert the duplicate-checked documents stored by the current processing platform into the initial filter to obtain a target filter;
and a data duplicate-checking module, configured to respond to a document duplicate-check request sent by a user and to perform duplicate checking, by using the target filter, on the document to be processed carried in the request.
CN202311638698.5A 2023-12-04 2023-12-04 Electronic bill processing method and system Pending CN117370624A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311638698.5A CN117370624A (en) 2023-12-04 2023-12-04 Electronic bill processing method and system

Publications (1)

Publication Number Publication Date
CN117370624A true CN117370624A (en) 2024-01-09

Family

ID=89406217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311638698.5A Pending CN117370624A (en) 2023-12-04 2023-12-04 Electronic bill processing method and system

Country Status (1)

Country Link
CN (1) CN117370624A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination