CN116912016A - Bill auditing method and device - Google Patents

Bill auditing method and device

Info

Publication number
CN116912016A
Authority
CN
China
Prior art keywords
subject
data
model
target item
bill
Legal status
Pending
Application number
CN202211679469.3A
Other languages
Chinese (zh)
Inventor
徐磊
邓超
蒋强
封翼
马晓明
王坚
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Jiangsu Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Jiangsu Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Jiangsu Co Ltd
Priority to CN202211679469.3A
Publication of CN116912016A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12 Accounting
    • G06Q40/125 Finance or payroll
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Biophysics (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Finance (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Technology Law (AREA)
  • Medical Informatics (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a bill auditing method and device. The method comprises: obtaining historical subject data corresponding to a target item in a bill to be audited; inputting the historical subject data into a weighted fusion model to obtain the subject prediction data of the target item output by the weighted fusion model; and obtaining an auditing result of the target item according to the difference between the subject actual data and the subject predicted data of the target item, wherein the weighted fusion model is trained on subject history sample data of the target item and the sample label corresponding to each piece of subject history sample data. By predicting the specific subjects of the target item in the bill to be audited with the weighted fusion model, and auditing the bill according to the difference between the subject prediction data and the subject actual data, the invention removes the need for manual participation in auditing and reduces labor cost; using the weighted fusion model for prediction also improves the accuracy of the auditing result, thereby improving the efficiency of existing bill auditing.

Description

Bill auditing method and device
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a bill auditing method and device.
Background
Accounting audit refers to the process of auditing accounting documents, account books, reports and other accounting data, and is a key link in the efficient operation of modern enterprises. As enterprises keep growing, the types of business they handle multiply and the rules become increasingly complex, so the processing logic of enterprise accounting systems is also very complex. For example, handovers across multiple departments and business links make forms and procedures cumbersome, creating risks of revenue leakage and loss.
The financial bill auditing method commonly adopted at present first predicts a target business with simple statistical models based on the enterprise's historical bill data, obtaining an original predicted bill for that business. Operators then adjust the original predicted bill based on their own experience to obtain the final predicted bill, and whether the actual bill generated by the target business conforms to specifications or meets expectations is judged according to the difference between the predicted bill and the actual bill.
However, this auditing method relies mainly on simple statistical models and is strongly affected by subjective human judgment; because operators' experience varies, the final prediction result is not accurate enough.
Disclosure of Invention
To address these problems in the prior art, the invention provides a bill auditing method and device for improving the accuracy of business bill auditing.
The invention provides a bill auditing method, which comprises the following steps:
acquiring historical subject data corresponding to a target item in a bill to be audited;
inputting the historical subject data into a weighted fusion model to obtain subject prediction data of the target item output by the weighted fusion model;
obtaining an auditing result of the target item according to the difference between the subject actual data and the subject predicted data of the target item;
the weighted fusion model is trained based on subject history sample data of the target item and sample tags corresponding to each subject history sample data.
According to the bill auditing method provided by the invention, the weighted fusion model is obtained by adopting the following steps:
respectively training the first sub-model and the second sub-model by using a training data set to obtain a trained first sub-model and a trained second sub-model; wherein the training data set comprises subject history sample data of the target item and the sample tags corresponding to each of the subject history sample data;
constructing a fusion model from the trained first sub-model and the trained second sub-model according to a preset weight ratio;
training the fusion model by utilizing the training data set to adjust the weight ratio of the first sub-model and the second sub-model so as to obtain an optimal weight ratio; under the optimal weight ratio, the preset fitness function value of the fusion model meets a preset termination condition;
and carrying out weighted fusion on the trained first sub-model and the trained second sub-model based on the optimal weight ratio to obtain the weighted fusion model.
According to the bill auditing method provided by the invention, before the training data set is used for respectively training the first sub-model and the second sub-model, the bill auditing method further comprises the following steps:
acquiring original subject data corresponding to the target item;
screening the original subject data to obtain a first data set related to subject category characteristics;
obtaining each statistical analysis result of the original subject data, and constructing a second data set; the training data set is constructed based on the first data set and the second data set.
According to the bill auditing method provided by the invention, the screening of the original subject data is performed to obtain a first data set related to subject category characteristics, which comprises the following steps:
screening the original subject data to obtain a third data set related to the marketing type feature;
screening the original subject data to obtain a fourth data set related to the characteristics of the charging node;
and constructing and obtaining the first data set related to the subject category characteristics based on the third data set and the fourth data set.
According to the bill auditing method provided by the invention, constructing the training data set based on the first data set and the second data set comprises the following steps:
determining feature weights of the feature values in the first data set and the feature values in the second data set based on an entropy weight method;
and multiplying each characteristic value by the corresponding characteristic weight to obtain the training data set.
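The two steps above can be illustrated with a minimal sketch of the standard entropy weight method. The text does not specify the normalization, so this sketch assumes min-max normalization is used only to compute the weights, and that the weights are then multiplied onto the feature values as passed in. A column with a constant value carries no information and receives weight 0. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def entropy_weights(rows: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight method: one weight per feature column, summing to 1."""
    m, k = len(rows), len(rows[0])
    divergences: List[float] = []
    for j in range(k):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        if m < 2 or hi == lo:
            divergences.append(0.0)  # constant column: no information, weight 0
            continue
        norm = [(v - lo) / (hi - lo) for v in col]  # assumed min-max normalization
        total = sum(norm)
        probs = [v / total for v in norm]
        # Entropy normalized by ln(m); 0 * ln(0) is treated as 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        divergences.append(1.0 - entropy)  # degree of divergence of feature j
    s = sum(divergences)
    if s == 0:
        return [1.0 / k] * k  # no column is informative: fall back to equal weights
    return [d / s for d in divergences]


def apply_weights(rows: Sequence[Sequence[float]], weights: Sequence[float]) -> List[List[float]]:
    """Multiply every feature value by the weight of its column."""
    return [[v * w for v, w in zip(r, weights)] for r in rows]
```

For example, with `rows = [[1, 5, 3], [2, 5, 1], [3, 5, 2]]` the constant middle column gets weight 0, and the other two columns, which have the same spread pattern, share the remaining weight equally.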
According to the bill auditing method provided by the invention, obtaining the auditing result of the target item according to the difference between the subject actual data and the subject predicted data of the target item comprises the following steps:
Calculating a confidence interval corresponding to the subject prediction data based on a preset confidence;
acquiring subject actual data corresponding to the target item;
if the subject actual data is located outside the confidence interval, judging that the auditing result of the target item is abnormal.
The invention also provides a bill auditing device, which comprises:
the data acquisition module is used for acquiring historical subject data corresponding to a target item in the bill to be audited;
the bill prediction module is used for inputting the historical subject data into a weighted fusion model to obtain subject prediction data of the target item output by the weighted fusion model;
the auditing result processing module is used for obtaining the auditing result of the target item according to the difference between the subject actual data and the subject predicted data of the target item;
the weighted fusion model is trained based on subject history sample data of the target item and sample tags corresponding to each subject history sample data.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the bill auditing method according to any one of the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a bill auditing method as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a bill auditing method as described in any one of the above.
According to the bill auditing method provided by the invention, historical subject data corresponding to a target item in the bill to be audited is obtained; the historical subject data is input into a weighted fusion model to obtain the subject prediction data of the target item output by the weighted fusion model; and an auditing result of the target item is obtained according to the difference between the subject actual data and the subject predicted data of the target item, wherein the weighted fusion model is trained on subject history sample data of the target item and the sample label corresponding to each piece of subject history sample data. Because the accounting subjects of specific target items in the bill to be audited are predicted by the weighted fusion model and the bill is audited according to the difference between the prediction result and the subject actual data, no manual participation in auditing is needed and labor cost is reduced; using the weighted fusion model for prediction also improves the accuracy of the auditing result, thereby improving the efficiency of existing bill auditing.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application environment of a bill auditing method provided by the present invention;
FIG. 2 is the first flow chart of the bill auditing method provided by the present invention;
FIG. 3 is the second flow chart of the bill auditing method provided by the present invention;
FIG. 4 is the third flow chart of the bill auditing method provided by the present invention;
FIG. 5 is the fourth flow chart of the bill auditing method provided by the present invention;
FIG. 6 is the fifth flow chart of the bill auditing method provided by the present invention;
FIG. 7 is a schematic diagram of the bill auditing apparatus provided by the present invention;
FIG. 8 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Specific embodiments of the present application are described below in conjunction with fig. 1-8.
The bill auditing method provided by the embodiments of the application can be applied in the application environment shown in FIG. 1, in which the terminal 101 communicates with the server 102 over a network. A data storage system may store the data that the server 102 needs to process; it may be integrated on the server 102 or located on the cloud or another network server. The terminal 101 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, Internet of Things device or portable wearable device; the Internet of Things device may be a smart speaker, smart television, smart air conditioner, smart in-vehicle device, or the like, and the portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 102 may be implemented as a stand-alone server or as a cluster of multiple servers.
In one embodiment, as shown in FIG. 2, a bill auditing method is provided. Taking its application to the terminal 101 in FIG. 1 as an example, the method includes the following steps:
Step 201, acquiring historical subject data corresponding to a target item in a bill to be audited;
Specifically, the terminal 101 acquires from the server 102 the historical subject data corresponding to a target item in the bill to be audited. The bill to be audited is an accounting bill awaiting audit and may contain multiple items; for an operator, for example, it may include the income of package A, the income of package B, and so on, where package A, package B, … can each be a target item. Each target item may involve multiple accounting subjects: for package A, the current month's income, expense, cost, etc. may be tallied, and each such category is called an accounting subject. Because the business is complex, one target item often includes several accounting subjects; for example, in the bill of one department of an operator, a target item may include a monthly traffic charging subject and a monthly cost subject. Each accounting subject can be uniquely identified by a subject ID. Historical subject data is bill data that has actually been generated in the past for the target item, for example its subject data over the past 5 years, including but not limited to subject unit prices, monthly total subject income, and daily income.
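The hierarchy described above (a bill contains target items, each target item has several accounting subjects identified by subject IDs, and each subject has a history of actual bill data) can be sketched as plain data classes. The class and field names below are hypothetical, and a daily amount in yuan is assumed as the unit of history.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class SubjectRecord:
    day: date
    amount: float  # actual bill amount posted to the subject that day (assumed yuan)


@dataclass
class AccountingSubject:
    subject_id: str  # unique subject ID
    name: str        # e.g. "monthly traffic charging"
    history: List[SubjectRecord] = field(default_factory=list)


@dataclass
class TargetItem:
    name: str        # e.g. "package A"
    subjects: Dict[str, AccountingSubject] = field(default_factory=dict)

    def add(self, subject: AccountingSubject) -> None:
        self.subjects[subject.subject_id] = subject

    def history_of(self, subject_id: str) -> List[float]:
        """Historical amounts of one subject, in chronological order."""
        records = sorted(self.subjects[subject_id].history, key=lambda r: r.day)
        return [r.amount for r in records]
```

A chronological list such as `history_of("S001")` is the kind of per-subject series that later steps feed into the weighted fusion model.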
Step 202, inputting the historical subject data into a weighted fusion model to obtain subject prediction data of a target item output by the weighted fusion model;
The weighted fusion model is a combined model obtained by fusing multiple sub-models with a model fusion method, and is used to improve model performance, such as prediction accuracy. Specifically, it is a pre-trained combination of two or more fused sub-models, for example a model obtained by weighted fusion of a Lasso (Least Absolute Shrinkage and Selection Operator) model and a LightGBM (Light Gradient Boosting Machine) model.
Specifically, the historical subject data is input into a pre-trained weighted fusion model for prediction, so that subject prediction data of a target item in a period to be predicted is obtained. The time period to be predicted can be one day or more days, and can be flexibly set according to actual needs.
Step 203, obtaining an auditing result of the target item according to the difference between the subject actual data and the subject predicted data of the target item.
The subject actual data of the target item is the data actually generated by the target item in the period to be predicted; for example, the actual income of package A in May is 50,000 yuan.
Specifically, a statistical method may be used to evaluate the difference between the subject actual data and the subject predicted data. For example, a confidence interval for the subject predicted data is calculated at a given confidence level; if the subject actual data of the target item lies outside this interval, the predicted value for the target item is considered unreliable, the subject data is judged abnormal, and further audit processing is performed, such as manual verification.
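The text does not say how the confidence interval is computed. The following minimal sketch assumes a normal approximation: the interval is the prediction plus or minus z times the standard deviation of the model's past prediction residuals, with z taken from the standard normal distribution for the chosen confidence level. The function names and the use of residuals are assumptions for illustration.

```python
from statistics import NormalDist, stdev
from typing import Sequence, Tuple


def confidence_interval(prediction: float, residuals: Sequence[float],
                        confidence: float = 0.95) -> Tuple[float, float]:
    """Two-sided interval around a prediction, assuming normally distributed errors."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 %
    sigma = stdev(residuals)  # spread of past (actual - predicted) errors; needs >= 2 values
    return prediction - z * sigma, prediction + z * sigma


def audit_item(actual: float, prediction: float, residuals: Sequence[float],
               confidence: float = 0.95) -> str:
    """Flag the target item as abnormal when the actual value falls outside the interval."""
    low, high = confidence_interval(prediction, residuals, confidence)
    return "normal" if low <= actual <= high else "abnormal"
```

With residuals `[-1, 1, -1, 1]` the 95 % half-width is roughly 2.26, so an actual value of 50,000 against a prediction of 50,000 passes, while 50,010 is flagged for manual verification.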
In this embodiment, historical subject data corresponding to a target item in the bill to be audited is obtained; the historical subject data is input into a weighted fusion model to obtain the subject prediction data of the target item output by the weighted fusion model; and an auditing result of the target item is obtained according to the difference between the subject actual data and the subject predicted data of the target item, wherein the weighted fusion model is trained on subject history sample data of the target item and the sample label corresponding to each piece of subject history sample data. Because the accounting subjects of specific target items in the bill to be audited are predicted by the weighted fusion model and the bill is audited according to the difference between the prediction result and the subject actual data, no manual participation in auditing is needed and labor cost is reduced; using the weighted fusion model for prediction also improves the accuracy of the auditing result, thereby improving the efficiency of existing bill auditing.
In one embodiment, FIG. 3 shows the model training flow chart used in the present invention; the pre-trained weighted fusion model is obtained through the following steps:
Step 301, training the first sub-model and the second sub-model by using a training data set to obtain a trained first sub-model and a trained second sub-model; the training data set comprises subject history sample data of the target item and sample labels corresponding to each subject history sample data;
Specifically, the terminal 101 acquires the training data set from the data storage system. The training data set consists of subject history sample data of the target item, which may be actual bill data for several past months, including the subject ID, subject type, discount promotion data, and the like. The data may be preprocessed; preprocessing includes data cleaning, preliminary feature screening, subject classification, feature engineering, and so on. The training data set also includes a sample label for each historical sample: for example, if the full-year data of a target item for 2012 is used as the model input to predict the subject data of that item for January 2013, the subject actual data for January 2013 can serve as the sample label. As shown in FIG. 4, the first sub-model and the second sub-model are each trained independently on the training data set to obtain the trained first sub-model and the trained second sub-model. For example, the first sub-model may be a Lasso regression model (a linear regression) and the second sub-model a LightGBM regression model (a nonlinear regression).
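The labelling scheme in the example (a window of past data as input, the next period's actual data as the label) can be sketched as a sliding window over one subject's chronological series. This is a hypothetical helper, assuming one value per period; with monthly data, `window=12` mirrors the "year of data predicts the next month" example. Each resulting pair would then be used to fit the two sub-models separately, for instance scikit-learn's `Lasso` and LightGBM's `LGBMRegressor`; that fitting step is not shown here.

```python
from typing import List, Sequence, Tuple


def make_samples(series: Sequence[float], window: int) -> List[Tuple[List[float], float]]:
    """Pair each run of `window` consecutive values with the value that follows it.

    The following value is the sample label, e.g. January 2013 for a window covering 2012.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [(list(series[i:i + window]), series[i + window])
            for i in range(len(series) - window)]
```

For instance, `make_samples([1, 2, 3, 4], 2)` yields `[([1, 2], 3), ([2, 3], 4)]`; a series shorter than or equal to the window yields no samples.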
Step 302, constructing a fusion model according to a preset weight ratio of the trained first sub-model and the trained second sub-model;
Model fusion is a method commonly used in machine learning: fusing several different sub-models can improve machine learning performance. Fusion can be performed from different angles, such as the model outputs, the models themselves, or the sample sets. The invention fuses at the level of the models themselves: since different models have different capabilities and contribute differently to the final result, their importance is expressed by weights.
Specifically, constructing the fusion model by weighted fusion requires multiple sub-models to form the basic model framework, which here comprises the first sub-model and the second sub-model; for example, the first sub-model may be a Lasso regression model (linear regression) and the second sub-model a LightGBM regression model (nonlinear regression). The trained first sub-model and the trained second sub-model are fused according to a preset weight ratio to construct the fusion model.
Step 303, training the fusion model by using the training data set to adjust the weight ratio of the first sub-model and the second sub-model, so as to obtain an optimal weight ratio; under the optimal weight ratio, the preset fitness function value of the fusion model meets the preset termination condition;
Specifically, let the weight of the first sub-model be p and the weight of the second sub-model be (1-p). The fusion model is trained with the training data set, and the weights p and (1-p) are adjusted continuously during training until a preset fitness function meets the termination condition, for example reaching its maximum or minimum; the weights p and (1-p) obtained at that point are called the optimal weight ratio.
Optionally, the present invention may employ a genetic algorithm (GA) to determine the weights of the two models. FIG. 4 illustrates the flow chart of determining the model weights with the genetic algorithm, which specifically includes the following procedure:
1) Simultaneously constructing a Lasso regression model and a LightGBM regression model based on the training data set;
2) Setting a weight parameter range, iteration times of a genetic algorithm and the number of initial populations;
3) Randomly generating an initial population of k individuals; taking each individual in the population as the weight of the combined model, predicting the training data set, and using the root mean square error between the true values (i.e., the sample labels) and the subject prediction data as the fitness, as shown in Formula 1:
$$fit = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(p\,\hat{y}_i^{(1)} + (1-p)\,\hat{y}_i^{(2)} - y_i\right)^2} \qquad (1)$$
where n is the total number of samples in the training data set (e.g., n samples for the target item); p is the weight of the first sub-model and (1-p) is the weight of the second sub-model; $\hat{y}_i^{(1)}$ is the subject prediction data of the first sub-model for the i-th sample during model training; $\hat{y}_i^{(2)}$ is the subject prediction data of the second sub-model for the i-th sample during model training; and $y_i$ is the subject actual data of the i-th sample.
4) Selecting the s best individuals from the population, applying crossover and mutation to them to generate new individuals, and repeating this process until a preset termination condition is met;
In the fitness function (Formula 1), the first sub-model carries the first weight p and the second sub-model carries the second weight (1-p); n is the total number of samples in the training data, $\hat{y}_i^{(1)}$ and $\hat{y}_i^{(2)}$ are the predicted values of the first and second sub-models for the i-th sample during model training, and $y_i$ is the actual bill value of the i-th sample.
The value of p is adjusted until the fitness value fit meets a preset condition, such as reaching its minimum or falling below a preset value; training then stops, yielding the optimal weight ratio p : (1-p).
5) Finally, the best individual across all generations of the population is selected as the final result, i.e., the optimal weight ratio of the fusion model, from which the GA_lasso_lightGBM model can be constructed.
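Steps 2) to 5) can be sketched as a small real-valued genetic algorithm over the single parameter p, using the RMSE of Formula 1 as the fitness to minimize. The patent does not fix the crossover or mutation operators, so this sketch assumes arithmetic crossover between two randomly chosen elite parents, Gaussian mutation, clipping to the weight range [0, 1], and elitism; the parameter defaults (k, s, generations, mutation rate, tolerance) are illustrative values, not taken from the source. The sub-models' training-set predictions are passed in as plain lists.

```python
import math
import random
from typing import List, Sequence, Tuple


def rmse_fitness(p: float, pred1: Sequence[float], pred2: Sequence[float],
                 actual: Sequence[float]) -> float:
    """Formula 1: RMSE of the p-weighted fused prediction against the labels."""
    n = len(actual)
    return math.sqrt(sum((p * a + (1 - p) * b - y) ** 2
                         for a, b, y in zip(pred1, pred2, actual)) / n)


def ga_optimal_weight(pred1: Sequence[float], pred2: Sequence[float], actual: Sequence[float],
                      k: int = 20, s: int = 6, generations: int = 50,
                      mutation_rate: float = 0.2, tol: float = 1e-6,
                      seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)

    def fit(p: float) -> float:
        return rmse_fitness(p, pred1, pred2, actual)

    # Step 3: k random initial weights in the allowed range [0, 1].
    population: List[float] = [rng.random() for _ in range(k)]
    best = min(population, key=fit)
    for _ in range(generations):
        # Step 4: keep the s fittest individuals (lowest RMSE) as parents.
        elite = sorted(population, key=fit)[:s]
        children: List[float] = []
        while len(children) < k - s:
            a, b = rng.sample(elite, 2)
            alpha = rng.random()
            child = alpha * a + (1 - alpha) * b        # arithmetic crossover
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 0.1)           # Gaussian mutation
            children.append(min(1.0, max(0.0, child)))  # clip to [0, 1]
        population = elite + children
        # Step 5: remember the best individual seen across all generations.
        best = min(population + [best], key=fit)
        if fit(best) <= tol:                            # termination condition
            break
    return best, fit(best)
```

If the labels are exactly `0.3 * pred1 + 0.7 * pred2`, the search should return a p close to 0.3 with an RMSE close to zero; with real sub-model predictions the minimum RMSE will generally stay above zero.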
Step 304, carrying out weighted fusion on the trained first sub-model and the trained second sub-model based on the optimal weight ratio to obtain the weighted fusion model.
Specifically, based on the optimal weight ratio p and (1-p), the first sub-model and the second sub-model are subjected to weighted fusion to obtain a weighted fusion model GA_lasso_lightGBM model.
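Once the optimal weight ratio is known, the weighted fusion model's prediction is simply p times the first sub-model's output plus (1-p) times the second sub-model's output. The wrapper below is a hypothetical sketch that treats each trained sub-model as a callable from a feature vector to a number; in practice these could wrap the `predict` methods of a fitted Lasso and a fitted LightGBM regressor.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Predictor = Callable[[Sequence[float]], float]


@dataclass
class WeightedFusionModel:
    first: Predictor    # trained first sub-model (e.g. Lasso), wrapped as a callable
    second: Predictor   # trained second sub-model (e.g. LightGBM), wrapped as a callable
    p: float            # optimal weight of the first sub-model; (1 - p) goes to the second

    def __post_init__(self) -> None:
        if not 0.0 <= self.p <= 1.0:
            raise ValueError("p must lie in [0, 1]")

    def predict(self, features: Sequence[float]) -> float:
        """Subject prediction data for one target item and period."""
        return self.p * self.first(features) + (1 - self.p) * self.second(features)
```

Keeping the sub-models as opaque callables leaves the fusion step independent of which regression libraries produced them.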
In this embodiment, the sub-models are fused by the weighted fusion method to obtain the weighted fusion model, laying the groundwork for the subsequent intelligent auditing of subject bills.
In one embodiment, FIG. 5 shows the flow chart of preprocessing the original subject data to obtain the training data set, which includes the following steps:
Step 501, obtaining original subject data corresponding to a target item;
Specifically, the terminal 101 acquires the original subject data from the data storage system. The original subject data is the historical subject data of the target item, for example the actual bill data of the target item over the past several months, including the subject ID, subject type, discount promotion data, and the like.
Step 502, screening the original subject data to obtain a first data set related to subject category characteristics;
The subject category characteristics refer to predetermined major subject categories and the subcategories divided within each major category.
Specifically, feature screening is performed before training to construct feature vectors and remove data that has little influence on the prediction result. The original subject data is classified: first, the major subject categories are determined according to the business background, and then the subcategories within each major category are divided. Alternatively, the raw data is classified according to the charging node characteristics of the target item; for example, the charging modes of a certain operator package fall into five categories: the first deducts the package fee (subject) immediately after ordering; the second deducts it on the 5th of each month; the third on the 1st of each month; the fourth on the 25th of each month; and the fifth deducts it every day. The two classification approaches can also be combined: the subject category characteristics are determined by classifying according to actual business characteristics, the raw data is then screened according to these characteristics, and the feature values of each sample are determined, yielding the first data set, which contains the feature values of each sample with respect to the subject category characteristics.
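The five charging categories in the example can be encoded as categorical features of the first data set. The sketch below is hypothetical: it names the categories with an enum, decides whether a given day is a charging day for a category (category 1 charges only on the order date), and one-hot encodes the category as five 0/1 feature values. The exact encoding used by the invention is not specified.

```python
from datetime import date
from enum import Enum
from typing import List


class ChargingMode(Enum):
    ON_ORDER = 1      # category 1: deducted immediately after ordering
    MONTHLY_5TH = 2   # category 2: deducted on the 5th of each month
    MONTHLY_1ST = 3   # category 3: deducted on the 1st of each month
    MONTHLY_25TH = 4  # category 4: deducted on the 25th of each month
    DAILY = 5         # category 5: deducted every day


_MONTHLY_DAY = {ChargingMode.MONTHLY_5TH: 5, ChargingMode.MONTHLY_1ST: 1,
                ChargingMode.MONTHLY_25TH: 25}


def is_charge_day(mode: ChargingMode, day: date, order_day: date) -> bool:
    """True if the package fee is deducted on `day` under the given charging category."""
    if mode is ChargingMode.ON_ORDER:
        return day == order_day
    if mode is ChargingMode.DAILY:
        return True
    return day.day == _MONTHLY_DAY[mode]


def category_one_hot(mode: ChargingMode) -> List[int]:
    """Encode the charging category as five 0/1 feature values, in enum order."""
    return [int(m is mode) for m in ChargingMode]
```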
Step 503, obtaining each statistical analysis result of the original subject data, and constructing a second data set.
Specifically, analysis of the subject category characteristics in step 502 reveals two dimensions of features in the data that have not yet been mined: one is the autocorrelation of the subject's bill cost, and the other is the time dimension. Therefore, the invention also performs statistical analysis on the original subject data to obtain a number of statistical analysis results, from which derived feature sets for these two dimensions are obtained.
In detail:
1. The autocorrelation dimension of the historical subject data of a target item is illustrated as follows:
To further mine the autocorrelation features of the historical subject data, and considering that the data has a monthly periodicity, statistics are first derived for the same day of each month, namely: 1) the day of the month; 2) the mean cost on that day over the last three months; 3) the standard deviation of the cost on that day over the last three months; 4) the maximum cost on that day over the last three months; 5) the minimum cost on that day over the last three months; 6) the historical mean cost on that day; 7) the historical standard deviation of the cost on that day; 8) the historical maximum cost on that day; 9) the historical minimum cost on that day.
Secondly, to account for the overall trend of the data, the following are derived: 10) the cumulative mean; 11) the cumulative standard deviation; 12) the cumulative maximum.
The above-mentioned autocorrelation dimension features may also be calculated by other statistical methods, which are not limited in this regard by the present invention.
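The twelve autocorrelation features above can be sketched in Python as follows. This is a minimal sketch, not the patent's implementation: it assumes the history of one target item is a hypothetical `{date: cost}` mapping with one bill cost per day, reads "the last three months" as the same day in each of the three preceding calendar months, computes the cumulative statistics over all days before the target day, and uses the population standard deviation. All function names are illustrative.

```python
from datetime import date
from statistics import mean, pstdev


def summary(values, prefix):
    """Mean, population standard deviation, max and min of a list of costs."""
    if not values:
        # Not enough history yet: leave the features empty rather than guess.
        return {f"{prefix}_{k}": None for k in ("mean", "std", "max", "min")}
    return {
        f"{prefix}_mean": mean(values),
        f"{prefix}_std": pstdev(values),
        f"{prefix}_max": max(values),
        f"{prefix}_min": min(values),
    }


def months_before(earlier, later):
    """Whole calendar months between two dates (later minus earlier)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def autocorrelation_features(history, target):
    """Derive features 1) to 12) for `target` from a {date: cost} history."""
    # Only days strictly before the target are used, to avoid label leakage.
    past = {d: c for d, c in history.items() if d < target}
    same_day = {d: c for d, c in past.items() if d.day == target.day}
    last3 = [c for d, c in same_day.items() if 1 <= months_before(d, target) <= 3]

    feats = {"day_of_month": target.day}                              # 1)
    feats.update(summary(last3, "last3m_same_day"))                   # 2) to 5)
    feats.update(summary(list(same_day.values()), "hist_same_day"))   # 6) to 9)
    cum = summary([past[d] for d in sorted(past)], "cum")             # 10) to 12)
    del cum["cum_min"]  # the list above names only mean, std and max
    feats.update(cum)
    return feats
```

Computing everything from days strictly before the target keeps the features usable at prediction time, when the target day's own bill is the unknown being forecast.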
2. The time-dimension features of the historical subject data corresponding to the target item are illustrated as follows:
Observation shows that, for some types of target items, the charging node has a large influence on the fluctuation of the bill cost. On this basis, the following category labels are derived as features for target items with special charging nodes: 1) whether the day is a charging node; 2) whether the day is before the charging node; 3) whether the day is after the charging node.
The autocorrelation features and time features of the data are computed with the statistical methods above to obtain the statistical analysis results, from which the second data set is constructed. Each element of the second data set is the value of one of the derived features.
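A sketch of the three charging-node labels follows, assuming a target item deducted on a fixed day each month (for example the 1st, 5th or 25th). The source does not define how far "before" and "after" extend, so this sketch assumes they mean earlier or later in the same calendar month; `charging_node_flags` and its parameters are hypothetical names.

```python
from datetime import date


def charging_node_flags(day, charging_day):
    """Category labels 1) to 3) for an item deducted on `charging_day` each month.

    Assumption: "before" and "after" are judged within the same calendar month.
    """
    return {
        "is_charging_node": int(day.day == charging_day),
        "before_charging_node": int(day.day < charging_day),
        "after_charging_node": int(day.day > charging_day),
    }
```

Encoding the labels as 0/1 integers lets them sit in the same numeric feature vector as the autocorrelation statistics.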
Step 504, constructing the training data set based on the first data set and the second data set.
Specifically, the first data set and the second data set are combined together as a training data set.
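Combining the two data sets amounts to joining each sample's category features with its derived features. The sketch below assumes both data sets are dictionaries keyed by a hypothetical sample key (for example an `(item_id, date)` pair), keeps only samples present in both (an inner join, which is an assumption), and refuses overlapping feature names rather than silently overwriting one value with another.

```python
def build_training_set(first, second):
    """Merge per-sample feature dicts from the first and second data sets.

    Both inputs map a (hypothetical) sample key to {feature name: value}.
    Only samples present in both data sets are kept.
    """
    merged = {}
    for key in first.keys() & second.keys():
        row = dict(first[key])
        clash = row.keys() & second[key].keys()
        if clash:
            # A shared name would overwrite a category feature with a derived one.
            raise ValueError(f"duplicate feature names: {sorted(clash)}")
        row.update(second[key])
        merged[key] = row
    return merged
```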
Optionally, because the original feature set and the derived feature set contain many features, different weights may be assigned to the features to reflect their inherent differences, and the weighted features are combined to obtain the preprocessed training data.
In the above embodiment, feature screening yields the first data set relating to the subject category characteristics, and statistical analysis mines the derived feature set (the second data set). By analyzing the bill-cost fluctuation characteristics of different types of subjects, features are derived along the autocorrelation dimension and the time dimension, so that the real factors affecting the bill cost are uncovered and more effective training data is provided for model training.
In one embodiment, as shown in fig. 6, the step 502 includes:
step 601, screening the original subject data to obtain a third data set related to the marketing type feature;
Specifically, to construct features, the raw data must first be classified. The invention first determines the marketing-type features that directly affect billing data: the subject major categories are determined according to the business characteristics of the target subject, and each major category is divided into detail categories. Taking a package service of an operator as an example, the original subject data is divided into four major categories of features: the first category is subject attribute features; the second category is subject ordering features; the third category is discount promotion features; and the fourth category is direct price-reduction promotion features. The major categories are then divided into detail categories, some of which are shown in Table 1 below:
Table 1 Marketing-type features
Feature screening is performed on the original subject data according to the marketing-type features to obtain the third data set, which contains the values of all marketing-type features; data with little influence on the prediction result is thereby removed.
Step 602, screening the original subject data to obtain a fourth data set related to the characteristics of the charging node;
Specifically, based on the charging modes (i.e., the charging node characteristics) of existing target items, the target items can be divided into several types (other items can likewise be divided by charging node). Taking the charging of a certain operator package as an example: the first type deducts the fee immediately after subscription; the bill-cost fluctuation of such items is highly correlated with the number of new subscriptions to the item, so the weight of the new-subscription count can be increased during modeling. The second type deducts on the 5th of each month; for such items the cost before the 5th is 0, the bill cost on the 5th is extremely high, the cost after the 5th is lower, and the fluctuation after the 5th is strongly correlated with the number of new subscriptions to the item. The third type deducts on the 1st of each month, and the fourth type deducts on the 25th of each month; the bill-cost fluctuation of these two types is basically similar to that of the second type, differing only in the charging node, so charging node features can be derived for these three types of data in the subsequent feature engineering. The fifth type deducts daily; the bill cost of such target items fluctuates smoothly from day to day, but no single feature is strongly correlated with it. The original subject data is screened based on the charging node characteristics to obtain the fourth data set, which contains the feature values of the samples for the charging node characteristics.
The above analysis shows that target items of different types differ both in their bill-cost fluctuation characteristics and in the factors that influence them. A single global model built from all types of target items together would perform poorly, so the invention builds an independent model for each item type, i.e., target items of different types are trained with separate models.
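The per-type modelling strategy can be sketched as grouping training samples by charging type and fitting one model per group. The enum mirrors the five charging modes in the operator-package example, and `CHARGING_DAY` records the node day for the three monthly types that receive charging-node features. `MeanModel` is a hypothetical placeholder standing in for the real sub-models, which this sketch does not attempt to reproduce; all names are illustrative.

```python
from collections import defaultdict
from enum import Enum
from statistics import mean


class ChargingType(Enum):
    """The five charging modes from the operator-package example."""
    ON_SUBSCRIPTION = "deducted immediately after subscription"
    MONTHLY_5TH = "deducted on the 5th of each month"
    MONTHLY_1ST = "deducted on the 1st of each month"
    MONTHLY_25TH = "deducted on the 25th of each month"
    DAILY = "deducted daily"


# Charging-node day for the three monthly types that get charging-node features.
CHARGING_DAY = {
    ChargingType.MONTHLY_5TH: 5,
    ChargingType.MONTHLY_1ST: 1,
    ChargingType.MONTHLY_25TH: 25,
}


class MeanModel:
    """Hypothetical placeholder regressor that predicts the training mean."""

    def fit(self, labels):
        self.value = mean(labels)
        return self

    def predict(self):
        return self.value


def train_per_type(samples):
    """Train one independent model per charging type.

    `samples` is an iterable of (ChargingType, label) pairs.
    """
    grouped = defaultdict(list)
    for ctype, label in samples:
        grouped[ctype].append(label)
    return {ctype: MeanModel().fit(labels) for ctype, labels in grouped.items()}
```

At audit time, the item's charging type selects which model to query, so a daily-deduction item is never scored by a model fitted on month-start spikes.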
Step 603, constructing the first data set relating to the subject category characteristics based on the third data set and the fourth data set.
Specifically, the marketing-type features and the charging node features are combined as the subject category characteristics of the target item, and the third data set and the fourth data set are combined accordingly to obtain the first data set.
In this embodiment, the subject category characteristics are obtained through feature screening, laying the groundwork for model training.
In one embodiment, step 504 includes: determining, based on the entropy weight method, the feature weights of the feature values in the first data set and in the second data set; and multiplying each feature value by its corresponding feature weight to obtain the training data set.
The entropy weight method is an objective weighting method. It uses information entropy to compute an entropy weight for each feature according to how much that feature varies, and then corrects the weight of each feature by its entropy weight, yielding objective feature weights.
In general, a larger entropy weight means that a feature provides more information and plays a larger role in the comprehensive evaluation; conversely, a smaller entropy weight means that the feature provides less information and plays a smaller role.
Specifically, the entropy weighting method comprises the following steps:
1) Normalize each feature value in the subject category feature set (the first data set) and the derived feature set (the second data set):
For convenience of description, the subject category feature set (first data set) and the derived feature set (second data set) are collectively referred to as the feature set. Assume that the feature set contains n rows of training samples and m features, where $x_{ij}$ denotes the element in row i, column j (i.e., the j-th feature value of the i-th sample).
If, in practical application, a larger feature value is better, normalization is performed as follows:
$$x'_{ij} = \frac{x_{ij} - \min(x_j)}{\max(x_j) - \min(x_j)}$$
where $x'_{ij}$ denotes the normalized j-th feature value of the i-th sample when larger values are better; $x_{ij}$ denotes the j-th feature value of the i-th sample; $\min(x_j)$ denotes the minimum sample value of the j-th feature; $\max(x_j)$ denotes the maximum sample value of the j-th feature; j is the feature index (j = 1, 2, …, m); and i is the sample index (i = 1, 2, …, n).
If a smaller feature value is better, the normalization is modified to:
$$x'_{ij} = \frac{\max(x_j) - x_{ij}}{\max(x_j) - \min(x_j)}$$
where $x'_{ij}$ denotes the normalized j-th feature value of the i-th sample when smaller values are better; $x_{ij}$ denotes the j-th feature value of the i-th sample; $\min(x_j)$ denotes the minimum sample value of the j-th feature; and $\max(x_j)$ denotes the maximum sample value of the j-th feature.
2) Calculate the proportion $P_{ij}$ of the i-th sample value under the j-th feature, where $x'_{ij}$ is the normalized value from step 1):
$$P_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{n} x'_{ij}}$$
where j denotes the j-th feature (j = 1, 2, …, m) and m is the total number of features.
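3) Calculate the entropy $E_j$ of the j-th feature:
$$E_j = -\frac{1}{\ln n} \sum_{i=1}^{n} P_{ij} \ln P_{ij}$$
where i denotes the i-th sample value (i = 1, 2, …, n) and n is the total number of samples; by convention, $P_{ij} \ln P_{ij} = 0$ when $P_{ij} = 0$.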
4) Generate the final entropy weight $\omega_j$ of each feature, i.e., its feature weight:
$$\omega_j = \frac{1 - E_j}{\sum_{j=1}^{m} \left(1 - E_j\right)}$$
Finally, each feature in the feature set is assigned its feature weight $\omega_j$, and each feature value is multiplied by its corresponding feature weight; the resulting set of values is used as the training data set.
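Steps 1) to 4) can be written compactly in Python. This sketch follows the standard entropy weight formulation (min-max normalization, proportions, entropy scaled by 1 / ln n, weights proportional to 1 - E_j). It adds two assumptions the text does not state: a constant feature is treated as carrying no information (E_j = 1, weight 0), and if every feature is constant the weights fall back to being equal. The function names are illustrative.

```python
import math


def entropy_weights(rows, benefit=None):
    """Entropy weights for an n x m feature matrix given as a list of rows.

    benefit[j] is True when larger values of feature j are better and False
    when smaller values are better (all True by default).
    """
    n, m = len(rows), len(rows[0])
    if benefit is None:
        benefit = [True] * m
    divergence = []  # 1 - E_j for each feature
    for j in range(m):
        col = [row[j] for row in rows]
        lo, hi = min(col), max(col)
        if hi == lo:
            # Assumption: a constant feature carries no information (E_j = 1).
            divergence.append(0.0)
            continue
        # Step 1: min-max normalization in the feature's preferred direction.
        span = hi - lo
        norm = [(x - lo) / span if benefit[j] else (hi - x) / span for x in col]
        # Step 2: proportion P_ij of each sample under feature j.
        total = sum(norm)
        p = [v / total for v in norm]
        # Step 3: entropy E_j, using the convention 0 * ln 0 = 0.
        e = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(n)
        divergence.append(1.0 - e)
    # Step 4: normalize the divergences 1 - E_j into weights omega_j.
    s = sum(divergence)
    if s == 0:
        return [1.0 / m] * m  # assumption: all features constant, equal weights
    return [d / s for d in divergence]


def apply_weights(rows, weights):
    """Multiply each feature value by the weight of its feature."""
    return [[x * w for x, w in zip(row, weights)] for row in rows]
```

A feature whose normalized values are concentrated in a few samples has low entropy and therefore receives a larger weight than one spread evenly across samples.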
By assigning a feature weight to each feature through the entropy weight method, this embodiment emphasizes features with a larger influence on the prediction result, weakens those with a small influence, highlights the inherent differences among features, and thereby further improves prediction accuracy.
In one embodiment, step 203 includes: calculating a confidence interval for the subject prediction data based on a preset confidence level; acquiring the subject actual data corresponding to the target item; and, if the subject actual data lies outside the confidence interval, determining that the audit result of the target item is abnormal.
Specifically, the confidence interval of the subject prediction data is calculated under a given confidence requirement. When the subject actual data lies outside this interval, the audit result of the target item is judged abnormal, and bill early-warning information about the target item can be sent to the service processing end for further automated or manual handling.
In this embodiment, whether the target item in the bill meets requirements is judged automatically by comparing the subject prediction data with the subject actual data, without manual correction. This reduces labor cost, allows the model to be updated and iterated efficiently on new data, and further improves bill auditing efficiency.
The bill auditing device provided by the present invention is described below; the bill auditing device 700 described below and the bill auditing method described above may be cross-referenced with each other.
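The confidence-interval check can be sketched as below. The source does not say how the interval is computed, so this sketch assumes normally distributed prediction errors whose standard deviation is estimated from historical residuals (actual minus predicted) of the same target item; `confidence_interval` and `audit_item` are hypothetical names.

```python
from statistics import NormalDist, stdev


def confidence_interval(predicted, residuals, confidence=0.95):
    """Two-sided interval around `predicted`.

    Assumes prediction errors are roughly normal, with their spread estimated
    from historical residuals (actual minus predicted, at least two values).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(residuals)
    return predicted - half_width, predicted + half_width


def audit_item(actual, predicted, residuals, confidence=0.95):
    """Return 'abnormal' when the actual amount falls outside the interval."""
    low, high = confidence_interval(predicted, residuals, confidence)
    return "abnormal" if actual < low or actual > high else "normal"
```

For a 95% confidence level the multiplier is about 1.96. With historical residuals `[-1.0, 1.0, -1.0, 1.0]` (sample standard deviation about 1.155) and a prediction of 100, the interval is roughly 97.74 to 102.26, so an actual amount of 105 would be flagged.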
In one embodiment, as shown in fig. 7, there is provided a bill auditing apparatus 700, comprising: a data acquisition module 701, a bill prediction module 702 and an audit result processing module 703, wherein:
The data acquisition module 701 is configured to acquire historical subject data corresponding to a target item in a bill to be audited;
the bill prediction module 702 is configured to input the historical subject data into a weighted fusion model, so as to obtain subject prediction data of the target item output by the weighted fusion model;
an audit result processing module 703, configured to obtain an audit result of the target item according to a difference between the subject actual data and the subject predicted data of the target item;
the weighted fusion model is trained based on subject history sample data of the target item and sample tags corresponding to each subject history sample data.
In one embodiment, the bill auditing device 700 further includes a model training unit, configured to: train the first sub-model and the second sub-model separately with the training data set to obtain a trained first sub-model and a trained second sub-model, wherein the training data set comprises the subject history sample data of the target item and the sample tags corresponding to each item of subject history sample data; construct a fusion model from the trained first sub-model and the trained second sub-model at a preset weight ratio; train the fusion model with the training data set to adjust the weight ratio of the first sub-model and the second sub-model and obtain an optimal weight ratio, at which the value of the preset fitness function of the fusion model meets a preset termination condition; and perform weighted fusion of the trained first sub-model and the trained second sub-model based on the optimal weight ratio to obtain the weighted fusion model.
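The weight-ratio search performed by the model training unit can be illustrated with a one-dimensional search over the fusion weight w, where the fused prediction is w times the first sub-model's output plus (1 - w) times the second's. This section does not specify the optimizer, fitness function or termination condition, so this sketch swaps in a simple compass (pattern) search, uses mean squared error as the fitness, and stops once the step size drops below a tolerance. It works on precomputed sub-model predictions, and all names are hypothetical.

```python
def fuse(p1, p2, w):
    """Weighted fusion: w * first sub-model + (1 - w) * second sub-model."""
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]


def fitness(p1, p2, labels, w):
    """Assumed fitness: mean squared error of the fused prediction (lower is better)."""
    fused = fuse(p1, p2, w)
    return sum((f - y) ** 2 for f, y in zip(fused, labels)) / len(labels)


def search_weight(p1, p2, labels, w0=0.5, step=0.25, tol=1e-4):
    """Compass search for the weight ratio, starting from the preset ratio w0.

    Assumed termination condition: the step size falls below `tol`.
    """
    w = w0
    best = fitness(p1, p2, labels, w)
    while step >= tol:
        improved = False
        for cand in (w - step, w + step):
            if 0.0 <= cand <= 1.0:
                f = fitness(p1, p2, labels, cand)
                if f < best:
                    w, best, improved = cand, f, True
        if not improved:
            step /= 2  # no better neighbor: refine the search
    return w, best
```

Because the fused squared error is a convex quadratic in w, this simple search converges to the best weight in [0, 1]; a more elaborate optimizer becomes useful mainly when more weights or a non-convex fitness function are involved.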
In one embodiment, the model training unit is further configured to: acquiring original subject data corresponding to the target item; screening the original subject data to obtain a first data set related to subject category characteristics; obtaining each statistical analysis result of the original subject data, and constructing a second data set; the training data set is constructed based on the first data set and the second data set.
In one embodiment, the model training unit is further configured to: screening the original subject data to obtain a third data set related to the marketing type feature; screening the original subject data to obtain a fourth data set related to the characteristics of the charging node; and constructing and obtaining the first data set related to the subject category characteristics based on the third data set and the fourth data set.
In one embodiment, the model training unit is further configured to: determining feature weights of the feature values in the first data set and the feature values in the second data set based on an entropy weight method; and multiplying each characteristic value by the corresponding characteristic weight to obtain the training data set.
In one embodiment, the audit result processing module 703 is further configured to: calculating a confidence interval corresponding to the subject prediction data based on a preset confidence; acquiring subject actual data corresponding to the target item; if the subject actual data is located outside the confidence interval, judging that the auditing result of the target item is abnormal.
Each module of the bill auditing device described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can invoke them and perform the corresponding operations.
Fig. 8 illustrates the physical structure of an electronic device. As shown in Fig. 8, the electronic device may include a processor 810, a communication interface (Communications Interface) 820, a memory 830, and a communication bus 840, wherein the processor 810, the communication interface 820, and the memory 830 communicate with each other through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform a bill auditing method that includes: acquiring historical bill data corresponding to a target subject; inputting the historical bill data into a pre-trained weighted fusion model for prediction to obtain a bill prediction value of the target subject; and obtaining an audit result of the target subject according to the difference between the actual bill of the target subject and the bill prediction value.
Further, the logic instructions in the memory 830 may be implemented as software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of performing a bill auditing method provided by the methods described above, the method comprising: acquiring historical subject data corresponding to target items in a bill to be checked; inputting the historical subject data into a weighted fusion model to obtain subject prediction data of the target item output by the weighted fusion model; obtaining an auditing result of the target item according to the difference between the subject actual data and the subject predicted data of the target item; the weighted fusion model is trained based on subject history sample data of the target item and sample tags corresponding to each subject history sample data.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform a bill auditing method provided by the above methods, the method comprising: acquiring historical subject data corresponding to target items in a bill to be checked; inputting the historical subject data into a weighted fusion model to obtain subject prediction data of the target item output by the weighted fusion model; obtaining an auditing result of the target item according to the difference between the subject actual data and the subject predicted data of the target item; the weighted fusion model is trained based on subject history sample data of the target item and sample tags corresponding to each subject history sample data.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for auditing bills, comprising:
acquiring historical subject data corresponding to target items in a bill to be checked;
inputting the historical subject data into a weighted fusion model to obtain subject prediction data of the target item output by the weighted fusion model;
obtaining an auditing result of the target item according to the difference between the subject actual data and the subject predicted data of the target item;
the weighted fusion model is trained based on subject history sample data of the target item and sample tags corresponding to each subject history sample data.
2. The bill auditing method according to claim 1, wherein the weighted fusion model is derived by:
respectively training the first sub-model and the second sub-model by using a training data set to obtain a trained first sub-model and a trained second sub-model; wherein the training data set comprises subject history sample data of the target item and the sample tags corresponding to each of the subject history sample data;
constructing a fusion model from the trained first sub-model and the trained second sub-model at a preset weight ratio;
training the fusion model with the training data set to adjust the weight ratio of the first sub-model and the second sub-model so as to obtain an optimal weight ratio, wherein, at the optimal weight ratio, the value of the preset fitness function of the fusion model meets a preset termination condition;
and carrying out weighted fusion on the trained first sub-model and the trained second sub-model based on the optimal weight ratio to obtain the weighted fusion model.
3. The bill auditing method according to claim 2, further comprising, prior to training the first sub-model and the second sub-model with the training dataset, respectively:
acquiring original subject data corresponding to the target item;
screening the original subject data to obtain a first data set related to subject category characteristics;
obtaining each statistical analysis result of the original subject data, and constructing a second data set;
the training data set is constructed based on the first data set and the second data set.
4. A bill auditing method according to claim 3, wherein said filtering the raw subject data to obtain a first data set relating to subject category characteristics comprises:
screening the original subject data to obtain a third data set related to the marketing-type features;
screening the original subject data to obtain a fourth data set related to the characteristics of the charging node;
and constructing and obtaining the first data set related to the subject category characteristics based on the third data set and the fourth data set.
5. A bill auditing method according to claim 3, in which the constructing the training dataset based on the first dataset and the second dataset comprises:
determining feature weights of the feature values in the first data set and the feature values in the second data set based on an entropy weight method;
and multiplying each characteristic value by the corresponding characteristic weight to obtain the training data set.
6. A bill auditing method according to any of claims 1 to 5, in which the obtaining the auditing result for the target item based on the difference between the subject actual data and the subject predicted data for the target item includes:
calculating a confidence interval corresponding to the subject prediction data based on a preset confidence;
acquiring subject actual data corresponding to the target item;
if the subject actual data is located outside the confidence interval, judging that the auditing result of the target item is abnormal.
7. A bill auditing device, the device comprising:
the data acquisition module is used for acquiring historical subject data corresponding to target items in the bill to be checked;
the bill prediction module is used for inputting the historical subject data into a weighted fusion model to obtain subject prediction data of the target item output by the weighted fusion model;
the auditing result processing module is used for obtaining the auditing result of the target item according to the difference between the subject actual data and the subject predicted data of the target item;
the weighted fusion model is trained based on subject history sample data of the target item and sample tags corresponding to each subject history sample data.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the bill auditing method of any of claims 1 to 6 when the program is executed by the processor.
9. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements a bill auditing method according to any of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements a bill auditing method according to any one of claims 1 to 6.
CN202211679469.3A 2022-12-26 2022-12-26 Bill auditing method and device Pending CN116912016A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211679469.3A CN116912016A (en) 2022-12-26 2022-12-26 Bill auditing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211679469.3A CN116912016A (en) 2022-12-26 2022-12-26 Bill auditing method and device

Publications (1)

Publication Number Publication Date
CN116912016A true CN116912016A (en) 2023-10-20

Family

ID=88349877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211679469.3A Pending CN116912016A (en) 2022-12-26 2022-12-26 Bill auditing method and device

Country Status (1)

Country Link
CN (1) CN116912016A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117217711A (en) * 2023-10-23 2023-12-12 广东电网有限责任公司 Automatic auditing method and system for communication fee receipt

Similar Documents

Publication Publication Date Title
CN108564286B (en) Artificial intelligent financial wind-control credit assessment method and system based on big data credit investigation
CN108846520B (en) Loan overdue prediction method, loan overdue prediction device and computer-readable storage medium
CN108364106A (en) A kind of expense report Risk Forecast Method, device, terminal device and storage medium
CN110852881B (en) Risk account identification method and device, electronic equipment and medium
CN112633962B (en) Service recommendation method and device, computer equipment and storage medium
CN107730286A (en) A kind of target customer's screening technique and device
CN110415036B (en) User grade determining method, device, computer equipment and storage medium
CN111797320A (en) Data processing method, device, equipment and storage medium
CN111695084A (en) Model generation method, credit score generation method, device, equipment and storage medium
CN116912016A (en) Bill auditing method and device
CN116915710A (en) Traffic early warning method, device, equipment and readable storage medium
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
CN114004691A (en) Line scoring method, device, equipment and storage medium based on fusion algorithm
CN117333290B (en) Integrated multi-scale wind control model construction method
CN115099934A (en) High-latency customer identification method, electronic equipment and storage medium
CN110110885A (en) Information forecasting method, device, computer equipment and storage medium
CN113177733B (en) Middle and small micro enterprise data modeling method and system based on convolutional neural network
CN115293867A (en) Financial reimbursement user portrait optimization method, device, equipment and storage medium
CN114757397A (en) Bad material prediction method, bad material prediction device and electronic equipment
CN114493686A (en) Operation content generation and pushing method and device
CN117217711A (en) Automatic auditing method and system for communication fee receipt
CN113822464A (en) User information processing method and device, electronic equipment and storage medium
CN110852854B (en) Method for generating quantitative gain model and method for evaluating risk control strategy
CN114548620A (en) Logistics punctual insurance service recommendation method and device, computer equipment and storage medium
CN112862602B (en) User request determining method, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination