CN113342855A

CN113342855A - Data matching method and device based on big data

Info

Publication number: CN113342855A
Application number: CN202110703046.XA
Authority: CN
Inventors: 周晔; 穆海洁; 肖竣唏
Original assignee: China Pnr Co ltd
Current assignee: China Pnr Co ltd
Priority date: 2021-06-24
Filing date: 2021-06-24
Publication date: 2021-09-03
Anticipated expiration: 2041-06-24
Also published as: CN113342855B

Abstract

The invention relates to the technical field of computers, and in particular to a data matching method and device based on big data. The method comprises the following steps: S1, a data source configuration step: configuring information on the data sources to be matched; S2, a data rule analysis step: setting initial rule analysis configuration parameters, automatically acquiring the data to be matched, performing rule analysis calculation, and generating multiple matching rules and their matching degree results; S3, a data rule configuration step: selecting a generated matching rule as the data matching rule, or configuring a custom data matching rule; S4, a data matching processing step: matching the data to be matched against the selected data matching rules in sequence until matching succeeds or all rules have been traversed; and S5, a matching result processing step: processing the matching result. The invention can analyze data matching rules without prior knowledge of the specific business data, and can quickly match and compare data by using a configuration tree.

Description

Data matching method and device based on big data

Technical Field

The invention relates to the technical field of computers, in particular to a data matching method and device based on big data.

Background

In the third-party payment field, the most common data matching and comparison scenario is reconciliation. Every financial payment system necessarily contains a large number of reconciliation scenarios: within a system, between systems, and with external systems.

With the popularization of mobile payment, the volume of QR-code transactions has grown several-fold. For reconciliation, comparing such large data volumes by relying solely on traditional database join queries cannot meet scalability requirements and places heavy demands on database performance.

At present, the most popular approach is real-time stream computing. Data is first collected: database log files are captured, or an ETL (extract, transform, load) tool is used, and the data is sent to message middleware. Apache Flink, currently the most popular real-time data processing framework, is then applied: by configuring calculation rules, the data is matched and compared in real time in a streaming manner, and the results are output for subsequent handling.

Apache Flink is an open source stream processing framework developed by the Apache Software Foundation. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner, and its pipelined runtime can execute both batch and stream processing programs.

The technical scheme of data matching comparison commonly used at present has the following problems:

in data reconciliation, business scenarios are numerous and complex, new data comparison requirements arise at any time, a very large number of systems are associated, and the matching rules for the data relationships between systems differ, so manual investigation and confirmation of the data takes a great deal of time;

the matching calculations of a real-time computing platform are limited by windows, so complex rule matching cannot achieve 100% complete data matching;

frequent online releases of system functions cause the data matching rules to change frequently;

in routine exception scenarios, data must be confirmed and handled quickly, and conventional reconciliation systems lack the flexibility to do so.

Therefore, a data matching method and device based on big data are needed to meet the requirement of comparing and matching different data among multiple associated systems in different business scenarios.

Disclosure of Invention

The invention aims to provide a data matching method and device based on big data that solve the prior-art problems that different data among multiple associated systems in different business scenarios are difficult to compare and match, and that flexibility is insufficient.

In order to achieve the above object, the present invention provides a data matching method based on big data, comprising the following steps:

S1, data source configuration step: configuring information on the data sources to be matched;

S2, data rule analysis step: setting initial rule analysis configuration parameters, automatically acquiring the data to be matched, performing rule analysis calculation, and generating multiple matching rules and their matching degree results;

S3, data rule configuration step: selecting a matching rule generated in step S2 as the data matching rule, or configuring a custom data matching rule;

S4, data matching processing step: matching the data to be matched against the selected data matching rules in sequence until matching succeeds or all rules have been traversed;

S5, matching result processing step: processing the matching result.

In an embodiment, the S1 data source configuration step further includes:

receiving, as the data to be matched, raw data to be matched or preliminarily matched data sent by the message middleware.

In one embodiment, the rule analysis configuration parameters comprise:

transaction reference source, set, item set, support, minimum support, frequent item set, confidence, minimum confidence, filter item set, and rule confidence condition.

In an embodiment, automatically acquiring the data to be matched in step S2 to perform rule analysis calculation and generate multiple matching rules and their matching degree results further includes:

S21, dividing the acquired data set to be matched into item set units by field;

S22, screening out frequent item sets according to the rule analysis configuration parameters;

S23, calculating the weight of each frequent item set with respect to transaction support according to the dispersion of the item set values;

S24, recursively traversing the frequent item set data in order of weight to analyze matching rules;

S25, outputting the matching rules and their matching degree results.

In an embodiment, step S24 further includes:

S241, sorting the frequent item sets of the data set to be matched from high to low by weight;

S242, taking the data set of the transaction reference source as the matching source and traversing the frequent item sets of the set to be matched;

S243, recursively matching the data strings with the KMP algorithm until matching succeeds, obtaining the corresponding matching rule;

S244, traversing the data to be matched according to the matching rule, calculating the rule confidence and the matching success degree, and filtering out rules whose confidence is lower than the minimum confidence.

In an embodiment, step S244 further includes:

calculating the confidence of the current rule;

if the rule confidence is 100%, saving the rule and its confidence and exiting all traversals;

if the confidence is less than 100% but not less than the minimum confidence, saving the current rule and its confidence, removing the data that matched successfully and satisfy the rule confidence condition, and returning to step S242 to continue traversing in search of other matching rules;

if the confidence is less than the minimum confidence, calculating the matching success degree;

if the matching success degree is less than the minimum confidence, treating the current rule as invalid and returning to step S242 to continue traversing in search of other matching rules;

if the matching success degree is greater than or equal to the minimum confidence, re-extracting data under the condition of the current rule, excluding the item sets already traversed, and starting the traversal again until all traversals are complete.

In an embodiment, step S4 further includes:

S41, reading the latest rule information;

S42, reading the matching source information of the message data;

S43, looking up the matching rules relevant to the data source information and generating a configuration tree;

S44, traversing each matching rule in the configuration tree and performing data matching until matching succeeds or the traversal is complete.

In one embodiment, the rule information includes:

the data source to which the rule applies; the applicable condition for data in that data source; the matching target source; the matching rule; a flag indicating whether matching ends; the matching result configuration for a successful match; and the matching result configuration for a failed match.
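The rule information and the S41 to S44 traversal can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `RuleInfo` fields, the dict keyed by data source that stands in for the configuration tree, and the `try_match` callback are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RuleInfo:
    """Hypothetical container for the rule information fields listed above."""
    source: str                          # data source the rule applies to
    applicable: Callable[[dict], bool]   # applicable condition for records in that source
    target: str                          # matching target source
    condition: str                       # matching rule, e.g. "trace_id = ?"
    end_flag: bool = True                # whether matching ends (stored only, not used below)
    on_success: str = ""                 # result configuration on success (e.g. a table name)
    on_failure: str = ""                 # result configuration on failure


def build_config_tree(rules: list[RuleInfo]) -> dict[str, list[RuleInfo]]:
    """S43 (simplified): group rules by data source, keeping configuration order."""
    tree: dict[str, list[RuleInfo]] = {}
    for rule in rules:
        tree.setdefault(rule.source, []).append(rule)
    return tree


def match_record(record: dict, source: str, tree: dict[str, list[RuleInfo]],
                 try_match: Callable[[dict, RuleInfo], bool]) -> Optional[RuleInfo]:
    """S44: try each applicable rule in turn until one matches or all are traversed."""
    for rule in tree.get(source, []):
        if not rule.applicable(record):
            continue                     # record is outside this rule's data range
        if try_match(record, rule):
            return rule                  # matching succeeded
    return None                          # every rule was traversed without a match
```

Returning the matched rule lets the caller look up `on_success` or fall back to a failure configuration; how the real system writes results is not specified here.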

In order to achieve the above object, the present invention provides a data matching apparatus based on big data, comprising:

a memory for storing instructions executable by the processor;

a processor for executing the instructions to implement the method of any one of the above.

In order to achieve the above object, the present invention provides a data matching system based on big data, comprising:

message middleware, which receives raw data to be matched or preliminarily matched data as the data to be matched and sends it to the processor;

a memory for storing instructions executable by the processor;

With the big data based data matching method, device and system provided by the invention, data matching rules can be analyzed automatically without prior knowledge of the specific business data, matching rules can be flexibly added or modified through configuration, and data can be quickly matched, compared and handled by using a configuration tree.

Drawings

The above and other features, properties and advantages of the present invention will become more apparent from the following description of the embodiments with reference to the accompanying drawings in which like reference numerals denote like features throughout the several views, wherein:

FIG. 1 discloses a flow chart of a big data based data matching method according to an embodiment of the invention;

FIG. 2 discloses a flowchart of a method of data rule analysis steps according to an embodiment of the invention;

FIG. 3 discloses a flowchart of a method for frequent item set matching rule analysis according to an embodiment of the invention;

FIG. 4 discloses a method flowchart of the data matching processing steps according to an embodiment of the invention;

FIG. 5 discloses a schematic diagram of a rule tree of a voyage accounting system according to an embodiment of the invention;

FIG. 6 discloses a schematic diagram of a big data based data matching apparatus according to an embodiment of the present invention;

FIG. 7 discloses a schematic diagram of a big data based data matching system according to an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The big data based data matching method, device and system provided by the invention can automatically analyze and extract data matching rules and quickly compare and match data through additional configuration. They can be applied to comparison among associated data, in particular to the analysis and automatic matching of data matching rules for big data.

The invention is mainly applied to checking data states among associated systems in different business scenarios within a company and to checking data against external bank system information, ensuring that business processes proceed normally and that anomalies are discovered and reported in time.

Fig. 1 discloses a flow chart of a data matching method based on big data according to an embodiment of the present invention, and as shown in fig. 1, the data matching method based on big data according to the present invention includes the following steps:

Each step of the present invention is described in detail below.

The S1 data source configuration step further includes:

in case of an abnormal condition, the data only needs to be pushed to the message middleware again, so any data to be matched can be reprocessed at any time;

to add a new data rule, only the corresponding configuration needs to be added; the application does not need to be released or restarted, the rule takes effect promptly, and matching results are obtained quickly.

Fig. 2 discloses a flowchart of the data rule analysis step according to an embodiment of the invention. In the S2 data rule analysis step shown in Fig. 2, sample data is obtained automatically according to the rule analysis configuration parameters and analyzed to generate multiple matching rules and their matching degree results.

Initial rule analysis configuration parameters are set; they comprise: transaction reference source, set, item set, support, minimum support, frequent item set, confidence, minimum confidence, filter item set, and rule confidence condition.

The transaction reference source is the reference data source in which a single record represents a single transaction.

The sets are the data sets to be matched: two data sets that are matched against each other.

An item set is the set of values of one field in the data structure of a data set to be matched.

Support measures how strongly an item set is associated with the transactions.

For example, if the item set has a corresponding value every time a transaction occurs, its support for the transaction is 100%.

The minimum support is the lowest acceptable support of an item set for the transactions; it is defined according to the business scenario and ranges from 0 to 100%.

A frequent item set is an item set whose support for the transactions is greater than the minimum support.

Confidence is the proportion of all transactions to which a given data rule applies.

For example, if the data rule "set A.item A1 = set B.item B3" holds for 60 out of 100 transactions, the confidence of the rule is 60%.

The minimum confidence is the lowest confidence at which a data rule is considered applicable.

The filter item sets are configured according to business needs and contain items that carry no business association, such as auto-increment ids, remarks and descriptions.

The rule confidence condition is a condition, customizable per business, that confirms a data rule's matching result is correct.

For example, between set A and set B, records belonging to the same transaction must have equal summed amounts; this requirement can serve as the rule confidence condition.

In step S2, automatically acquiring the data to be matched to perform rule analysis calculation and generate multiple matching rules and their matching degree results further includes the following steps:

S21, dividing the acquired data set to be matched into item set units by field.

The data to be matched is acquired from the data sources to be matched, and the data of both parties is synchronized into temporary tables in a local database.

The data is then read from the temporary tables into memory, and the item set Lists are generated in turn.

S22, screening out frequent item sets according to the rule analysis configuration parameters.

Using the minimum support from the rule analysis configuration parameters, the support of each item set for the transactions is calculated, and frequent item sets are screened out against the initially set minimum support.

In addition, the initially set filter item sets and the item sets used in the rule confidence condition are excluded before the support of each item set is calculated, which speeds up data rule analysis.

The support of each item set for the transactions is given by:

support = count(non-null values) × transaction set ratio / transaction count

where the transaction count is the number of records in the source data, i.e., the number of transactions.
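A minimal Python sketch of the support formula and the S22 screening, assuming records are dicts, that `None` and the empty string both count as null, and that the transaction set ratio is passed as a fraction (a 1:2 ratio becomes 0.5). Following the worked example later in this description, an item set is kept when its support is greater than or equal to the minimum support. `support` and `frequent_item_sets` are hypothetical names.

```python
from typing import Iterable


def support(rows: list[dict], item: str, transaction_count: int, ratio: float = 1.0) -> float:
    """support = count(non-null values) x transaction set ratio / transaction count."""
    non_null = sum(1 for row in rows if row.get(item) not in (None, ""))
    return non_null * ratio / transaction_count


def frequent_item_sets(rows: list[dict], transaction_count: int, min_support: float,
                       ratio: float = 1.0, excluded: Iterable[str] = ()) -> list[str]:
    """S22: drop filter/confidence-condition items, then keep items meeting min_support."""
    items = {name for row in rows for name in row} - set(excluded)
    return sorted(name for name in items
                  if support(rows, name, transaction_count, ratio) >= min_support)
```

For instance, 200 accounting entries against 100 transactions use `ratio=0.5`, so a field populated in every entry gets a support of exactly 1.0 (100%).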

S23, calculating the weight of each frequent item set with respect to transaction support according to the dispersion of the item set values.

For the screened frequent item sets, the weight of each with respect to transaction support is calculated from the dispersion of its values: the higher the dispersion, the higher the weight.

The frequent item set weight is given by:

weight (dispersion) = count(distinct item set values) / transaction count
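The weight formula and the S241 ordering from high to low can be sketched as below. It assumes null values are not counted as distinct values, which the text does not state; `item_weight` and `order_by_weight` are hypothetical names.

```python
def item_weight(rows: list[dict], item: str, transaction_count: int) -> float:
    """weight (dispersion) = count(distinct item set values) / transaction count."""
    distinct = {row[item] for row in rows if row.get(item) is not None}
    return len(distinct) / transaction_count


def order_by_weight(rows: list[dict], items: list[str],
                    transaction_count: int) -> list[tuple[str, float]]:
    """S241: sort frequent item sets from the highest weight to the lowest."""
    weighted = [(item, item_weight(rows, item, transaction_count)) for item in items]
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)
```

A unique identifier column ends up first, while a constant flag column ends up last, which is the order in which S242 would try them.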

S24, recursively traversing the frequent item set data in order of weight to analyze matching rules.

The weight of each frequent item set lies in the range 0 to 1; a weight greater than 1 indicates that the item set is unrelated to the transactions.

Fig. 3 discloses a flowchart of the frequent item set matching rule analysis according to an embodiment of the invention; as shown in Fig. 3, step S24 further includes:

S241, sorting the frequent item sets of the data sets to be matched from high to low by weight.

The frequent item sets of each data set to be matched are sorted from high to low by weight to generate the List of item sets to be matched.

S242, taking the data set of the transaction reference source as the matching source and traversing the frequent item sets of the set to be matched.

With the data set of the transaction reference source as the matching source, the frequent item set data of the data set to be matched is traversed in turn, starting from the frequent item set with the highest weight.

S243, recursively matching the data strings with the KMP algorithm.

Data matching is performed with the KMP algorithm until matching succeeds, yielding the corresponding matching rule.

Of the two strings, the longer one is taken as the parent string and the shorter one as the substring (if the lengths are equal, the item set from the transaction reference source is taken as the substring by default). The KMP algorithm returns the position in the parent string at which the substring fully matches, or -1 if no match is found.

For example, the parent string is: ABCDABECDE;

if the substring is DABE, the returned position of the substring in the parent string is 4 (counting from 1);

the resulting rule is: substring(parent string, 4, substring length);

if the substring is BAEF, -1 is returned, indicating that the two strings cannot be matched;

in the example, the rule obtained is: frt_txn_id1 = trace_id.
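A self-contained KMP sketch that follows the conventions described above: the longer string is the parent, the reference-source value is the substring on a tie, positions are counted from 1, and -1 means no match. `prefix_table`, `kmp_find`, `locate` and `substring` are illustrative names, and the 1-based position is inferred from the ABCDABECDE example.

```python
def prefix_table(pattern: str) -> list[int]:
    """KMP failure table: length of the longest proper prefix of pattern[:i+1] that is also a suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find(parent: str, sub: str) -> int:
    """Return the 1-based position of the first full match of sub in parent, else -1."""
    if not sub or len(sub) > len(parent):
        return -1
    table = prefix_table(sub)
    k = 0
    for i, ch in enumerate(parent):
        while k > 0 and ch != sub[k]:
            k = table[k - 1]              # fall back instead of rescanning the parent
        if ch == sub[k]:
            k += 1
        if k == len(sub):
            return i - len(sub) + 2       # 0-based start is i - len(sub) + 1
    return -1


def locate(reference_value: str, other_value: str) -> tuple[int, int]:
    """Pick parent/substring as described, then return (position, substring length)."""
    if len(other_value) >= len(reference_value):
        parent, sub = other_value, reference_value   # on a tie the reference value is the substring
    else:
        parent, sub = reference_value, other_value
    return kmp_find(parent, sub), len(sub)


def substring(parent: str, pos: int, length: int) -> str:
    """The rule substring(parent, pos, length), with pos counted from 1."""
    return parent[pos - 1:pos - 1 + length]
```

With the strings from the example, `kmp_find("ABCDABECDE", "DABE")` gives 4 and `substring("ABCDABECDE", 4, 4)` recovers `"DABE"`, i.e. the derived rule reproduces the reference value.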

The matching source item set data is traversed under the rule: the data to be matched is queried using the obtained rule as the query condition to confirm whether it can be matched, and each successful match increments the count of successfully matched records;

each successfully matched record is then checked against the rule confidence condition, and each record that satisfies it increments the count of rule-confident records.
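The counting described in the two paragraphs above might look like the following sketch. It assumes the rule is a simple equality between one source field and one target field, indexes the target records in a dict instead of issuing one database query per record, and takes the rule confidence condition as a caller-supplied predicate; `evaluate_rule` and its parameters are hypothetical.

```python
from collections import defaultdict
from typing import Callable


def evaluate_rule(source_rows: list[dict], target_rows: list[dict],
                  source_field: str, target_field: str,
                  confident: Callable[[dict, list[dict]], bool]) -> tuple[int, int]:
    """Return (matched count, rule-confident count) for source.field == target.field."""
    index: dict = defaultdict(list)
    for row in target_rows:
        value = row.get(target_field)
        if value is not None:
            index[value].append(row)       # target records grouped by rule key
    matched = confident_count = 0
    for row in source_rows:
        hits = index.get(row.get(source_field))
        if hits:
            matched += 1                    # the rule found at least one counterpart
            if confident(row, hits):
                confident_count += 1        # the match also meets the confidence condition
    return matched, confident_count
```

An amount-equality predicate such as `lambda src, hits: sum(h["trans_amt"] for h in hits) == src["amt"]` would play the role of a sum-of-amounts confidence condition; the field names there are illustrative.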

After the traversal, the confidence of the rule is calculated as:

confidence = number of rule-confident records / number of transactions

If the confidence is 100%, the current rule and its confidence are saved and all traversals are exited;

if the confidence is less than 100% but not less than the minimum confidence, the current rule and its confidence are saved, the data that matched successfully and satisfy the rule confidence condition are removed, and the process returns to the item set traversal of step S242 to continue searching for other rules.

If the confidence is less than the minimum confidence, the matching success degree is calculated as:

matching success degree = number of successfully matched records / number of transactions

In the example, the confidence is 60%, so the current rule and its confidence are stored in memory: frt_txn_id1 = trace_id, 60%.

If the matching success degree is less than the minimum confidence, the current rule is treated as invalid and the process returns to the item set traversal of step S242 to continue searching for other matching rules; this reduces pseudo-rules whose matches arise only from coincidental data.

If the matching success degree is greater than or equal to the minimum confidence, the current rule is considered too broad on its own and further rules exist within its range that need analysis: data is re-extracted using the current rule as a condition, the traversed item sets are excluded, and the traversal starts again until all traversals are complete.

For example, a rule may need to combine several conditions, such as item A1 = item B1 AND item A3 = item B3. Because a single condition matches too wide a range of data, the rule confidence condition cannot be satisfied, and the matching range must be narrowed further.

In that case, the traversal of step S242 is performed again on the successfully matched data, using the successfully matched portion of the set to be matched as the traversal source and filtering out the item sets already traversed, and the rule analysis of step S243 is performed again until a new successful matching rule is found, forming a recursive computation.

The rules of the other item sets are traversed again on the basis of the existing rule's matching result; if the traversal completes without finding a new rule, the existing rule is invalid.
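The confidence and matching success degree formulas, together with the four outcomes of step S244, can be condensed into one decision function. This is a sketch: `Outcome` and `classify_rule` are hypothetical names, the returned number is the confidence or the success degree depending on the branch, and the recursive narrowing itself is left to the caller.

```python
from enum import Enum


class Outcome(Enum):
    SAVE_AND_STOP = 1       # confidence is 100%: save rule, exit all traversals
    SAVE_AND_CONTINUE = 2   # save rule, remove confident data, return to S242
    INVALID = 3             # discard rule, return to S242
    NARROW = 4              # re-extract data under this rule and recurse


def classify_rule(confident: int, matched: int, transactions: int,
                  min_confidence: float) -> tuple[Outcome, float]:
    confidence = confident / transactions            # rule-confident records / transactions
    if confidence >= 1.0:
        return Outcome.SAVE_AND_STOP, confidence
    if confidence >= min_confidence:
        return Outcome.SAVE_AND_CONTINUE, confidence
    success = matched / transactions                 # matched records / transactions
    if success < min_confidence:
        return Outcome.INVALID, success
    return Outcome.NARROW, success
```

With the figures of the worked example (60 rule-confident records out of 100 transactions, minimum confidence 30%), `classify_rule(60, 60, 100, 0.3)` returns `(Outcome.SAVE_AND_CONTINUE, 0.6)`.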

S25, outputting the matching rules and their matching degree results.

The matching degree result includes the confidence and/or the matching success degree;

in this embodiment, the finally saved matching rules and their confidences are output.

The S2 data rule analysis step is described below, taking as an example the matching rules between the accounting data of the company's accounting system and the data of its POS transaction system.

Initial rule analysis configuration parameters are set.

Data sources to be matched:

data 1: (accounting data);

data type: database; driver: MySQL; connection information: user/password;

url:[email protected];

sampling range: acct_date = 20210501 (200 records);

data 2: (POS business data);

data type: database; driver: Oracle; connection information: user/password;

url:[email protected];

sampling range: trans_date = 20210501 (100 records);

transaction reference source: data 2;

minimum support: 50%;

minimum confidence: 30%;

filter item sets: set A {create_time, last_update_time}, set B {add_data};

rule confidence condition: sum(trans_amt).
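The example parameters above could be held in a small configuration object such as the following sketch. The class name, field names and range check are assumptions for illustration; percentages are stored as fractions, and the connection details are left out.

```python
from dataclasses import dataclass, field


@dataclass
class RuleAnalysisConfig:
    """Hypothetical holder for the initial rule analysis configuration parameters."""
    reference_source: str
    min_support: float                      # 0.5 means 50%
    min_confidence: float                   # 0.3 means 30%
    filter_items: dict[str, set[str]] = field(default_factory=dict)
    confidence_condition: str = ""

    def __post_init__(self) -> None:
        # Both thresholds are percentages, stored here as fractions in [0, 1].
        for name in ("min_support", "min_confidence"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")


example = RuleAnalysisConfig(
    reference_source="data 2",
    min_support=0.5,
    min_confidence=0.3,
    filter_items={"A": {"create_time", "last_update_time"}, "B": {"add_data"}},
    confidence_condition="sum(trans_amt)",
)
```

Validating the thresholds when the object is created catches a mistyped percentage (for example 50 instead of 0.5) before any rule analysis runs.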

S21, the data to be matched is acquired from the data sources to be matched, and the item set Lists are generated in turn by splitting on fields.

The item sets of the accounting system data are shown in Table 1 below;

the item sets of the POS transaction data are shown in Table 2 below.

TABLE 1

Column name	Nullable	Description
acct_seq_id	N	Accounting serial number (for the front end)
acct_date	N	Accounting date
log_id	N	Serial number ID
acct_no	N	Customer number / sub-account number
trans_type	N	Transaction type
dc_flag	N	Debit/credit flag
trans_amt	N	Transaction amount
acct_bal	N	Balance after the transaction
cust_id	N	Customer number
cust_id2	Y	Second customer number
product_id	Y	Product code
dept_id	Y	Business department code
trans_name	Y	Transaction name
trans_obj	Y	Transaction object
mer_order_id	Y	Merchant order number
pnr_id	Y	PNR number
gate_id	Y	Gateway number
frt_txn_id1	Y	Front-end transaction number 1
frt_txn_id2	Y	Front-end transaction number 2
frt_txn_id3	Y	Front-end transaction number 3
frt_req_date	N	Front-end request date
frt_req_id	N	Front-end request serial number
sys_id	N	System code
frt_req_dtl_id	N	Front-end request detail serial number
trace_id	Y	Global serial number
rsvl	Y	Reserved field 1
create_time	N	Creation time
last_update_time	N	Last update time

TABLE 2

Column name	Nullable	Description
TRANS_DATE	N	Order generation date: sysDate
SYS_TIME	N	Transaction time (hhmmss)
PROD_ID	N	Product number
CASH_ORD_ID	Y	Cashier order number
OUT_ORD_ID	Y	External order number
MEMBER_ID	N	Business system customer number
MER_OPER_ID	Y	Business system operator
FEE_AMT	Y	Commission fee
FEE_FORMULA	Y	Rate formula
PA_CUST_ID	N	PA customer number
PA_ACCT_ID	N	PA account number
PNR_DEV_ID	Y	Logical number used by POSP to uniquely identify the POS terminal
POS_MER_ID	Y	Merchant number opened with the remittance deposit clearing bank
POS_MER_NAME	Y	Contracted merchant name
TERM_ORD_ID	N	Order number generated by the terminal
TERM_BATCH_ID	Y	Terminal batch number
ORD_ID	N	Order number
TRANS_TYPE	Y	Transaction type
ORD_AMT	Y	Order amount
BANK_CODE	Y	Bank return code; may be empty
BANK_MESSAGE	Y	Bank return message; may be empty
PA_TRANS_ID	Y	PA serial number
TRANS_ID	Y	Order number generated by the PA QR-code cashier
PAY_CARD_ID	Y	Payment card number
TRANS_STAT	N	Transaction status
CORRECT_STAT	N	Reversal status
ACCT_STAT	N	Accounting status
RESP_CD	Y	POSP response code
BANK_RESP_CD	Y	Bank response code
REF_NUM	Y	Voucher number
POSP_SEQ_ID	Y	UnionPay query serial number
CARD_BANK_ID	Y	UnionPay standard issuing bank number
GATE_ID	Y	Gateway
TRACE_ID	Y	Global serial number
MESSAGE	Y	Message
UPD_DATATIME	Y	Update timestamp

S22, the support of each item set for the transactions is calculated, and frequent item sets are screened out according to the initially set minimum support.

With an initially set minimum support of 50%, item sets whose support is greater than or equal to 50% are frequent item sets.

The transaction reference source is the POS business data, which contains 100 records, so the number of transactions in this embodiment is 100 and the transaction set ratio of the POS business data is 1:1; there are 200 accounting entries, so the transaction set ratio of the accounting data is 1:2.

The frequent item sets of the accounting system data set after screening are shown in Table 3 below;

the frequent item sets of the POS transaction data set after screening are shown in Table 4 below.

TABLE 3

Column name	Nullable	Description
acct_seq_id	N	Accounting serial number (for the front end)
acct_date	N	Accounting date
log_id	N	Serial number ID
acct_no	N	Customer number / sub-account number
trans_type	N	Transaction type
dc_flag	N	Debit/credit flag
acct_bal	N	Balance after the transaction
cust_id	N	Customer number
product_id	Y	Product code
frt_txn_id1	Y	Front-end transaction number 1
frt_req_date	N	Front-end request date
frt_req_id	N	Front-end request serial number
sys_id	N	System code
frt_req_dtl_id	N	Front-end request detail serial number
trace_id	Y	Global serial number

TABLE 4

Column name	Nullable	Description
TRANS_DATE	N	Order generation date: sysDate
SYS_TIME	N	Transaction time (hhmmss)
PROD_ID	N	Product number
CASH_ORD_ID	Y	Cashier order number
MEMBER_ID	N	Business system customer number
FEE_AMT	Y	Commission fee
FEE_FORMULA	Y	Rate formula
PA_CUST_ID	N	PA customer number
PA_ACCT_ID	N	PA account number
PNR_DEV_ID	Y	Logical number used by POSP to uniquely identify the POS terminal
POS_MER_ID	Y	Merchant number opened with the remittance deposit clearing bank
POS_MER_NAME	Y	Contracted merchant name
TERM_ORD_ID	N	Order number generated by the terminal
TERM_BATCH_ID	Y	Terminal batch number
ORD_ID	N	Order number
TRANS_TYPE	Y	Transaction type
PA_TRANS_ID	Y	PA serial number
PAY_CARD_ID	Y	Payment card number
TRANS_STAT	N	Transaction status
CORRECT_STAT	N	Reversal status
ACCT_STAT	N	Accounting status
RESP_CD	Y	POSP response code
BANK_RESP_CD	Y	Bank response code
REF_NUM	Y	Voucher number
POSP_SEQ_ID	Y	UnionPay query serial number
CARD_BANK_ID	Y	UnionPay standard issuing bank number
GATE_ID	Y	Gateway
TRACE_ID	Y	Global serial number

Steps S23 and S24 are then performed, recursively traversing the frequent item set data in order for matching rule analysis.

The weights of the accounting data are shown in Table 5 below;

the weights of the POS transaction data are shown in Table 6 below.

TABLE 5

Column name	Description
acct_seq_id	Accounting serial number (for the front end)
log_id	Serial number ID
frt_txn_id1	Front-end transaction number 1
frt_req_id	Front-end request serial number
trace_id	Global serial number
trans_amt	Transaction amount
acct_bal	Balance after the transaction
acct_no	Customer number / sub-account number
cust_id	Customer number
trans_type	Transaction type
frt_req_dtl_id	Front-end request detail serial number
product_id	Product code
sys_id	System code
dc_flag	Debit/credit flag
acct_date	Accounting date
frt_req_date	Front-end request date

TABLE 6

Column name	Description
CASH_ORD_ID	Cashier order number
ORD_ID	Order number
PA_TRANS_ID	PA serial number
TRACE_ID	Global serial number
REF_NUM	Voucher number
POSP_SEQ_ID	UnionPay query serial number
SYS_TIME	Transaction time (hhmmss)
TERM_ORD_ID	Order number generated by the terminal
PAY_CARD_ID	Payment card number
PA_CUST_ID	PA customer number
PA_ACCT_ID	PA account number
PNR_DEV_ID	Logical number used by POSP to uniquely identify the POS terminal
MEMBER_ID	Business system customer number
POS_MER_ID	Merchant number opened with the remittance deposit clearing bank
POS_MER_NAME	Contracted merchant name
FEE_AMT	Commission fee
TERM_BATCH_ID	Terminal batch number
CARD_BANK_ID	UnionPay standard issuing bank number
RESP_CD	POSP response code
BANK_RESP_CD	Bank response code
GATE_ID	Gateway
TRANS_TYPE	Transaction type
TRANS_STAT	Transaction status
ACCT_STAT	Accounting status
CORRECT_STAT	Reversal status
TRANS_DATE	Order generation date: sysDate
PROD_ID	Product number
FEE_FORMULA	Rate formula

After steps S241-S244 are completed, the number of successful matches is 60 and the rule confidence is 60%.

The current rule and its confidence are stored in memory: frt_txn_id1 = trace_id, 60%.

S25 outputs the matching rules and their matching degree results:

Rule 1: frt_txn_id1 = trace_id, 60%;

Rule 2: frt_req_id = pa_trans_id, 40%.
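The confidence figures above can be illustrated with a minimal sketch. It assumes that a rule's confidence is the fraction of source records whose source field value equals the target field value of at least one target record, and it scores every candidate field pair by brute force rather than by the frequent-itemset traversal of steps S23-S24. The function names and the sample records are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]

def rule_confidence(source: List[Row], target: List[Row], src_field: str, tgt_field: str) -> float:
    """Fraction of source rows whose src_field value appears as tgt_field in the target rows."""
    if not source:
        return 0.0
    target_values = {row[tgt_field] for row in target if row.get(tgt_field) is not None}
    hits = sum(1 for row in source if row.get(src_field) in target_values)
    return hits / len(source)

def rank_rules(source: List[Row], target: List[Row],
               src_fields: Sequence[str], tgt_fields: Sequence[str],
               min_conf: float = 0.0) -> List[Tuple[str, str, float]]:
    """Score every (source field, target field) pair and sort by confidence, highest first."""
    rules = []
    for src, tgt in product(src_fields, tgt_fields):
        conf = rule_confidence(source, target, src, tgt)
        if conf > min_conf:
            rules.append((src, tgt, conf))
    rules.sort(key=lambda r: r[2], reverse=True)
    return rules

# Hypothetical sample: 5 accounting records and 4 POS records.
accounting = [
    {"frt_txn_id1": "T1", "frt_req_id": "X1"},
    {"frt_txn_id1": "T2", "frt_req_id": "X2"},
    {"frt_txn_id1": "T3", "frt_req_id": "P3"},
    {"frt_txn_id1": "Z4", "frt_req_id": "P4"},
    {"frt_txn_id1": "Z5", "frt_req_id": "X5"},
]
pos = [
    {"trace_id": "T1", "pa_trans_id": "P1"},
    {"trace_id": "T2", "pa_trans_id": "P2"},
    {"trace_id": "T3", "pa_trans_id": "P3"},
    {"trace_id": "T9", "pa_trans_id": "P4"},
]
ranked = rank_rules(accounting, pos, ["frt_txn_id1", "frt_req_id"], ["trace_id", "pa_trans_id"])
for src, tgt, conf in ranked:
    print(f"{src} = {tgt}, {conf:.0%}")
```

With this toy data the loop prints `frt_txn_id1 = trace_id, 60%` followed by `frt_req_id = pa_trans_id, 40%`; pairs with zero confidence are dropped by the `min_conf` filter.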

S3, the data rule configuration step, in which a matching rule generated in step S2 is selected as the data matching rule or a custom data matching rule is configured, further comprises:

the data rule configuration includes the following settings:

Matching data source name: the data source to be matched in step S2 (e.g., the accounting table name);

Applicable condition: empty by default and parsed with the Aviator expression evaluation engine. A custom applicable condition restricts the range of data to be matched (e.g., only posting-type accounting records: TRANS_TYPE = '1000');

Associated table name: the matching target data source in step S2 (e.g., the POS data);

Matching condition: the target field of the matching rule obtained in S2, with the matching source field replaced by "?" (e.g., trace_id = ?);

Matching condition parameter: the matching source field of the matching rule obtained in S2 (e.g., frt_txn_id1);

End flag: defaults to "matching ended" (can be customized according to business requirements);

Matching result table name: the table in which matching results are stored (e.g., the reconciliation success result table TB_TAS_ACCT_CHNL_MATCHED_RES);

Reconciliation result setting: divided into source table fields and target fields;

Source table fields: the [RESOURCE] tag header indicates that all subsequent field values come from the data source to be matched (e.g., [RESOURCE]ACCT_DATE=ACCT_DATE;ACCT_SEQ_ID=LOG_ID……);

Target fields: the [TARGET] tag header indicates that all subsequent field values come from the matching target data source (e.g., [TARGET]CHNL_DATE=TRANS_DATE;CHNL_ID='POS';CHNL_SEQ_ID=ORD_ID……);

Group ID: matching rules can be grouped according to business requirements; the default group is 1;

Matching rule order: rules are ordered by their S2 confidence, from high to low (e.g., the rule frt_txn_id1 = trace_id has order 0, and the rule frt_seq_id = pa_trans_id has order 1).
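The rule configuration settings listed above can be held in a simple record. The sketch below is one possible shape; the class name `MatchRule`, its attribute names and the helper `rules_from_confidence` are hypothetical, as are the source and target table names in the usage. Ordering simply gives order 0 to the highest-confidence rule, as described for the matching rule order.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

@dataclass
class MatchRule:
    source_table: str                           # matching data source name
    associated_table: str                       # associated (target) table name
    match_condition: str                        # target-side condition with "?" placeholders
    match_params: List[str]                     # source fields bound to the "?" placeholders
    applicable_condition: Optional[str] = None  # None: the rule applies to every record
    ended: bool = True                          # end flag; defaults to "matching ended"
    result_table: str = ""                      # table receiving successful matches
    result_mapping: str = ""                    # "[RESOURCE]...[TARGET]..." mapping spec
    group_id: int = 1                           # default group
    order: int = 0                              # 0 = tried first

def rules_from_confidence(source_table: str, target_table: str,
                          ranked: Sequence[Tuple[str, str, float]],
                          result_table: str = "") -> List[MatchRule]:
    """Turn (source_field, target_field, confidence) tuples into rules ordered by confidence."""
    ordered = sorted(ranked, key=lambda r: r[2], reverse=True)
    return [
        MatchRule(
            source_table=source_table,
            associated_table=target_table,
            match_condition=f"{tgt} = ?",
            match_params=[src],
            result_table=result_table,
            order=i,
        )
        for i, (src, tgt, _conf) in enumerate(ordered)
    ]

rules = rules_from_confidence(
    "ACCT_LOG", "POS_TRANS_LOG",  # hypothetical table names
    [("frt_req_id", "pa_trans_id", 0.4), ("frt_txn_id1", "trace_id", 0.6)],
    result_table="TB_TAS_ACCT_CHNL_MATCHED_RES",
)
for r in rules:
    print(r.order, r.match_condition, r.match_params)
```

Keeping the defaults in the record (empty applicable condition, "ended" flag, group 1) mirrors the defaults stated in the settings list, so a generated rule needs only the fields that differ.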

S4, the data matching processing step: the data are matched sequentially according to the selected data matching rules until matching succeeds or all rules have been traversed.

The association rules for the specified data source are selected from the rule list stored in memory, an ordered rule configuration tree is generated according to the rule weights, and during matching the rule configuration tree is traversed until matching succeeds or the traversal is complete.

Fig. 4 shows a flowchart of the data matching processing step according to an embodiment of the present invention. The data matching processing step shown in Fig. 4 further includes:

S41: periodically reading the latest rule information and storing it in memory;

S42: reading the matching source information from the message data;

S43: looking up the matching rules related to the data source in memory and generating a configuration tree;

S44: traversing each matching rule in the configuration tree;

determining whether matching data is found;

if matching data is found, determining whether the end flag of the current rule indicates that matching has ended: if not, traversing the next level of the matching tree; if so, processing the matching result according to the matching-success result configuration;

if no matching data is found, determining whether the traversal is complete: if not, traversing the remaining rules in the rule configuration tree; if all rules have been traversed without finding matching data, processing the matching result according to the matching-failure result configuration.
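The branching in S44 can be sketched as a recursive walk over the configuration tree. This is an illustration under assumptions: each rule node carries a lookup callable that returns the matched target record or None, rules at one level are tried in ascending rule order, and if a deeper level finds nothing the walk falls back to the remaining sibling rules (the flowchart does not spell out this case). `RuleNode`, `traverse`, `index_lookup` and the sample tables are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, str]

@dataclass
class RuleNode:
    rule_id: int
    order: int
    ended: bool                                   # end flag of the rule
    lookup: Callable[[Record], Optional[Record]]  # matched target record or None
    children: List["RuleNode"] = field(default_factory=list)

def traverse(record: Record, level: List[RuleNode],
             path: Tuple[int, ...] = ()) -> Tuple[bool, Tuple[int, ...], Optional[Record]]:
    """Return (matched, rule path, final matched record)."""
    for node in sorted(level, key=lambda n: n.order):
        found = node.lookup(record)
        if found is None:
            continue                                    # no data: try the next rule
        if node.ended:
            return True, path + (node.rule_id,), found  # success: apply success config
        ok, sub_path, sub_found = traverse(found, node.children, path + (node.rule_id,))
        if ok:
            return ok, sub_path, sub_found
    return False, path, None                            # level exhausted: failure config

def index_lookup(table: List[Record], src_field: str, tgt_field: str):
    """Hypothetical helper: equality lookup through an in-memory index on tgt_field."""
    index = {row[tgt_field]: row for row in table}
    return lambda rec: index.get(rec.get(src_field))

pos = [{"trace_id": "T1"}]
pay = [{"ORD_ID": "O2", "TRACE_ID": "C2"}]
chnl = [{"CHNL_SEQ_ID": "C2"}]
tree = [
    RuleNode(1, 0, True, index_lookup(pos, "frt_txn_id1", "trace_id")),
    RuleNode(2, 1, False, index_lookup(pay, "FRT_SEQ_ID", "ORD_ID"),
             [RuleNode(3, 0, True, index_lookup(chnl, "TRACE_ID", "CHNL_SEQ_ID"))]),
]
print(traverse({"frt_txn_id1": "T1"}, tree)[:2])
print(traverse({"frt_txn_id1": "X", "FRT_SEQ_ID": "O2"}, tree)[:2])
print(traverse({"frt_txn_id1": "X"}, tree)[:2])
```

The three calls correspond to the three outcomes in the flowchart: an immediate match by an "ended" rule, a match reached through an "unfinished" rule and its next level, and a failure after every rule has been tried.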

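Further, the rule information includes: the data source to which the rule applies; the applicable condition for data in that data source; the matching target source; the matching rule; a flag marking whether matching is finished; the matching result configuration for a successful match; and the matching result configuration for a failed match.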

Next, the data matching process of S4 is described using data matching for the flight accounting system as an example.

S41: the latest rule information is pulled from the database into memory at a regular, configurable interval (e.g., every 5 minutes).
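Step S41 can be sketched as an in-memory rule cache. This version reloads lazily, the first time rules are requested after the interval has elapsed, rather than on a background timer; both fit "pull at regular intervals", and the lazy form is easier to test. `RuleCache`, the loader callable and the 300-second default (the 5-minute example above) are assumptions for illustration.

```python
import time
from typing import Callable, Dict, List, Optional

Rule = Dict[str, object]

class RuleCache:
    """Holds rules in memory and reloads them once they are older than `interval` seconds."""

    def __init__(self, loader: Callable[[], List[Rule]], interval: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # e.g. a database query returning rule rows
        self._interval = interval
        self._clock = clock
        self._rules: List[Rule] = []
        self._loaded_at: Optional[float] = None

    def rules(self) -> List[Rule]:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._interval:
            self._rules = self._loader()
            self._loaded_at = now
        return self._rules

    def rules_for_source(self, source_table: str) -> List[Rule]:
        """Rules whose matching data source is `source_table` (first tree level)."""
        return [r for r in self.rules() if r["source_table"] == source_table]

# Usage with a fake clock and loader.
now = [0.0]
loads = []

def fake_loader() -> List[Rule]:
    loads.append(now[0])
    return [{"source_table": "TB_HLACCTDB_CUST_ACCT_LOG", "rule_id": 1}]

cache = RuleCache(fake_loader, interval=300.0, clock=lambda: now[0])
cache.rules()
now[0] = 120.0
cache.rules()          # still fresh: no reload
now[0] = 300.0
first_level = cache.rules_for_source("TB_HLACCTDB_CUST_ACCT_LOG")  # triggers a reload
print(len(loads), len(first_level))
```

Injecting the clock keeps the refresh logic testable. If several consumer threads share one cache, the reload check would need a lock around it.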

The rule information content is shown in table 7 below.

TABLE 7

S42-S43: a message is received from the message middleware, the relevant reconciliation rules are retrieved from the in-memory rules according to the reconciliation source table name in the message, and a rule matching tree is built.

The received message body is as follows:

{"VAMS_TABLE_NAME": "TB_HLACCTTB_CUST_ACCT_LOG", "ACCT_DATE": "20210501", "ACCT_SEQ_ID": "39656312", "MER_ID": "874068", "CUST_ID": "001000030174", "ACCT_TYPE": "R", "MER_USR_ID": "874068", "SYS_TIME": "000014", "DC_FLAG": "C", "TRANS_AMT": "1160.00", "FEE_AMT": "2.09", "ACCT_BAL": "407468.23", "TRANS_TYPE": "1001", "ORD_ID": "2021050132837020", "FRT_SEQ_ID": "2021050191079658", "REK" - "information" - "1160.00" - };

From VAMS_TABLE_NAME, the data to be matched is identified as "TB_HLACCTTB_CUST_ACCT_LOG", and construction of the rule configuration tree begins.

S44: traverse the matching rules in the configuration tree.

All rules in memory whose matching source is TB_HLACCTDB_CUST_ACCT_LOG are found; they form the first level of the configuration tree.

The first-level rules are traversed to check whether any rule has the end flag "unfinished". If so, the associated table name of that rule is recorded as the matching source of the second-level rule tree, and the process repeats until the end flags of all rules in the last level are "finished".
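The level-by-level construction described above can be sketched as a recursive function over the flat rule list: the first level holds the rules whose source is the message's table, and every "unfinished" rule receives as children the rules whose source is its associated table. The dictionary keys, the name `build_tree`, and the table names other than TB_HLACCTDB_CUST_ACCT_LOG and TB_PNRPAY_TRANS_LOG are hypothetical; the cycle guard is an addition not described in the text.

```python
from typing import Dict, FrozenSet, List

Rule = Dict[str, object]

def build_tree(rules: List[Rule], source_table: str,
               _seen: FrozenSet[str] = frozenset()) -> List[Dict[str, object]]:
    """Nested [{'rule': ..., 'children': [...]}] levels, each sorted by rule order."""
    if source_table in _seen:
        return []  # guard against a cyclic configuration
    level = sorted((r for r in rules if r["source_table"] == source_table),
                   key=lambda r: r["order"])
    tree = []
    for rule in level:
        # Only "unfinished" rules open a next level, keyed by their associated table.
        children = [] if rule["ended"] else build_tree(
            rules, rule["associated_table"], _seen | {source_table})
        tree.append({"rule": rule, "children": children})
    return tree

rules = [
    {"rule_id": 1, "source_table": "TB_HLACCTDB_CUST_ACCT_LOG",
     "associated_table": "POS_TRANS_LOG", "ended": True, "order": 0},
    {"rule_id": 2, "source_table": "TB_HLACCTDB_CUST_ACCT_LOG",
     "associated_table": "TB_PNRPAY_TRANS_LOG", "ended": False, "order": 1},
    {"rule_id": 3, "source_table": "TB_PNRPAY_TRANS_LOG",
     "associated_table": "CHNL_TRANS_LOG", "ended": True, "order": 0},
]
tree = build_tree(rules, "TB_HLACCTDB_CUST_ACCT_LOG")
print([n["rule"]["rule_id"] for n in tree])
print([c["rule"]["rule_id"] for c in tree[1]["children"]])
```

Here rule 2 is "unfinished", so its associated table TB_PNRPAY_TRANS_LOG becomes the matching source of the second level, which contains rule 3.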

Fig. 5 shows a schematic diagram of the flight accounting rule tree.

The configuration tree is traversed in ascending order of the matching order. Rule orders are set according to rule confidence; for example, if 90% of records satisfy rule 1, rule 1 can be given the highest-priority order "0".

Each rule is read, and data are queried according to it.

For example, the information for rule 1 is shown in Table 8 below.

TABLE 8

Determine whether the current message meets the applicable condition:

Applicable condition: TB_HLACCTDB_list_ACCT_LOG$TRANS_TYPE = '1001'

The current message has TRANS_TYPE 1001, so the condition is met (evaluated by the Aviator expression engine).
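In the described implementation the applicability check is performed by the Aviator expression engine. As a stand-in for illustration only (this is not Aviator), the sketch below evaluates just the simple form used here, `TABLE$FIELD = 'value'`, optionally joined with `&&`, against the message fields. The table prefix is ignored, and the function name is hypothetical.

```python
import re
from typing import Mapping, Optional

# One equality term: optional "TABLE$" prefix, field name, "=" or "==", quoted value.
_TERM = re.compile(r"^\s*(?:\w+\$)?(\w+)\s*==?\s*'([^']*)'\s*$")

def is_applicable(condition: Optional[str], message: Mapping[str, str]) -> bool:
    """True if every equality term holds; an empty condition applies to all records."""
    if condition is None or not condition.strip():
        return True
    for term in condition.split("&&"):
        match = _TERM.match(term)
        if match is None:
            raise ValueError(f"unsupported condition term: {term!r}")
        field_name, expected = match.groups()
        if message.get(field_name) != expected:
            return False
    return True

message = {"VAMS_TABLE_NAME": "TB_HLACCTTB_CUST_ACCT_LOG", "TRANS_TYPE": "1001"}
print(is_applicable("TB_HLACCTDB_CUST_ACCT_LOG$TRANS_TYPE = '1001'", message))  # True
print(is_applicable("TRANS_TYPE == '1000'", message))                            # False
```

Anything outside this narrow grammar raises `ValueError`, so a real deployment would hand such conditions to the expression engine.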

Query according to the rule configuration:

Query object: TB_PNRPAY_TRANS_LOG

Query condition: ORD_ID = ? AND TRANS_STAT = 'S'

Query parameter: TB_HLACCTDB_list_ACCT_LOG$FRT_SEQ_ID (value '2021050191079658')

Assemble the query statement and execute the query.
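Assembling and running the rule query can be sketched with Python's built-in `sqlite3` module standing in for the real database. The table name and condition come from the rule configuration (assumed trusted), while the parameter values taken from the message are bound to the `?` placeholders instead of being concatenated into the SQL. `run_rule_query` and the sample TRACE_ID column and value are hypothetical.

```python
import sqlite3
from typing import Dict, Mapping, Optional, Sequence

def run_rule_query(conn: sqlite3.Connection, table: str, condition: str,
                   param_fields: Sequence[str],
                   message: Mapping[str, str]) -> Optional[Dict[str, object]]:
    """Return the first target row matching the rule, or None."""
    # Identifiers come from trusted rule configuration; message values are bound.
    sql = f"SELECT * FROM {table} WHERE {condition}"
    params = [message[name] for name in param_fields]
    cur = conn.execute(sql, params)
    row = cur.fetchone()
    if row is None:
        return None
    columns = [d[0] for d in cur.description]
    return dict(zip(columns, row))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TB_PNRPAY_TRANS_LOG (ORD_ID TEXT, TRANS_STAT TEXT, TRACE_ID TEXT)")
conn.execute("INSERT INTO TB_PNRPAY_TRANS_LOG VALUES ('2021050191079658', 'S', 'TR-001')")
message = {"FRT_SEQ_ID": "2021050191079658"}
hit = run_rule_query(conn, "TB_PNRPAY_TRANS_LOG", "ORD_ID = ? AND TRANS_STAT = 'S'",
                     ["FRT_SEQ_ID"], message)
print(hit)
```

Binding values rather than splicing them into the SQL string avoids quoting problems and injection through message content; only the configured identifiers are interpolated.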

If result data is found with the current rule, the end flag of the current rule determines whether matching is finished.

If it is not finished, traversal proceeds to the next level of the matching tree.

For example, when rule 6 is traversed and a query result is found, its end flag is "unfinished", so traversal continues with the third-level rules 7-9.

If it is finished, the matching result is stored according to the matching result table name and the result mapping information of the current rule.

For example, when rule 1 is traversed and a query result is found with the end flag "ended", the reconciliation result is recorded in the TB_TAS_ACCT_CHNL_MATCH_RES table according to the reconciliation result configuration, and the recorded fields and their values are set according to the result mapping information.
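Applying the reconciliation result mapping can be sketched as follows, assuming the `[RESOURCE]`/`[TARGET]` format shown in the S3 settings: semicolon-separated `RESULT_COLUMN=FIELD` pairs, where FIELD is read from the source record under `[RESOURCE]` and from the matched target record under `[TARGET]`, and a single-quoted value such as `'POS'` is a literal. `parse_mapping`, `build_result` and the sample record values are hypothetical, and writing the resulting row into TB_TAS_ACCT_CHNL_MATCH_RES is left out.

```python
import re
from typing import Dict, List, Mapping, Tuple

# "[RESOURCE]" or "[TARGET]" followed by everything up to the next tag.
_SECTION = re.compile(r"\[(RESOURCE|TARGET)\]([^\[]*)")

def parse_mapping(spec: str) -> List[Tuple[str, str, str]]:
    """Split a mapping spec into (side, result_column, expression) triples."""
    triples = []
    for side, body in _SECTION.findall(spec):
        for pair in body.split(";"):
            if not pair.strip():
                continue
            column, expr = (part.strip() for part in pair.split("=", 1))
            triples.append((side, column, expr))
    return triples

def build_result(spec: str, source: Mapping[str, str],
                 target: Mapping[str, str]) -> Dict[str, str]:
    """Build one result-table row from the matched source and target records."""
    row = {}
    for side, column, expr in parse_mapping(spec):
        if len(expr) >= 2 and expr[0] == expr[-1] == "'":
            row[column] = expr[1:-1]  # quoted literal, e.g. 'POS'
        else:
            record = source if side == "RESOURCE" else target
            row[column] = record[expr]
    return row

spec = ("[RESOURCE]ACCT_DATE=ACCT_DATE;ACCT_SEQ_ID=LOG_ID;"
        "[TARGET]CHNL_DATE=TRANS_DATE;CHNL_ID='POS';CHNL_SEQ_ID=ORD_ID")
source = {"ACCT_DATE": "20210501", "LOG_ID": "39656312"}
target = {"TRANS_DATE": "20210501", "ORD_ID": "2021050191079658"}
print(build_result(spec, source, target))
```

The resulting dictionary maps result-table columns to values, ready to be inserted into the configured matching result table.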

If all rules in the rule configuration tree have been traversed without finding matching data, processing of reconciliation failure data begins.

The reconciliation failure configuration for the flight accounting system is shown in Table 9 below.

TABLE 9

According to the reconciliation failure configuration, the failed flight accounting messages are registered in the TB_TAS_ACCT_UNMATCH_INFO table, and the field mapping in that table is stored according to the result KEY mapping and result non-KEY mapping fields.

When the data matching process is finished, subsequent business processing is performed based on the matching result records.

S5, the matching result processing step, in which the matching results are processed, further includes:

configuring exception notification and alarm rules according to the specific requirements and monitoring rules of each business, issuing alarm notifications, outputting exception detail files, and so on.

For example, alarms are raised for certain abnormal records among the failure data; the monitoring configuration information is shown in Table 10 below:

TABLE 10

Fig. 6 shows a schematic diagram of a big-data-based data matching apparatus according to an embodiment of the present invention. The apparatus may include an internal communication bus 601, a processor 602, a read-only memory (ROM) 603, a random access memory (RAM) 604, a communication port 605, and a hard disk 607. The internal communication bus 601 enables data communication between the components of the apparatus. The processor 602 may make determinations and issue prompts. In some embodiments, the processor 602 may consist of one or more processors.

The communication port 605 may enable data transmission and communication between the big data based data matching apparatus and an external input/output device. In some embodiments, big data based data matching devices may send and receive information and data from a network through communication port 605. In some embodiments, the big data based data matching apparatus may perform data transmission and communication with an external input/output device through the input/output end 606 in a wired manner.

The big data based data matching apparatus may also comprise different forms of program storage units as well as data storage units, such as a hard disk 607, a Read Only Memory (ROM)603 and a Random Access Memory (RAM)604, capable of storing various data files for computer processing and/or communication use, and possibly program instructions for execution by the processor 602. The processor 602 executes these instructions to implement the main parts of the method. The results of the processing by the processor 602 are communicated to an external output device via the communication port 605 and displayed on a user interface of the output device.

For example, the implementation file of the above big-data-based data matching method may be a computer program that is stored on the hard disk 607 and loaded into the processor 602 for execution, so as to implement the method of the present application.

When the implementation file of the big-data-based data matching method is a computer program, it can also be stored in a computer-readable storage medium as an article of manufacture. For example, computer-readable storage media can include, but are not limited to, magnetic storage devices (e.g., hard disks, floppy disks, magnetic strips), optical disks (e.g., Compact Disc (CD), Digital Versatile Disc (DVD)), smart cards, and flash memory devices (e.g., electrically erasable programmable read-only memory (EEPROM), cards, sticks, key drives). In addition, the various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" can include, without being limited to, wireless channels and various other media (and/or storage media) capable of storing, containing, and/or carrying code and/or instructions and/or data.

It should further be noted that the data matching apparatus assembles query statements from the configured rule information. The assembly logic, which is currently customized per database type, can be extended according to the type of the target query source into a generalized interface, for example to support most commonly used databases (relational and non-relational), Flink data streams, and so on.
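One way to read the "generalized interface" is a query-builder abstraction with one implementation per target source type, selected from a registry. The sketch below is hypothetical throughout: the class names, the registry keys and the document-style filter format are illustrative, and a Flink-stream builder is not shown.

```python
from abc import ABC, abstractmethod
from typing import Dict, Sequence

class QueryBuilder(ABC):
    """Turns a rule's target, condition and parameter values into a source-specific query."""

    @abstractmethod
    def build(self, target: str, condition: str, params: Sequence[str]) -> object:
        ...

class SqlQueryBuilder(QueryBuilder):
    """Relational sources: keep the '?' placeholders and bind values separately."""

    def build(self, target, condition, params):
        return f"SELECT * FROM {target} WHERE {condition}", list(params)

class DocumentQueryBuilder(QueryBuilder):
    """Hypothetical non-relational source: "A = ? AND B = 'x'" becomes a filter dict."""

    def build(self, target, condition, params):
        values = iter(params)
        filters = {}
        for term in condition.split(" AND "):
            name, rhs = (part.strip() for part in term.split("=", 1))
            filters[name] = next(values) if rhs == "?" else rhs.strip("'")
        return {"collection": target, "filter": filters}

BUILDERS: Dict[str, QueryBuilder] = {
    "mysql": SqlQueryBuilder(),        # hypothetical source-type keys
    "oracle": SqlQueryBuilder(),
    "mongodb": DocumentQueryBuilder(),
}

def build_query(source_type: str, target: str, condition: str, params: Sequence[str]) -> object:
    """Dispatch to the builder registered for the target source type."""
    try:
        builder = BUILDERS[source_type]
    except KeyError:
        raise ValueError(f"no query builder registered for {source_type!r}") from None
    return builder.build(target, condition, params)

cond = "ORD_ID = ? AND TRANS_STAT = 'S'"
print(build_query("mysql", "TB_PNRPAY_TRANS_LOG", cond, ["2021050191079658"]))
print(build_query("mongodb", "TB_PNRPAY_TRANS_LOG", cond, ["2021050191079658"]))
```

Because every builder consumes the same rule fields, supporting a new source type means registering one more class; the rule configuration itself does not change.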

Fig. 7 shows a schematic diagram of a big-data-based data matching system according to an embodiment of the present invention. In the system shown in Fig. 7, a data source synchronization configuration file is generated by configuring the data source to be matched. Data are synchronized from a database 704 to a local database 706 through a data synchronization platform 705, and the data to be matched are sent to the message middleware 701 cluster. The data are preliminarily matched by a real-time computing platform 703 built on Flink, and after matching (covering both completed and failed matches) the data are sent back to the message middleware 701.

The message middleware 701 receives the original data to be matched, or the preliminarily matched data, as data to be matched and sends them to the data matching device 702.

The data matching device 702 performs further matching calculation and processing on the received data.

If an exception occurs, the data only need to be pushed to the message middleware 701 again, so any data requiring matching can be reprocessed at any time.

The data matching device 702 can be implemented by the embodiment shown in Fig. 6.

The data matching method, device and system based on big data provided by the invention have the following beneficial effects:

1) an algorithm is provided that automatically analyzes a data set and extracts its rules, so that a data matching rule scheme can be produced with zero knowledge of the underlying business;

2) the configuration tree splits data associations with complex paths into simple point-to-point pairwise associations, which handles the data matching scenarios of most businesses;

3) data matching requirements can be met flexibly and quickly by simply adding to or modifying the configuration data.

While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance with one or more embodiments, occur in different orders and/or concurrently with other acts from that shown and described herein or not shown and described herein, as would be understood by one skilled in the art.

As used in this application and the appended claims, the terms "a", "an", "one", and/or "the" do not refer specifically to the singular and may also include the plural, unless the context clearly indicates otherwise. In general, the terms "comprise" and "include" only indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.

The embodiments described above are provided to enable persons skilled in the art to make or use the invention and that modifications or variations can be made to the embodiments described above by persons skilled in the art without departing from the inventive concept of the present invention, so that the scope of protection of the present invention is not limited by the embodiments described above but should be accorded the widest scope consistent with the innovative features set forth in the claims.

Claims

1. A data matching method based on big data is characterized by comprising the following steps:

2. The big-data-based data matching method according to claim 1, wherein the S1 data source configuration step further comprises:

3. The big-data-based data matching method according to claim 1, wherein the rule analysis configuration parameters further comprise:

4. The big-data-based data matching method according to claim 1, wherein, in step S2, automatically acquiring the data to be matched, performing rule analysis and calculation, and generating a plurality of matching rules and their matching degree results further comprises:

S25: outputting the matching rules and their matching degree results.

5. The big-data-based data matching method according to claim 4, wherein said step S24 further comprises:

6. The big data based data matching method according to claim 5, wherein said step S244 further comprises:

calculating the confidence of the current rule;

7. The big-data-based data matching method according to claim 1, wherein the step S4 further includes:

S41: reading the latest rule information;

S42: reading the matching source information from the message data;

8. The big-data-based data matching method according to claim 7, wherein the rule information further comprises:

the data source to which the rule applies;

the applicable condition for data in that data source;

the matching target source;

the matching rule;

a flag marking whether matching is finished;

the matching result configuration for a successful match;

and the matching result configuration for a failed match.

9. A big data-based data matching device, comprising:

a memory for storing instructions executable by the processor;

a processor for executing the instructions to implement the method of any one of claims 1-8.

10. A big-data based data matching system, comprising:

a memory for storing instructions executable by the processor;