CN115329833A

CN115329833A - Logistics system abnormal data identification method based on block chain

Info

Publication number: CN115329833A
Application number: CN202210760438.4A
Authority: CN
Inventors: 周媛媛; 李晓辉; 沈八中; 苏家楠; 吕思婷
Original assignee: Guangzhou Institute of Technology of Xidian University
Current assignee: Guangzhou Institute of Technology of Xidian University
Priority date: 2022-06-29
Filing date: 2022-06-29
Publication date: 2022-11-11

Abstract

The invention discloses a blockchain-based method for identifying abnormal data in a logistics system, and relates to the technical field of blockchains. The method comprises: initializing sample data to obtain a plurality of training sets, together with an initialized sample weight and a misclassification cost corresponding to each training set; performing a preset number of iterations with the initialized sample weight and the misclassification cost as parameters to obtain a preset number of weak classifiers, and combining them into a strong classifier for the training set; performing weighted aggregation on the strong classifiers of all the training sets, and determining by voting whether the transaction data is abnormal; and packaging the normal data into a second block and appending the second block to the blockchain. By combining ensemble learning with a cost-sensitive classification algorithm for imbalanced data, the method can effectively improve the accuracy of imbalanced data classification: even when abnormal data makes up only a small proportion of all data, the trained strong classifiers can judge it accurately, which avoids the losses caused by packaging abnormal data and putting it on the chain.

Description

Logistics system abnormal data identification method based on block chain

Technical Field

The invention relates to the technical field of block chains, in particular to a logistics system abnormal data identification method based on a block chain.

Background

In recent years, electronic commerce has developed rapidly, and logistics has become increasingly integrated with various internet technologies. Although logistics technology has improved quickly, many problems remain to be solved, such as the tracing of logistics information. Tracing requires not only that transportation information such as the origin, route, and destination of the goods can be queried, but also that the queried information is authentic. By tracing logistics information, logistics commodities can be tracked effectively, and it can be known and handled in time when a commodity passes through a risk area or related personnel, which reduces risk to a certain extent.

Traditional logistics systems are deployed in a centralized manner, so manufacturers and distributors may tamper with logistics information, and the accuracy of the information cannot be guaranteed. Meanwhile, blockchain networks have attracted wide attention because they are highly transparent, decentralized, tamper-resistant, and anonymous. Compared with a traditional logistics system, a blockchain makes data tamper-resistant and logistics commodities traceable, effectively breaks down information silos, and avoids the problems caused by malicious modification of information.

In the prior art, blockchain technology ensures that data already on the chain cannot be tampered with, but it cannot ensure the authenticity and accuracy of the data submitted by participants. If a malicious party submits malicious or abnormal data and the data is put on the chain directly without any judgment, losses are easily caused.

Disclosure of Invention

The present invention is directed to solve the problems of the background art, and provides a method for identifying abnormal data of a logistics system based on a block chain.

The purpose of the invention can be realized by the following technical scheme:

the logistics system abnormal data identification method based on the block chain provided by the embodiment of the invention comprises the following steps:

acquiring transaction data submitted to a to-be-audited block by a user blockchain node;

initializing sample data to obtain a plurality of training sets and an initialized sample weight and a misclassification cost corresponding to each training set; the sample data comprises the transaction data and historical data of the user blockchain node;

aiming at each training set, carrying out iteration for a preset number of times by taking the weight of the initialized sample and the misclassification cost as parameters to obtain a preset number of weak classifiers, and combining the preset number of weak classifiers into a strong classifier of the training set;

carrying out weighted aggregation on the strong classifiers of all the training sets, and obtaining a judgment result whether the transaction data is abnormal or not through voting type judgment;

and broadcasting the judgment result of the transaction data on a block chain, packaging normal data to obtain a second block, and accessing the second block to the block chain.

Optionally, before acquiring the transaction data submitted to the to-be-audited block by the user blockchain node, the method further includes:

acquiring an account and a password input by a target user through the user blockchain node, comparing the account and the password with the registration information recorded in the User table of the database, and verifying the identity of the target user;

and if the identity of the target user is successfully verified and the user blockchain node uploads the transaction data, storing the transaction data in the to-be-audited block.

Optionally, initializing the sample data to obtain a plurality of training sets and an initialized sample weight and a misclassification cost corresponding to each training set, including:

dividing the sample data into N training sets;

for each training set, the training set is represented as:

S = {(x_1, y_1), (x_2, y_2), ..., (x_K, y_K) | y_k ∈ {1, -1}}

where K is the total number of samples in the training set S, x_k denotes the data of the k-th sample, and y_k indicates whether the k-th sample is normal: y_k = 1 denotes a normal-class sample and y_k = -1 denotes an abnormal-class sample;

calculating the initialized sample weight D_1 of the first iteration of the training set:

wherein w_1k is the weight of each sample in the training set S in the first iteration;

calculating the misclassification cost C_k of the training set:

wherein n is the number of majority-class samples in the training set, m is the number of minority-class samples in the training set, and K is the total number of samples in the training set S.
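The initialization step can be sketched as follows. This is a minimal illustration rather than the formulas of the invention: the equations for D_1 and C_k are not reproduced in this text, so the sketch assumes the uniform initial weight w_1k = 1/K used by standard AdaBoost and leaves the misclassification cost out entirely. The function names and the round-robin split are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (x_k, y_k) with y_k in {1, -1}


def split_into_training_sets(samples: List[Sample], n_sets: int) -> List[List[Sample]]:
    """Divide the sample data into N training sets (round-robin split; an assumption)."""
    if n_sets <= 0:
        raise ValueError("n_sets must be positive")
    return [samples[i::n_sets] for i in range(n_sets)]


def initial_weights(training_set: List[Sample]) -> List[float]:
    """Return D_1 = (w_11, ..., w_1K), assuming uniform weights w_1k = 1/K."""
    k_total = len(training_set)
    if k_total == 0:
        raise ValueError("training set must not be empty")
    return [1.0 / k_total] * k_total
```

Any other partitioning scheme (for example random sampling) would fit the description equally well; the text does not say how the N training sets are formed.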

Optionally, for each training set, performing a preset number of iterations with the initialized sample weight and the misclassification cost as parameters to obtain a preset number of weak classifiers, and combining the preset number of weak classifiers into a strong classifier of the training set, including the following steps:

step one, for each training set, extracting part of the data as a learning training set;

step two, learning the learning training set by using the learning sample weight corresponding to the learning training set at present to obtain a weak classifier; the learning sample weight is the initialization sample weight at a first iteration;

step three, updating the weight of the learning sample of the next iteration according to the weight of the learning sample and the misclassification cost;

step four, the updated learning sample weight is used, the steps one to three are repeatedly executed until a preset number of iterations are completed, and a preset number of weak classifiers are obtained;

step five, after T iterations on the training set S, obtaining a group of weak classifiers f = (f_1, f_2, ..., f_T), and combining the weak classifier set f into a strong classifier F_i:

wherein i denotes the i-th of the N training sets S, T denotes the number of iterations for the training set S, α_t denotes the weight of the weak classifier f_t at the t-th iteration, and the output value of the sign function is 1 or -1.
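The combination in step five can be sketched as follows. The formula for F_i is not reproduced in this text; the sketch assumes the usual boosting form, F_i(x) = sign(sum over t of α_t · f_t(x)), which matches the symbols defined above. Mapping a sum of exactly zero to 1 is an additional assumption, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], int]  # returns 1 or -1


def sign(value: float) -> int:
    """Sign restricted to the outputs 1 and -1 (zero is mapped to 1; an assumption)."""
    return 1 if value >= 0 else -1


def make_strong_classifier(weak: List[Classifier], alphas: List[float]) -> Classifier:
    """Combine T weak classifiers f_t with weights alpha_t into one strong classifier."""
    if len(weak) != len(alphas):
        raise ValueError("one weight alpha_t is required per weak classifier")

    def strong(x: Sequence[float]) -> int:
        # Weighted sum of the weak classifiers' votes, then the sign function.
        return sign(sum(a * f(x) for f, a in zip(weak, alphas)))

    return strong
```

A weak classifier with a large α_t can outvote several weak classifiers with small weights, which is the point of weighting them by their accuracy.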

Optionally, each training set includes normal class samples and abnormal class samples, where the normal class samples are more than the abnormal class samples, the normal class samples are called majority class samples, and the abnormal class samples are called minority class samples; assuming that each training set comprises m majority class samples and n minority class samples;

for each training set, extracting partial data to serve as a learning training set, and specifically comprising the following steps:

and for each training set, sorting the m majority-class samples contained in the training set in descending order of initialized sample weight, and extracting the first n majority-class samples and the n minority-class samples to form a new set as the learning training set.
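The extraction of the learning training set can be sketched as follows. The sketch assumes labels of 1 for normal (majority-class) samples and -1 for abnormal (minority-class) samples; the function name is hypothetical, and it returns indices so that the caller can look up both the samples and their weights.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (x_k, y_k); 1 = majority class, -1 = minority class


def build_learning_set(samples: List[Sample], weights: List[float]) -> List[int]:
    """Return the indices of the samples that form the learning training set.

    The majority-class samples are sorted by weight, largest first, and the
    first n are kept, where n is the number of minority-class samples.
    All n minority-class samples are kept as well.
    """
    majority = [i for i, (_, y) in enumerate(samples) if y == 1]
    minority = [i for i, (_, y) in enumerate(samples) if y == -1]
    majority.sort(key=lambda i: weights[i], reverse=True)
    return majority[: len(minority)] + minority
```

The resulting set is balanced (n samples of each class), so each weak classifier is trained without the majority class dominating.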

Optionally, updating the learning sample weight of the next iteration according to the learning sample weight and the misclassification cost includes:

supposing that the current iteration is the t-th iteration, calculating the error rate e_t of the weak classifier f_t in this iteration:

wherein D_t(x_k) denotes the learning sample weight of the data x_k of the k-th sample at the t-th iteration;

calculating the weight α_t of the weak classifier f_t trained in the t-th iteration:

calculating a sample weight adjustment factor β_k:

β_k = -0.5 (y_k f_t(x_k)) C_k + 0.5

wherein y_k is a variable with a value of 1 or -1, f_t(x_k) is the output value of the weak classifier f_t, trained in the t-th iteration, for the data of the k-th sample, and C_k is the misclassification cost;

updating the learning sample weight for the (t+1)-th iteration to obtain D_{t+1}:

wherein D_t(x_k) is the learning sample weight of the k-th sample at the t-th iteration, α_t is the weight of the weak classifier f_t of the t-th iteration, β_k is the weight adjustment factor of the k-th sample, and z_t is a normalization factor.
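The adjustment factor β_k is the one formula in this step that is reproduced in the text, and a short sketch shows how it behaves. The function name is hypothetical, and the cost value used in the usage note is illustrative only.

```python
def weight_adjustment_factor(y_k: int, f_t_xk: int, c_k: float) -> float:
    """Compute beta_k = -0.5 * (y_k * f_t(x_k)) * C_k + 0.5.

    y_k is the true label and f_t_xk the weak classifier's output, both 1 or -1.
    """
    if y_k not in (1, -1) or f_t_xk not in (1, -1):
        raise ValueError("y_k and f_t(x_k) must each be 1 or -1")
    return -0.5 * (y_k * f_t_xk) * c_k + 0.5
```

When the sample is classified correctly, y_k · f_t(x_k) = 1 and β_k = 0.5 - 0.5·C_k; when it is misclassified, y_k · f_t(x_k) = -1 and β_k = 0.5 + 0.5·C_k. With an illustrative cost of C_k = 0.5, the two cases give 0.25 and 0.75, so a misclassified sample receives the larger factor, and the gap grows with the misclassification cost.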

Optionally, it is assumed that there are N training sets and that the training set S is any one of them; the predicted value that the strong classifier F_i of the training set S gives for the input data is 1 or -1;

performing weighted aggregation on the strong classifiers of all the training sets, and obtaining a judgment result of whether the transaction data is abnormal through voting-type judgment, comprises the following steps:

performing weighted aggregation on the strong classifiers of the N training sets to obtain a group of strong classifiers F = {F_1, F_2, ..., F_N};

taking the predicted values of the strong classifiers F as the input of a voting decision function, and calculating the output predicted value P:

wherein N is the number of training sets, F_i(x) denotes the predicted value that the strong classifier F_i, trained on the i-th training set, gives for the input data, and the output value of the sign function is 1 or -1;

obtaining a judgment result whether the transaction data is abnormal or not according to the predicted value P:

if the predicted value P is not less than 0, the transaction data is abnormal data;

and if the predicted value P is less than 0, the transaction data are normal data.
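The voting decision can be sketched as follows. The formula for P is not reproduced in this text; the sketch assumes P = sign(sum over i of F_i(x)) with an unweighted sum and with a sum of exactly zero mapped to 1. The thresholds are applied exactly as stated above, and the mapping of vote signs to classes is taken from those thresholds. The function names are hypothetical.

```python
from typing import List


def vote(predictions: List[int]) -> int:
    """Return P, the sign of the summed strong-classifier predictions (1 or -1)."""
    if any(p not in (1, -1) for p in predictions):
        raise ValueError("each prediction F_i(x) must be 1 or -1")
    return 1 if sum(predictions) >= 0 else -1


def is_abnormal(predictions: List[int]) -> bool:
    """Apply the stated rule: P >= 0 means abnormal data, P < 0 means normal data."""
    return vote(predictions) >= 0
```

Under these assumptions a tie counts as abnormal, so data is accepted as normal only when a strict majority of the strong classifiers vote -1.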

In the blockchain-based method for identifying abnormal data in a logistics system, transaction data submitted to a to-be-audited block by a user blockchain node is acquired; sample data, which comprises the transaction data and historical data of the user blockchain node, is initialized to obtain a plurality of training sets, together with an initialized sample weight and a misclassification cost corresponding to each training set; for each training set, a preset number of iterations is performed with the initialized sample weight and the misclassification cost as parameters to obtain a preset number of weak classifiers, which are combined into a strong classifier for the training set; the strong classifiers of all the training sets are aggregated by weighting, and whether the transaction data is abnormal is determined by voting; and the judgment result is broadcast on the blockchain, the normal data is packaged into a second block, and the second block is appended to the blockchain. The method combines ensemble learning with a cost-sensitive classification algorithm for imbalanced data. Ensemble learning builds a strong classifier from trained weak classifiers, and the cost-sensitive algorithm gives more importance to abnormal samples by assigning different weights to the samples. Combining the two can effectively improve the accuracy of imbalanced data classification: even when abnormal data makes up only a small proportion of all data, the trained strong classifiers can judge it accurately, which avoids the losses caused by packaging abnormal data and putting it on the chain.

Drawings

The invention will be further described with reference to the accompanying drawings.

Fig. 1 is a flowchart of a method for identifying abnormal data of a logistics system based on a block chain according to an embodiment of the present invention;

fig. 2 is a schematic diagram of a logistics system provided by an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention provides a logistics system abnormal data identification method based on a block chain. Referring to fig. 1, fig. 1 is a flowchart of a logistics system abnormal data identification method based on a block chain according to an embodiment of the present invention. The method may comprise the steps of:

S101, acquiring transaction data submitted to a to-be-audited block by a user blockchain node.

S102, initializing the sample data to obtain a plurality of training sets and the initialized sample weight and the misclassification cost corresponding to each training set.

S103, for each training set, performing a preset number of iterations with the initialized sample weight and the misclassification cost as parameters to obtain a preset number of weak classifiers, and combining the preset number of weak classifiers into a strong classifier of the training set.

S104, performing weighted aggregation on the strong classifiers of all the training sets, and obtaining a judgment result of whether the transaction data is abnormal through voting-type judgment.

S105, broadcasting the judgment result of the transaction data on the blockchain, packaging the normal data to obtain a second block, and appending the second block to the blockchain.

The sample data includes transaction data and historical data of user blockchain nodes.

The blockchain-based method for identifying abnormal data in a logistics system combines ensemble learning with a cost-sensitive classification algorithm for imbalanced data. Ensemble learning builds a strong classifier from trained weak classifiers, and the cost-sensitive algorithm gives more importance to abnormal samples by assigning different weights to the samples. Combining the two can effectively improve the accuracy of imbalanced data classification: even when abnormal data makes up only a small proportion of all data, the trained strong classifiers can judge it accurately, which avoids the losses caused by packaging abnormal data and putting it on the chain.

In one implementation, the user blockchain nodes are electronic terminals such as computers and cell phones connected to a blockchain network. The user block chain node can pack the transaction data and submit the transaction data to the block to be checked.

In one implementation, normal data is recorded in the blockchain in the form of transactions through a smart contract. When new data needs to be written, the system identifies the hash of the most recently stored transaction and writes the data into the block. The transaction data in the smart contract may include a transaction address, transaction content, a transaction date, the signature of the transacting party, and the like.
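The idea of linking each new record to the hash of the previously stored transaction can be sketched as follows. This is a plain Python illustration, not the API of any particular blockchain platform or smart contract framework; the field names, the JSON serialization, and the use of SHA-256 are all assumptions.

```python
import hashlib
import json
from typing import Dict, List

Record = Dict[str, str]


def record_hash(record: Record) -> str:
    """Hash a record deterministically (sorted keys, SHA-256; an assumption)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_transaction(ledger: List[Record], fields: Record) -> Record:
    """Append a record that stores the hash of the most recently stored record."""
    prev_hash = record_hash(ledger[-1]) if ledger else "0" * 64
    record = dict(fields, prev_hash=prev_hash)
    ledger.append(record)
    return record
```

Because every record embeds the hash of its predecessor, changing an earlier record changes its hash and breaks the link stored in the next one.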

In one embodiment, before S101, the method may further include:

acquiring an account and a password input by the target user through the user blockchain node, and comparing the account and the password with the registration information recorded in the User table of the database to verify the identity of the target user;

and if the identity of the target user is successfully verified and the user blockchain node uploads the transaction data, storing the transaction data in the to-be-audited block.

In one implementation, the identity information of the target user is verified. The account type may include producer, dealer, transportation personnel, and the like. The background automatically reads the account and password entered by the user and compares them with the registration information recorded in the User table of the database. User is a user information type whose attributes include the user type (whether the user is an administrator), the Account, and the Password. After the identity of the target user is verified, the user can fill in data related to the logistics commodity, including the production environment, transfer time, transportation environment, distribution personnel, and the like.
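The comparison against the User table can be sketched as follows. The text only says that the entered account and password are compared with the registered information; storing a salted PBKDF2 hash instead of the plain password, the in-memory dictionary standing in for the database table, and all names are assumptions made for this illustration.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple

# account -> (salt, password hash, user type); stands in for the User table
UserTable = Dict[str, Tuple[bytes, bytes, str]]


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(table: UserTable, account: str, password: str, user_type: str) -> None:
    """Record a new user in the table with a fresh random salt."""
    salt = os.urandom(16)
    table[account] = (salt, hash_password(password, salt), user_type)


def verify_identity(table: UserTable, account: str, password: str) -> bool:
    """Return True only if the account exists and the password matches."""
    entry = table.get(account)
    if entry is None:
        return False
    salt, stored_hash, _user_type = entry
    return hmac.compare_digest(stored_hash, hash_password(password, salt))
```

Only after `verify_identity` returns True would the node be allowed to upload transaction data to the to-be-audited block.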

In one embodiment, step S102 includes:

step 1, dividing sample data into N training sets;

for each training set, the training set is represented as:

S = {(x_1, y_1), (x_2, y_2), ..., (x_K, y_K) | y_k ∈ {1, -1}}

where K is the total number of samples in the training set S, x_k denotes the data of the k-th sample, and y_k indicates whether the k-th sample is normal: y_k = 1 denotes a normal-class sample and y_k = -1 denotes an abnormal-class sample.

Step 2, calculating the initialized sample weight D_1 of the first iteration of the training set:

wherein w_1k is the weight of each sample in the training set S in the first iteration.

Step 3, calculating the misclassification cost C_k of the training set:

In one embodiment, step S103 may include the steps of:

Step one, for each training set, extracting part of the data as a learning training set.

Step two, learning the learning training set by using the learning sample weight currently corresponding to the learning training set to obtain a weak classifier.

Step three, updating the learning sample weight of the next iteration according to the learning sample weight and the misclassification cost.

Step four, using the updated learning sample weight, repeating steps one to three until the preset number of iterations is completed to obtain the preset number of weak classifiers.

In the first iteration, the learning sample weight is the initialized sample weight.

In one embodiment, each training set comprises normal-class samples and abnormal-class samples, wherein the normal-class samples outnumber the abnormal-class samples; the normal-class samples are called majority-class samples, and the abnormal-class samples are called minority-class samples. It is assumed that each training set comprises m majority-class samples and n minority-class samples.

For each training set, the m majority-class samples contained in the training set are sorted in descending order of initialized sample weight, and the first n majority-class samples and the n minority-class samples are extracted to form a new set as the learning training set.

In one embodiment, updating the learning sample weight of the next iteration according to the learning sample weight and the misclassification cost comprises:

Step 1, supposing that the current iteration is the t-th iteration, calculating the error rate e_t of the weak classifier f_t in this iteration:

wherein D_t(x_k) denotes the learning sample weight of the data x_k of the k-th sample at the t-th iteration;

Step 2, calculating the weight α_t of the weak classifier f_t trained in the t-th iteration:

Step 3, calculating a sample weight adjustment factor β_k:

β_k = -0.5 (y_k f_t(x_k)) C_k + 0.5

wherein y_k is a variable with a value of 1 or -1, f_t(x_k) is the output value of the weak classifier f_t, trained in the t-th iteration, for the data of the k-th sample, and C_k is the misclassification cost;

Step 4, updating the learning sample weight for the (t+1)-th iteration to obtain D_{t+1}:

wherein D_t(x_k) is the learning sample weight of the k-th sample at the t-th iteration, α_t is the weight of the weak classifier f_t of the t-th iteration, β_k is the weight adjustment factor of the k-th sample, and z_t is a normalization factor.
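The equations for e_t and α_t are not reproduced in this text. As a rough guide to what these two quantities usually look like, the sketch below uses the standard AdaBoost definitions: e_t is the total weight of the misclassified samples, and α_t = 0.5 · ln((1 - e_t) / e_t). These are assumptions and may differ from the formulas of the invention; the function names are hypothetical.

```python
import math
from typing import List


def error_rate(weights: List[float], labels: List[int], predictions: List[int]) -> float:
    """e_t: sum of D_t(x_k) over the samples where f_t(x_k) differs from y_k."""
    if not (len(weights) == len(labels) == len(predictions)):
        raise ValueError("weights, labels and predictions must have equal length")
    return sum(w for w, y, p in zip(weights, labels, predictions) if y != p)


def classifier_weight(e_t: float) -> float:
    """alpha_t = 0.5 * ln((1 - e_t) / e_t), the standard AdaBoost choice."""
    if not 0.0 < e_t < 1.0:
        raise ValueError("e_t must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - e_t) / e_t)
```

With these definitions, a weak classifier that is no better than chance (e_t = 0.5) gets α_t = 0, and α_t grows as the error rate falls.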

In one implementation, it is assumed that there are N training sets and that the training set S is any one of them; the predicted value that the strong classifier F_i of the training set S gives for the input data is 1 or -1;

step S104 may include the steps of:

Step 1, performing weighted aggregation on the strong classifiers of the N training sets to obtain a group of strong classifiers F = {F_1, F_2, ..., F_N}.

Step 2, taking the predicted values of the strong classifiers F as the input of the voting decision function, and calculating the output predicted value P:

where N is the number of training sets, F_i(x) denotes the predicted value that the strong classifier F_i, trained on the i-th training set, gives for the input data, and the output value of the sign function is 1 or -1.

Step 3, obtaining a judgment result whether the transaction data is abnormal according to the predicted value P:

if the predicted value P is larger than or equal to 0, the transaction data are abnormal data;

In one implementation, if the predicted value P is greater than or equal to 0, half or more of the strong classifiers have judged the data to be minority-class data, that is, abnormal data, and the system screens the abnormal data out. Otherwise, if the predicted value P is less than 0, fewer than half of the strong classifiers have judged the data to be minority-class data, so the data is normal data, and the system retains it and performs the uplink operation. The data is put on the chain only when more than half of the strong classifiers consider the submitted data to be normal, which further improves the accuracy of the judgment result and makes the data stored on the blockchain more authentic and reliable.

In one implementation, referring to fig. 2, fig. 2 is a schematic diagram of a logistics system provided in an embodiment of the present invention. The logistics system is built on the Hyperledger Fabric architecture and uses the abnormal data identification method provided in the embodiment of the present invention: data to be put on the chain is subjected to a voting decision by strong classifiers trained with the ensemble-learning-based, cost-sensitive imbalanced data classification algorithm, so as to ensure that the data on the logistics system is authentic and reliable.

The overall architecture of the logistics system comprises two supporting service systems and a three-layer core architecture. The two supporting systems are an operation and maintenance monitoring service system and a security management service system. The operation and maintenance monitoring service system comprises modules for contract management, notification management, log management, anomaly monitoring, data analysis, and the like, and is responsible for collecting and visually presenting the running-state data of the system, including the access volume, time consumption, node health status, and usage of the underlying machine resources. Through visual monitoring, the state of the whole blockchain system is tracked in real time, and the relevant personnel are notified promptly when conditions such as fraudulent nodes, ledger tampering, machine faults, or data anomalies occur. The security management service system provides security protection technologies such as cross-chain security, smart contract security, and privacy protection, together with security measures such as identity authentication management for users and services, API (application programming interface) security, and business security.

The three-layer core architecture in the blockchain platform comprises:

1) The infrastructure layer, which provides underlying resources for the blockchain network and the computing and storage basis for processing information such as the production, circulation, and distribution of logistics commodities.

Specifically, the infrastructure layer includes computing node resources, storage resources, network communication bandwidth resources, and the like, for data computation and storage in the network.

2) The blockchain core layer, which comprises an encryption algorithm module, a consensus algorithm module, and a user and authority management module. The encryption algorithm module and the consensus algorithm module are used to encrypt, and reach consensus on, information such as the production, circulation, and distribution of logistics commodities; the user and authority management module is used to manage the access authority of the mobile terminal.

3) The scene application layer, which constructs blockchain application scenarios related to trusted evidence storage according to the tracing request.

In a specific implementation process, the mobile terminal is used for sending an information traceability request to the blockchain platform by taking the logistics commodity identification as an index value, and receiving query result information matched with the information traceability request after the blockchain platform verifies that the authority of the mobile terminal is qualified.
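The traceability request can be sketched as follows: the commodity identifier acts as the index value, and results are returned only after the terminal's authority has been checked. The record layout, the set of authorized terminals, and all names are hypothetical; the text does not specify how authority is verified.

```python
from typing import Dict, List, Set

Record = Dict[str, str]


def trace_commodity(
    ledger: List[Record],
    authorized_terminals: Set[str],
    terminal_id: str,
    commodity_id: str,
) -> List[Record]:
    """Return the records matching commodity_id if the terminal is authorized."""
    if terminal_id not in authorized_terminals:
        raise PermissionError(f"terminal {terminal_id!r} is not authorized")
    # The commodity identifier is used as the index value for the lookup.
    return [record for record in ledger if record.get("commodity_id") == commodity_id]
```

A real deployment would query an indexed ledger rather than scan a list; the linear scan here only keeps the illustration short.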

While one embodiment of the present invention has been described in detail, the description is only a preferred embodiment of the present invention and should not be taken as limiting the scope of the invention. All equivalent changes and modifications made within the scope of the present invention shall fall within the scope of the present invention.

Claims

1. A logistics system abnormal data identification method based on a block chain is characterized by comprising the following steps:

acquiring transaction data submitted to a to-be-audited block by a user block chain node;

initializing sample data to obtain a plurality of training sets and an initialized sample weight and a misclassification cost corresponding to each training set; the sample data comprises the transaction data and historical data of the user block chain node;

2. The method for identifying abnormal data of a logistics system based on a block chain as claimed in claim 1, wherein before acquiring transaction data submitted to a block to be checked by a user block chain node, the method further comprises:

acquiring an account and a password input by a target user through the user block chain node, comparing the account and the password with the registration information recorded in the User table of the database, and verifying the identity of the target user;

and if the target user identity verification is successful and the user block chain node uploads the transaction data, storing the transaction data to the block to be audited.

3. The method for identifying abnormal data of a logistics system based on a block chain according to claim 1, wherein the initialization of the sample data is performed to obtain a plurality of training sets and an initialization sample weight and a misclassification cost corresponding to each training set, and the method comprises the following steps:

dividing the sample data into N training sets;

for each training set, the training set is represented as:

S = {(x_1, y_1), (x_2, y_2), ..., (x_K, y_K) | y_k ∈ {1, -1}}

wherein K is the total number of samples in the training set S, x_k denotes the data of the k-th sample, and y_k indicates whether the k-th sample is normal: y_k = 1 denotes a normal-class sample and y_k = -1 denotes an abnormal-class sample;

calculating the misclassification cost C_k of the training set:

4. The method for identifying abnormal data of a logistics system based on a block chain as claimed in claim 1, wherein for each training set, a preset number of iterations are performed with the initialized sample weight and the misclassification cost as parameters to obtain a preset number of weak classifiers, and the preset number of weak classifiers are combined into the strong classifier of the training set, comprising the following steps:

extracting partial data as a learning training set aiming at each training set;

learning the learning training set by using the learning sample weight corresponding to the learning training set at present to obtain a weak classifier; the learning sample weight is the initialization sample weight at a first iteration;

updating the weight of the learning sample of the next iteration according to the weight of the learning sample and the misclassification cost;

wherein i denotes the i-th of the N training sets S, T denotes the number of iterations for the training set S, α_t denotes the weight of the weak classifier f_t at the t-th iteration, and the output value of the sign function is 1 or -1.

5. The method for identifying the abnormal data of the logistics system based on the block chain as claimed in claim 4, wherein each training set comprises normal samples and abnormal samples, the number of the normal samples is more than that of the abnormal samples, the normal samples are called as majority samples, and the abnormal samples are called as minority samples; assuming that each training set comprises m majority class samples and n minority class samples;

extracting partial data for each training set to serve as a learning training set, and specifically comprising the following steps of:

6. The method for identifying abnormal data of a logistics system based on a block chain as claimed in claim 4, wherein updating the weight of the learning sample in the next iteration according to the weight of the learning sample and the misclassification cost comprises:

calculating the weight α_t of the weak classifier f_t trained in the t-th iteration:

calculating a sample weight adjustment factor β_k:

β_k = -0.5 (y_k f_t(x_k)) C_k + 0.5

wherein D_t(x_k) is the learning sample weight of the k-th sample at the t-th iteration, α_t is the weight of the weak classifier f_t of the t-th iteration, β_k is the weight adjustment factor of the k-th sample, and z_t is a normalization factor.

7. The method for identifying abnormal data of a logistics system based on a block chain as claimed in claim 1, wherein it is assumed that there are N training sets and that the training set S is any one of them; the predicted value that the strong classifier F_i of the training set S gives for the input data is 1 or -1;

performing weighted aggregation on the strong classifiers of the N training sets to obtain a group of strong classifiers F = {F_1, F_2, ..., F_N};

obtaining a judgment result whether the transaction data is abnormal according to the predicted value P: