CN114706899A - Express delivery data sensitivity calculation method and device, storage medium and equipment - Google Patents

Info

Publication number
CN114706899A
CN114706899A CN202210080660.XA CN202210080660A CN114706899A CN 114706899 A CN114706899 A CN 114706899A CN 202210080660 A CN202210080660 A CN 202210080660A CN 114706899 A CN114706899 A CN 114706899A
Authority
CN
China
Prior art keywords
data
sensitivity
express delivery
express
clustered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210080660.XA
Other languages
Chinese (zh)
Inventor
谢少飞
张鹏飞
喻波
王志海
安鹏
刘旺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wondersoft Technology Co Ltd
Original Assignee
Beijing Wondersoft Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Beijing Wondersoft Technology Co Ltd
Priority to CN202210080660.XA
Publication of CN114706899A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for calculating the sensitivity of express delivery data, a storage medium and equipment. The method comprises the following steps: obtaining the current number of occurrences of express delivery data in a plurality of first dimensions, wherein the express delivery data comprises sender data and recipient data, and each piece of express delivery data corresponds to a telephone number; integrating the current numbers of occurrences according to the first dimensions to obtain integrated data; performing KMeans clustering on the integrated data to obtain clustered data; and sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm, and determining a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value indicates the sensitivity coefficient of the user of the telephone number. The invention solves the technical problem in the prior art that illegal express items are screened by security-checking parcels one by one, which consumes a large amount of manpower and material resources and yields low screening efficiency.

Description

Express delivery data sensitivity calculation method and device, storage medium and equipment
Technical Field
The invention relates to the technical field of data processing, in particular to a method, a device, a storage medium and equipment for calculating the sensitivity of express delivery data.
Background
With the rapid development of informatization, the era of big data computing has arrived. Demand for online shopping has also grown rapidly, which allows criminals to conceal illegal activities behind express parcels and similar channels. How to quickly identify suspicious persons and express information has therefore become an urgent problem. In the prior art, illegal express items are mainly screened through express security checks, but this approach consumes a large amount of manpower and material resources, and its screening efficiency is low.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide a method and device for calculating the sensitivity of express delivery data, a storage medium and equipment, which at least solve the technical problem in the prior art that illegal express items are screened by security-checking parcels one by one, which consumes a large amount of manpower and material resources and yields low screening efficiency.
According to an aspect of an embodiment of the present invention, a method for calculating the sensitivity of express delivery data is provided, including: obtaining the current number of occurrences of express delivery data in a plurality of first dimensions, wherein the express delivery data comprises sender data and recipient data, and each piece of express delivery data corresponds to a telephone number; integrating the current numbers of occurrences according to the first dimensions to obtain integrated data; performing KMeans clustering on the integrated data to obtain clustered data; and sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm, and determining a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value indicates the sensitivity coefficient of the user of the telephone number.
Optionally, obtaining the current number of occurrences of the express delivery data in the plurality of first dimensions includes: separately counting a first number of occurrences of the sender data in each first dimension and a second number of occurrences of the recipient data in each first dimension; and integrating the current numbers of occurrences according to the first dimensions to obtain integrated data includes: integrating all the first numbers of occurrences in each first dimension, and integrating all the second numbers of occurrences in each first dimension, to obtain the integrated data.
Optionally, the performing KMeans clustering processing on the integrated data to obtain clustered data includes: performing data format processing on the integrated data according to a preset data range to obtain processed integrated data; performing dimensionality reduction on the processed integrated data by adopting a dimensionality reduction algorithm to obtain dimensionality reduced data corresponding to a second dimension; standardizing the data subjected to the dimensionality reduction to obtain target format data; and performing KMeans clustering processing on the target format data to obtain the clustered data.
Optionally, after the centroids of each class of data in the clustered data are sorted according to a Euclidean clustering algorithm and a corresponding sensitivity score value is determined for each telephone number, the method further includes: generating a sensitivity score table from the sensitivity score value determined for each telephone number, wherein the sensitivity score table comprises a sender sensitivity data table and a recipient sensitivity data table; when a sensitivity retrieval request is received, determining at least one telephone number carried in the sensitivity retrieval request; and retrieving from the sensitivity score table the sensitivity score value corresponding to the at least one telephone number.
Optionally, sorting the centroids of each class of data in the clustered data according to the Euclidean clustering algorithm and determining a corresponding sensitivity score value for each telephone number includes: sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm to obtain a first sorting result; performing secondary KMeans clustering on each class of data in the clustered data to obtain secondarily clustered data; sorting the centroids of each class of data in the secondarily clustered data according to a Euclidean clustering algorithm to obtain a second sorting result; and determining the corresponding sensitivity score value for each telephone number according to the base score determined from the first sorting result and according to the second sorting result.
Optionally, before obtaining the current number of occurrences of the express delivery data in the plurality of first dimensions, the method further includes: extracting the plurality of first dimensions, using the express data table that stores the express delivery data as the data source; wherein the first dimensions comprise: fuzzy name, fuzzy address, frequently changed name, frequently changed address, key areas of concern, sending or receiving express outside the telephone number's home region, key items of concern, and key persons of concern.
According to another aspect of the embodiments of the present invention, there is also provided a device for calculating the sensitivity of express delivery data, including: a first obtaining module, configured to obtain the current number of occurrences of express delivery data in a plurality of first dimensions, wherein the express delivery data comprises sender data and recipient data, and each piece of express delivery data corresponds to a telephone number; a second obtaining module, configured to integrate the current numbers of occurrences according to the first dimensions to obtain integrated data; a clustering module, configured to perform KMeans clustering on the integrated data to obtain clustered data; and a determining module, configured to sort the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm and determine a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value indicates the sensitivity coefficient of the user of the telephone number.
Optionally, the clustering module includes: the first acquisition submodule is used for carrying out data format processing on the integrated data according to a preset data range to obtain processed integrated data; the second obtaining submodule is used for carrying out dimensionality reduction processing on the processed integrated data by adopting a dimensionality reduction algorithm to obtain dimensionality reduced data corresponding to a second dimensionality; the third obtaining submodule is used for carrying out standardization processing on the data subjected to dimensionality reduction to obtain target format data; and the first clustering submodule is used for performing the KMeans clustering processing on the target format data to obtain the clustered data.
According to another aspect of the embodiments of the present invention, there is also provided a non-volatile storage medium, where the non-volatile storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing any one of the above methods for calculating sensitivity of express delivery data.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform any one of the above-mentioned sensitivity calculation methods for express delivery data.
In the embodiments of the present invention, the sensitivity of express delivery data is calculated as follows: the current number of occurrences of the express delivery data in a plurality of first dimensions is obtained, wherein the express delivery data comprises sender data and recipient data, and each piece of express delivery data corresponds to a telephone number; the current numbers of occurrences are integrated according to the first dimensions to obtain integrated data; KMeans clustering is performed on the integrated data to obtain clustered data; and the centroids of each class of data in the clustered data are sorted according to a Euclidean clustering algorithm, and a corresponding sensitivity score value is determined for each telephone number, wherein the sensitivity score value indicates the sensitivity coefficient of the user of the telephone number. This achieves the purpose of calculating the sensitivity score value of an express user from express delivery data and quickly identifying potentially illegal express items according to that score, thereby improving the efficiency of screening illegal express items and reducing labor cost, and solving the technical problem in the prior art that illegal express items are screened by security-checking parcels one by one, which consumes a large amount of manpower and material resources and yields low screening efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flowchart of a method for calculating sensitivity of express delivery data according to an embodiment of the present invention;
FIG. 2 is a graphical illustration of an alternative sensitivity scoring result according to an embodiment of the present invention;
FIG. 3 is a flowchart of an alternative express delivery data sensitivity calculation method according to an embodiment of the present invention;
FIG. 4 is a flow diagram of an alternative sensitivity scoring query in accordance with embodiments of the present invention;
FIG. 5 is a flow chart of an alternative express delivery data sensitivity calculation method according to an embodiment of the invention;
FIG. 6 is a flow chart of an alternative express delivery data sensitivity calculation method according to an embodiment of the invention;
FIG. 7 is a schematic structural diagram of a sensitivity calculation device for express delivery data according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, in order to facilitate understanding of the embodiments of the present invention, some terms or nouns referred to in the present invention will be explained as follows:
PCA (principal component analysis): also known as principal components analysis. It aims to convert multiple indicators into a small number of composite indicators through dimensionality reduction.
Clustering: the process of dividing a collection of physical or abstract objects into classes composed of similar objects. A cluster produced by clustering is a set of data objects that are similar to one another within the same cluster and dissimilar to objects in other clusters. As the saying goes, "birds of a feather flock together"; classification problems abound in the natural and social sciences. Cluster analysis is a statistical method for studying the classification of samples or indicators. It originates from taxonomy, but clustering is not the same as classification: in clustering, the classes into which the data are to be divided are unknown in advance. Cluster analysis covers a rich set of methods, including hierarchical clustering, ordered-sample clustering, dynamic clustering, fuzzy clustering, graph-theoretic clustering, and cluster-based forecasting.
Euclidean clustering: a clustering algorithm based on the Euclidean distance metric. A KD-Tree-based nearest-neighbor query is an important preprocessing method for accelerating the Euclidean clustering algorithm.
Example 1
In accordance with an embodiment of the present invention, an embodiment of a method for calculating the sensitivity of express delivery data is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that shown here.
Fig. 1 is a flowchart of a sensitivity calculation method for express delivery data according to an embodiment of the present invention, and as shown in fig. 1, the method includes the following steps:
step S102, obtaining the current number of occurrences of express delivery data in a plurality of first dimensions, wherein the express delivery data comprises sender data and recipient data, and each piece of express delivery data corresponds to a telephone number;
step S104, integrating the current numbers of occurrences according to the first dimensions to obtain integrated data;
step S106, performing KMeans clustering on the integrated data to obtain clustered data;
step S108, sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm, and determining a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value indicates the sensitivity coefficient of the user of the telephone number.
In this embodiment of the present invention, the current number of occurrences of the express delivery data in a plurality of first dimensions is obtained, wherein the express delivery data comprises sender data and recipient data, and each piece of express delivery data corresponds to a telephone number; the current numbers of occurrences are integrated according to the first dimensions to obtain integrated data; KMeans clustering is performed on the integrated data to obtain clustered data; and the centroids of each class of data in the clustered data are sorted according to a Euclidean clustering algorithm, and a corresponding sensitivity score value is determined for each telephone number, wherein the sensitivity score value indicates the sensitivity coefficient of the user of the telephone number. This achieves the purpose of calculating the sensitivity score value of an express user from express delivery data and quickly identifying potentially illegal express items according to that score, thereby improving the efficiency of screening illegal express items and reducing labor cost, and solving the technical problem in the prior art that illegal express items are screened by security-checking parcels one by one, which consumes a large amount of manpower and material resources and yields low screening efficiency.
Optionally, the first dimensions are extracted using the express data table as the data source; the first dimensions include: fuzzy name, fuzzy address, frequently changed name, frequently changed address, key areas of concern, sending or receiving express outside the telephone number's home region, key items of concern, and key persons of concern.
Optionally, the current numbers of occurrences of the sender data and the recipient data in the plurality of first dimensions are counted separately, that is, the first number of occurrences of the sender data in each first dimension and the second number of occurrences of the recipient data in each first dimension are counted separately.
Optionally, grouping by telephone number, the first numbers of occurrences of all the sender data in each first dimension and the second numbers of occurrences of all the recipient data in each first dimension are integrated to obtain the integrated data.
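The counting and integration steps above can be sketched as follows. This is a minimal illustration, not the implementation described in this application: the dimension labels, the `integrate_counts` helper and the input tuple layout are all hypothetical, chosen only to show one row of eight counts per telephone number, kept separately for sender and recipient data.

```python
from collections import defaultdict

# Hypothetical labels for the eight first dimensions listed in the text.
DIMENSIONS = [
    "fuzzy_name", "fuzzy_address", "name_changes", "address_changes",
    "key_area", "out_of_region", "key_item", "key_person",
]

def integrate_counts(hits):
    """Group flagged records by role and telephone number.

    hits: iterable of (phone, role, dimension) tuples, one per flagged record,
    where role is "sender" or "recipient" (an assumed input layout).
    Returns {role: {phone: [count for each dimension]}}.
    """
    table = {
        "sender": defaultdict(lambda: [0] * len(DIMENSIONS)),
        "recipient": defaultdict(lambda: [0] * len(DIMENSIONS)),
    }
    for phone, role, dimension in hits:
        table[role][phone][DIMENSIONS.index(dimension)] += 1
    return {role: dict(rows) for role, rows in table.items()}

# Illustrative input only; the counts are not taken from the application.
hits = [
    ("18998765222", "sender", "fuzzy_name"),
    ("18998765222", "sender", "fuzzy_address"),
    ("18998765222", "sender", "fuzzy_address"),
    ("18998765222", "recipient", "key_area"),
    ("18998764352", "sender", "key_person"),
]
integrated = integrate_counts(hits)
print(integrated["sender"]["18998765222"])
```

Each resulting list has the same shape as one `row_vec` entry of the dense matrix table used in the dimensionality reduction step.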
Optionally, the centroid is the distance from the cluster center point to the origin; the sensitivity score value includes at least one of: a sender sensitivity score and a recipient sensitivity score. FIG. 2 shows the sensitivity scores of a person surnamed Li; as shown in FIG. 2, Li's sender sensitivity score and recipient sensitivity score are both 42.
Optionally, the higher the sensitivity score value is, the higher the possibility that the user corresponding to the sensitivity score value has illegal activities is.
As an alternative embodiment, FIG. 3 is a flowchart of an alternative express delivery data sensitivity calculation method according to an embodiment of the present invention. As shown in FIG. 3, performing KMeans clustering on the integrated data to obtain clustered data includes:
step S302, performing data format processing on the integrated data according to a preset data range to obtain processed integrated data;
step S304, adopting a dimensionality reduction algorithm to perform dimensionality reduction on the processed integrated data to obtain dimensionality reduced data corresponding to a second dimensionality;
step S306, standardizing the data after dimension reduction to obtain target format data;
step S308, performing the KMeans clustering processing on the target format data to obtain the clustered data.
Optionally, the predetermined data range may be, but is not limited to, express delivery data of residents in a target area, such as XX city resident express delivery data, XX provincial resident express delivery data, XX autonomous region resident express delivery data, and the like.
Optionally, the data format processing is performed on the integrated data according to a predetermined data range, and the obtained processed integrated data is shown in table 1.
TABLE 1
[Table 1 is provided as images in the original publication: Figure RE-GDA0003680999000000061 and Figure RE-GDA0003680999000000071.]
Optionally, the dimensionality reduction algorithm may be, but is not limited to, the principal component analysis (PCA) algorithm in MADlib. It should be noted that dimensionality reduction represents as much information as possible with as few dimensions as possible, and that variance computed in low dimensions tends to be more stable. Low-dimensional data is therefore easier to compute with while still expressing the meaning of the high-dimensional data.
Optionally, using a dimensionality reduction algorithm (e.g., the PCA algorithm) to perform dimensionality reduction on the processed integrated data and obtain the dimensionality-reduced data corresponding to the second dimension includes:
S1, performing decorrelation on the processed integrated data (8 dimensions): creating the original dense matrix table (jqxx.jqxx_shpr_zz) and inserting data. The specific implementation code is as follows:
-- Recreate the dense matrix table: one row per record, one 8-element vector per row
drop table if exists jqxx.jqxx_shpr_zz;
create table jqxx.jqxx_shpr_zz(id integer, row_vec DOUBLE PRECISION[]);
-- Each vector holds the occurrence counts in the 8 first dimensions
insert into jqxx.jqxx_shpr_zz values
(1,'{1,5,2,0,0,0,0,0}'),
(2,'{0,1,0,0,0,0,0,1}'),
(3,'{0,5,0,0,0,2,0,0}'),
(4,'{0,0,1,0,2,0,0,2}'),
(5,'{1,2,0,1,1,0,0,4}'),
(6,'{1,0,0,1,0,0,1,0}'),
(7,'{1,1,0,0,0,0,3,0}');
S2, calling the PCA training function on the inserted data to generate the eigenvector matrix; the training result is shown in Table 2. The specific implementation code is as follows:
select madlib.pca_train(
'jqxx.jqxx_shpr_zz',          -- source table
'jqxx.result_table_shpr_zz',  -- output table
'mobile',                     -- row ID column of the source table (named id in the sample table of S1)
3                             -- number of principal components (3, per the 3-dimensional output of S3)
);
S3, calling the PCA projection function to project the data using the training result, finally obtaining the dimensionality-reduced data corresponding to the second dimension (3 dimensions), as shown in Table 3. The specific implementation code is as follows:
select madlib.pca_project(
'jqxx.jqxx_shpr_zz',                  -- source table
'jqxx.result_table_shpr_zz',          -- principal component table produced by pca_train
'jqxx.out_table_shpr_zz',             -- output table for the projected data
'mobile',                             -- row ID column of the source table
'jqxx.residual_table_shpr_zz',        -- residual table
'jqxx.result_summary_table_shpr_zz'   -- result summary table
);
TABLE 2
[Table 2 is provided as an image in the original publication: Figure RE-GDA0003680999000000081.]
TABLE 3
Row_id  Row_vec
1       {3.29177676722938,-0.109192661697066,0.65027320246043}
2       {-0.833010395779005,0.0624998438474048,0.496073569262864}
3       {3.45713701219417,-0.0182366253911953,-0.280213936353739}
4       {-2.21222162912753,-1.16316894886941,1.30735249257714}
5       {-1.04652026547193,-2.75432751429412,-1.33946918701219}
6       {-1.67962629587755,1.54985327029896,0.118916372191323}
7       {-0.977535193166934,2.43257263610243,-0.952932513121798}
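For readers without a MADlib installation, the train-then-project idea of S2 and S3 can be sketched in plain Python. This is only an illustration under stated assumptions, not MADlib's implementation: the `pca` helper is hypothetical, it centers the data, builds the covariance matrix, and finds the leading components by power iteration with Gram-Schmidt orthogonalization. Component signs are arbitrary and the numbers are not expected to reproduce Table 3.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pca(rows, k, iterations=1000):
    """Return (components, projected) for the k leading principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for c in range(k):
        # Deterministic, non-uniform start vector.
        v = [1.0 / (j + 1 + c) for j in range(d)]
        for _ in range(iterations):
            w = [dot(cov_row, v) for cov_row in cov]
            # Remove the directions of the components already found.
            for u in components:
                overlap = dot(w, u)
                w = [x - overlap * y for x, y in zip(w, u)]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        components.append(v)
    # Project each centered row onto the k components.
    projected = [[dot(r, u) for u in components] for r in centered]
    return components, projected

# The seven 8-dimensional sample vectors inserted in S1.
rows = [
    [1, 5, 2, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 1],
    [0, 5, 0, 0, 0, 2, 0, 0],
    [0, 0, 1, 0, 2, 0, 0, 2],
    [1, 2, 0, 1, 1, 0, 0, 4],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 3, 0],
]
components, projected = pca(rows, k=3)
print(len(projected), len(projected[0]))
```

The output has one 3-element vector per input row, which is the same shape as the `Row_vec` column of Table 3.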
Optionally, the standardization may be, but is not limited to, normalization; the dimensionality-reduced data is standardized to obtain the target format data shown in Table 4.
TABLE 4
Row_id  Row_vec
1       {0.970832636383322,0.5099644828125,0.248252194389862}
2       {0.243274648969286,0.543065660889226,0.306510608391326}
3       {1,0.527500204277838,0.599801052399269}
4       {0,0.306764834349678,0}
5       {0.205614327370166,0,1}
6       {0.0939427838218074,0.829817551869399,0.44900498191862}
7       {0.217782383171432,1,0.853961951093599}
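The normalization step can be illustrated with column-wise min-max scaling. The text does not name the exact formula, so min-max scaling is an assumption here; it does reproduce the first two columns of Table 4 from the first two columns of Table 3, and only those two components are used to keep the sketch short. The `min_max_normalize` helper is hypothetical.

```python
def min_max_normalize(rows):
    """Scale every column to [0, 1] using (x - min) / (max - min)."""
    d = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(d)]
    highs = [max(r[j] for r in rows) for j in range(d)]
    return [
        [(r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
         for j in range(d)]
        for r in rows
    ]

# First two components of each row in Table 3.
reduced = [
    [3.29177676722938, -0.109192661697066],
    [-0.833010395779005, 0.0624998438474048],
    [3.45713701219417, -0.0182366253911953],
    [-2.21222162912753, -1.16316894886941],
    [-1.04652026547193, -2.75432751429412],
    [-1.67962629587755, 1.54985327029896],
    [-0.977535193166934, 2.43257263610243],
]
normalized = min_max_normalize(reduced)
print(normalized[0])
```

After scaling, the smallest value in each column maps to 0 and the largest to 1, so no single component dominates the distance computation in the clustering step.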
Optionally, KMeans clustering is performed on the target format data using the kmeanspp function of MADlib to obtain the clustered data, where the target format data may be clustered into 5 classes.
Optionally, KMeans clustering is performed on the target format data, for example clustering it into 5 classes, to obtain the clustered data. That is, the KMeans algorithm selects 5 cluster centers and clusters the dimensionality-reduced data: each data point is assigned to the nearest of the 5 cluster centers, the coordinate mean of all points in each cluster is computed and taken as the new cluster center, and the iteration is repeated up to 20000 times to finally obtain the clustered data. The specific implementation code is as follows:
select madlib.kmeanspp(
'jqxx.t_source_change_nor_cnee_zz',  -- source data table name
'row_vec',                           -- column containing the data points
5,                                   -- number of cluster centers
'madlib.squared_dist_norm2',         -- distance function
'madlib.avg',                        -- aggregate function
20000,                               -- maximum number of iterations
0.00000001                           -- stopping condition
);
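The assign-then-average loop described above can be sketched in plain Python. This is a simplified illustration rather than MADlib's `kmeanspp`: it takes the first 5 points as initial centers instead of using k-means++ seeding, and the `kmeans` helper and its parameter names are assumptions that mirror the arguments in the SQL call. The points are the Table 4 vectors rounded to four decimals.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centers, max_iterations=20000, min_frac_reassigned=1e-8):
    """Lloyd's algorithm: assign each point to its nearest center, then re-average."""
    centers = [list(c) for c in initial_centers]
    labels = [-1] * len(points)
    for _ in range(max_iterations):
        new_labels = [
            min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        changed = sum(1 for old, new in zip(labels, new_labels) if old != new)
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once the fraction of reassigned points falls below the threshold.
        if changed / len(points) <= min_frac_reassigned:
            break
    return centers, labels

points = [
    [0.9708, 0.5100, 0.2483],
    [0.2433, 0.5431, 0.3065],
    [1.0000, 0.5275, 0.5998],
    [0.0000, 0.3068, 0.0000],
    [0.2056, 0.0000, 1.0000],
    [0.0939, 0.8298, 0.4490],
    [0.2178, 1.0000, 0.8540],
]
centers, labels = kmeans(points, initial_centers=points[:5])
print(labels)
```

With only seven sample points and five clusters the result is mostly singleton clusters; the sketch is meant to show the iteration and stopping rule, not a realistic clustering.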
As an optional embodiment, after sorting the centroids of each type of data in the clustered data according to the euclidean clustering algorithm and determining the corresponding sensitivity score value for each phone number, the method further includes:
step S402, generating a sensitivity score table from the sensitivity score value determined for each telephone number, wherein the sensitivity score table comprises a sender sensitivity data table and a recipient sensitivity data table;
step S404, when receiving a sensitivity search request, determining at least one telephone number carried in the sensitivity search request;
step S406, retrieving the sensitivity score value corresponding to at least one of the phone numbers from the sensitivity score table.
Optionally, a sensitivity score table as shown in Table 5 is generated from the sensitivity score value determined for each telephone number. Once the sensitivity score table has been calculated, entering a telephone number retrieves its sensitivity score and details, so that sensitive behavior can be assessed intuitively and quickly.
TABLE 5
Telephone number  Sensitivity score
18998765222       44
18998764352       81
18998765201       44
18998764210       61
18998764153       43
18998763451       21
18998764523       61
As an alternative embodiment, FIG. 4 is a flowchart of an alternative sensitivity score query according to an embodiment of the present invention. As shown in FIG. 4, the process includes: entering a telephone number, and parsing the request parameters after the telephone number is received; assembling the parsed request parameters into SQL query statements, querying the sensitivity score and details of the sender data and the sensitivity score and details of the recipient data separately, packaging the query results, and returning them in JSON format; and parsing the returned result, and displaying on the console the sensitivity score and details of the sender data and of the recipient data corresponding to the telephone number.
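The query flow of FIG. 4 can be sketched with Python's standard `sqlite3` and `json` modules. Everything named here is an assumption made for illustration: the table names `sender_sensitivity` and `recipient_sensitivity`, the request body layout, and the `query_sensitivity` helper do not come from the application. The scores are two rows of Table 5, loaded into both tables only to keep the example small.

```python
import json
import sqlite3

def build_demo_db():
    """Create in-memory sender and recipient score tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    sample = [("18998765222", 44), ("18998764352", 81)]
    for table in ("sender_sensitivity", "recipient_sensitivity"):
        conn.execute(f"CREATE TABLE {table} (phone TEXT PRIMARY KEY, score INTEGER)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", sample)
    return conn

def query_sensitivity(conn, request_body):
    """Parse the request, query both tables per phone, and return a JSON string."""
    phones = json.loads(request_body)["phones"]
    result = {}
    for phone in phones:
        entry = {}
        for role, table in (("sender", "sender_sensitivity"),
                            ("recipient", "recipient_sensitivity")):
            # The phone number is bound as a parameter, never concatenated into SQL.
            row = conn.execute(
                f"SELECT score FROM {table} WHERE phone = ?", (phone,)
            ).fetchone()
            entry[role] = row[0] if row else None
        result[phone] = entry
    return json.dumps(result)

conn = build_demo_db()
response = query_sensitivity(conn, '{"phones": ["18998764352", "10000000000"]}')
print(response)
```

Binding the telephone number as a query parameter is a design choice of this sketch; it avoids SQL injection when request parameters are assembled into query statements.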
As an optional embodiment, fig. 5 is a flowchart of another optional express delivery data sensitivity calculation method according to an embodiment of the present invention. As shown in fig. 5, sorting the centroids of each class of data in the clustered data according to the Euclidean clustering algorithm and determining a corresponding sensitivity score value for each telephone number includes:
step S502, sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm to obtain a first sorting result;
step S504, performing secondary KMeans clustering processing on each class of data in the clustered data to obtain secondarily clustered data;
step S506, sorting the centroids of each class of data in the secondarily clustered data according to a Euclidean clustering algorithm to obtain a second sorting result;
step S508, determining the corresponding sensitivity score value for each telephone number according to the second sorting result and the base scores determined based on the first sorting result.
Optionally, the centroids of each class of data in the clustered data are sorted according to a Euclidean clustering algorithm (i.e., by the distance from each cluster center to the origin) to obtain the first sorting result, and the base scores are determined based on the first sorting result. For example, the 5 centroids in the clustered data are sorted to obtain the first sorting result, and 5 base scores 0, 20, 40, 60, and 80 are set based on the first sorting result, in one-to-one correspondence with the 5 sorted centroids.
Optionally, performing secondary KMeans clustering processing on each class of data in the clustered data to obtain the secondarily clustered data includes: normalizing the clustered data and performing secondary clustering on each class of normalized data. For example, each class of normalized data is clustered into 20 classes, finally yielding 100 classes of data, which are the secondarily clustered data.
Optionally, the centroids of each class of the secondarily clustered data (i.e., the 100 classes of data) are sorted according to a Euclidean clustering algorithm (i.e., by the distance from each cluster center to the origin) to obtain the second sorting result, and a corresponding sensitivity score value is determined for each telephone number based on the second sorting result and the 5 base scores 0, 20, 40, 60, and 80, where the sensitivity score value ranges from 0 to 100.
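One way to read the two sorting steps is sketched below. The source fixes the base scores (0, 20, 40, 60, 80) and the overall range, but it does not state how the second sorting result is turned into a number, nor which end of the ordering counts as more sensitive. The sketch therefore assumes that a centroid farther from the origin ranks higher, and that the position within the 20 sub-clusters is added to the base score as an offset of 1 to 20. The function names are hypothetical.

```python
import math


def rank_by_norm(centroids):
    """Return {cluster_id: rank}; rank 0 is the centroid closest to the origin."""
    def norm(cid):
        return math.sqrt(sum(x * x for x in centroids[cid]))

    ordered = sorted(centroids, key=norm)
    return {cid: rank for rank, cid in enumerate(ordered)}


def sensitivity_score(first_rank, second_rank, base_step=20):
    """Combine the first-level base score with a second-level offset.

    first_rank:  position of the top-level cluster (0..4 for 5 clusters)
    second_rank: position of the sub-cluster inside it (0..19 for 20 sub-clusters)
    Assumption: offset = second_rank + 1, so scores span 1..100.
    """
    base_score = first_rank * base_step  # 0, 20, 40, 60, 80
    return base_score + second_rank + 1


# Three toy first-level centroids in the 3-dimensional reduced space.
first_level = {"A": (0.1, 0.0, 0.2), "B": (2.0, 1.0, 0.5), "C": (0.9, 0.3, 0.1)}
first_ranks = rank_by_norm(first_level)  # A is nearest the origin, B farthest
```

With this reading, a telephone number in the highest-ranked top-level cluster and the lowest-ranked sub-cluster would score `sensitivity_score(4, 0)`, and any other offset rule (for example 0 to 19) would only shift the range by one.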
As an optional embodiment, fig. 6 is a flowchart of another optional express data sensitivity calculation method according to an embodiment of the present invention. As shown in fig. 6, the express data table is used as the data source to obtain the current occurrence times of the sending data and the receiving data in multiple dimensions; the current occurrence times are integrated by dimension to obtain integrated data, and the integrated data is formatted to obtain processed integrated data, where the processed integrated data has 8 dimensions; the processed integrated data is reduced from 8 dimensions to 3 dimensions by a PCA algorithm to obtain dimension-reduced data; the dimension-reduced data is standardized to obtain target format data; KMeans clustering processing is performed on the target format data, clustering it into 5 classes to obtain clustered data; the clustered data is clustered again, each class being further clustered into 20 classes, so that the data falls into 100 classes in total, which are the secondarily clustered data; the centroids of each class of data in the secondarily clustered data are sorted according to a Euclidean clustering algorithm, and the corresponding sensitivity score value is determined for each telephone number according to the sorting result; and a sensitivity score table is generated from the sensitivity score values, where the sensitivity score table records the sensitivity score value and the details corresponding to each telephone number.
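The standardization and two-stage clustering steps of fig. 6 can be sketched in plain Python as follows. Several things here are assumptions rather than the source's method: the PCA step is omitted (the input is taken to be already dimension-reduced), the clustering is a bare Lloyd's-iteration KMeans with a deterministic farthest-point initialization, and the function names are hypothetical. A production pipeline would more likely rely on an established library.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit variance (constant columns become 0)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Lloyd's algorithm on a list of tuples; returns (labels, centroids)."""
    # Deterministic farthest-point initialization (a simplification).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def two_level(points, k1, k2):
    """Cluster into k1 classes, then split each class into at most k2 sub-classes."""
    labels, _ = kmeans(points, k1)
    result = [None] * len(points)
    for j in range(k1):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if not idx:
            continue
        subset = [points[i] for i in idx]
        sub_labels, _ = kmeans(subset, min(k2, len(subset)))
        for i, s in zip(idx, sub_labels):
            result[i] = (j, s)  # (first-level class, second-level class)
    return result
```

In the embodiment above, `k1` would be 5 and `k2` would be 20, giving up to 100 `(first-level, second-level)` pairs; each pair can then be ranked by centroid distance to the origin to derive a score.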
The embodiment of the invention can achieve at least the following technical effects: for the calculated sensitivity data, a user can search directly by telephone number and is intuitively given a sensitivity coefficient for reference; the efficiency with which case-handling personnel query information about sensitive persons is improved; and the range of dimensions that needs to be consulted when handling different types of cases is supported, so that the scope of application is wider.
As an optional embodiment, before obtaining the current occurrence times of the express data in the plurality of first dimensions, the method further includes:
step S602, extracting the plurality of first dimensions using the express data table that stores the express data as a data source; wherein the first dimensions comprise: fuzzy names, fuzzy addresses, frequently changed names, frequently changed addresses, key areas of concern, sending or receiving express outside the number's home location, key items of concern, and key persons of concern.
Optionally, the fuzzy names may include, but are not limited to: Mr. Wolff, student, manager, Brother, Dage, sister, Miss.
Optionally, the fuzzy addresses may include, but are not limited to: crossing, supermarket, shopping mall, square, library, hotel (without a specific room number), etc., or an address that contains only part of the detailed address, for example, an address shorter than 5 characters.
Optionally, the frequently changed names may include, but are not limited to: cases in which the sender or recipient name in the sending/receiving data is changed more than a predetermined number of times (e.g., 2 times).
Optionally, the frequently changed addresses may include, but are not limited to: cases in which the sender or recipient address in the sending/receiving data is changed more than a predetermined number of times (e.g., 2 times).
Optionally, the key areas of concern may be understood as express data involving the areas that are of key concern in different periods.
Optionally, sending or receiving express outside the number's home location may be understood as the case in which the place where the recipient's or sender's telephone number is registered is not the receiving place or the sending place.
Optionally, the key items of concern may include, but are not limited to: items relevant to a special investigation; for example, in a special investigation into counterfeit liquor, items such as wine bottles, bottle caps, and labels are of key concern.
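As a rough illustration of how some of these dimensions could be counted per telephone number, consider the sketch below. The record fields (`phone`, `name`, `address`, `city`, `phone_home_city`), the dimension keys, and the token lists are hypothetical; the token lists are small subsets of the examples given above, and the thresholds follow the example values in the text (length below 5, more than 2 changes). Only five of the eight dimensions are covered.

```python
from collections import defaultdict

FUZZY_NAME_TOKENS = ("Mr.", "Miss", "manager", "student")      # illustrative subset
FUZZY_ADDRESS_TOKENS = ("crossing", "supermarket", "square")   # illustrative subset
MIN_ADDRESS_LENGTH = 5   # shorter addresses are treated as fuzzy
CHANGE_THRESHOLD = 2     # more distinct values than this counts as "frequently changed"


def count_dimensions(records):
    """Return {phone: {dimension: count}} for a list of waybill dicts."""
    counts = defaultdict(lambda: defaultdict(int))
    names, addresses = defaultdict(set), defaultdict(set)
    for r in records:
        phone = r["phone"]
        row = counts[phone]  # make sure every phone appears in the output
        names[phone].add(r["name"])
        addresses[phone].add(r["address"])
        if any(tok in r["name"] for tok in FUZZY_NAME_TOKENS):
            row["fuzzy_name"] += 1
        if (len(r["address"]) < MIN_ADDRESS_LENGTH
                or any(tok in r["address"] for tok in FUZZY_ADDRESS_TOKENS)):
            row["fuzzy_address"] += 1
        if r["phone_home_city"] != r["city"]:
            row["non_home_location"] += 1
    for phone in names:
        # Distinct names/addresses seen for the same number.
        if len(names[phone]) > CHANGE_THRESHOLD:
            counts[phone]["changed_name"] = len(names[phone])
        if len(addresses[phone]) > CHANGE_THRESHOLD:
            counts[phone]["changed_address"] = len(addresses[phone])
    return {phone: dict(row) for phone, row in counts.items()}


demo_records = [
    {"phone": "P1", "name": "Mr. Li", "address": "supermarket, Main St",
     "city": "A", "phone_home_city": "A"},
    {"phone": "P1", "name": "Miss Wang", "address": "abc",
     "city": "B", "phone_home_city": "A"},
    {"phone": "P1", "name": "Zhang San", "address": "12 Long Road, Unit 3",
     "city": "A", "phone_home_city": "A"},
]
demo_counts = count_dimensions(demo_records)
```

Running the same function separately over sending records and receiving records would give the first and second occurrence counts, which can then be merged per telephone number into the integrated data.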
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
In this embodiment, a sensitivity calculation device for express delivery data is further provided, and the device is used to implement the foregoing embodiments and preferred embodiments, and details are not repeated for what has been described. As used hereinafter, the terms "module" and "apparatus" may refer to a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware or a combination of software and hardware is also possible and contemplated.
According to an embodiment of the present invention, an apparatus embodiment for implementing the express data sensitivity calculation method is further provided, and fig. 7 is a schematic structural diagram of an express data sensitivity calculation apparatus according to an embodiment of the present invention, as shown in fig. 7, the express data sensitivity calculation apparatus includes: a first obtaining module 700, a second obtaining module 702, a clustering module 704, and a determining module 706, wherein:
the first obtaining module 700 is configured to obtain the current occurrence times of the express data in a plurality of first dimensions, where the express data includes: sending data and receiving data, and each piece of express data corresponds to a telephone number;
the second obtaining module 702 is configured to integrate the current occurrence times according to the first dimension to obtain integrated data;
the clustering module 704 is configured to perform KMeans clustering processing on the integrated data to obtain clustered data;
the determining module 706 is configured to sort the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm and determine a corresponding sensitivity score value for each telephone number, where the sensitivity score value is used to indicate the sensitivity coefficient of the user of the telephone number.
It should be noted that the above modules may be implemented by software or hardware, for example, for the latter, the following may be implemented: the modules can be located in the same processor; alternatively, the modules may be located in different processors in any combination.
It should be noted here that the first obtaining module 700, the second obtaining module 702, the clustering module 704, and the determining module 706 correspond to steps S102 to S108 in embodiment 1, and the modules are the same as the corresponding steps in implementation examples and application scenarios, but are not limited to the disclosure in embodiment 1. It should be noted that the modules described above may be implemented in a computer terminal as part of an apparatus.
It should be noted that, reference may be made to the relevant description in embodiment 1 for alternative or preferred embodiments of this embodiment, and details are not described here again.
The sensitivity calculation device for express delivery data may further include a processor and a memory, where the first obtaining module 700, the second obtaining module 702, the clustering module 704, the determining module 706, and the like are all stored in the memory as program units, and the processor executes the program units stored in the memory to implement corresponding functions.
In an alternative embodiment, the clustering module 704 includes: a first obtaining submodule, configured to perform data format processing on the integrated data according to a preset data range to obtain processed integrated data; a second obtaining submodule, configured to perform dimensionality reduction on the processed integrated data using a dimensionality reduction algorithm to obtain dimension-reduced data corresponding to a second dimension; a third obtaining submodule, configured to standardize the dimension-reduced data to obtain target format data; and a first clustering submodule, configured to perform the KMeans clustering processing on the target format data to obtain the clustered data.
The processor includes one or more kernels, and a kernel calls the corresponding program unit from the memory. The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
According to an embodiment of the present application, there is also provided an embodiment of a non-volatile storage medium. Optionally, in this embodiment, the nonvolatile storage medium includes a stored program, and the apparatus where the nonvolatile storage medium is located is controlled to execute the sensitivity calculation method for express delivery data when the program runs.
Optionally, in this embodiment, the nonvolatile storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group, and the nonvolatile storage medium includes a stored program.
Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following functions: acquiring the current occurrence times of express data in a plurality of first dimensions, wherein the express data includes: sending data and receiving data, and each piece of express data corresponds to a telephone number; integrating the current occurrence times according to the first dimension to obtain integrated data; performing KMeans clustering processing on the integrated data to obtain clustered data; and sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm, and determining a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value is used to indicate the sensitivity coefficient of the user of the telephone number.
Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following functions: separately counting the first occurrence times of the sending data in each first dimension and the second occurrence times of the receiving data in each first dimension.
Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following functions: integrating all the first occurrence times in each first dimension, and integrating all the second occurrence times in each first dimension, to obtain the integrated data.
Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following functions: performing data format processing on the integrated data according to a preset data range to obtain processed integrated data; performing dimensionality reduction on the processed integrated data using a dimensionality reduction algorithm to obtain dimension-reduced data corresponding to a second dimension; standardizing the dimension-reduced data to obtain target format data; and performing KMeans clustering processing on the target format data to obtain the clustered data.
Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following functions: generating a sensitivity score table from the sensitivity score value determined for each telephone number, wherein the sensitivity score table comprises: a sending sensitivity data table and a receiving sensitivity data table; when a sensitivity retrieval request is received, determining at least one telephone number carried in the sensitivity retrieval request; and retrieving the sensitivity score value corresponding to the at least one telephone number from the sensitivity score table.
Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following functions: sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm to obtain a first sorting result; performing secondary KMeans clustering processing on each class of data in the clustered data to obtain secondarily clustered data; sorting the centroids of each class of data in the secondarily clustered data according to a Euclidean clustering algorithm to obtain a second sorting result; and determining the corresponding sensitivity score value for each telephone number according to the second sorting result and the base scores determined based on the first sorting result.
Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following functions: extracting the plurality of first dimensions using the express data table that stores the express data as a data source; wherein the first dimensions comprise: fuzzy names, fuzzy addresses, frequently changed names, frequently changed addresses, key areas of concern, sending or receiving express outside the number's home location, key items of concern, and key persons of concern.
According to the embodiment of the application, the embodiment of the processor is also provided. Optionally, in this embodiment, the processor is configured to execute a program, where the program executes the sensitivity calculation method for express delivery data when running.
According to an embodiment of the present application, there is further provided an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the steps of any one of the above sensitivity calculation methods for express delivery data.
Optionally, the computer program product, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: acquiring the current occurrence times of express data in a plurality of first dimensions, wherein the express data includes: sending data and receiving data, and each piece of express data corresponds to a telephone number; integrating the current occurrence times according to the first dimension to obtain integrated data; performing KMeans clustering processing on the integrated data to obtain clustered data; and sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm, and determining a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value is used to indicate the sensitivity coefficient of the user of the telephone number.
According to an embodiment of the present application, there is further provided an embodiment of an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform any one of the above-mentioned sensitivity calculation methods for express delivery data.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable non-volatile storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a non-volatile storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned nonvolatile storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A sensitivity calculation method for express delivery data is characterized by comprising the following steps:
acquiring the current occurrence times of express data in a plurality of first dimensions, wherein the express data comprises: sending data and receiving data, and each piece of express data corresponds to a telephone number;
integrating the current occurrence times according to the first dimension to obtain integrated data;
performing KMeans clustering processing on the integrated data to obtain clustered data;
and sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm, and determining a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value is used to indicate the sensitivity coefficient of the user of the telephone number.
2. The method of claim 1,
the acquiring of the current occurrence times of the express data in the plurality of first dimensions comprises: separately counting the first occurrence times of the sending data in each first dimension and the second occurrence times of the receiving data in each first dimension;
the integrating of the current occurrence times according to the first dimension to obtain integrated data comprises:
integrating all the first occurrence times in each first dimension, and integrating all the second occurrence times in each first dimension, to obtain the integrated data.
3. The method of claim 1, wherein the performing KMeans clustering processing on the integrated data to obtain clustered data comprises:
performing data format processing on the integrated data according to a preset data range to obtain processed integrated data;
performing dimensionality reduction on the processed integrated data by adopting a dimensionality reduction algorithm to obtain dimensionality reduced data corresponding to a second dimension;
carrying out standardization processing on the data subjected to dimension reduction to obtain target format data;
and performing KMeans clustering processing on the target format data to obtain clustered data.
4. The method of claim 1, wherein after sorting the centroids of each of the clustered data according to a Euclidean clustering algorithm and determining a corresponding sensitivity score value for each of the telephone numbers, the method further comprises:
generating a sensitivity score table from the sensitivity score value determined for each telephone number, wherein the sensitivity score table comprises: a sending sensitivity data table and a receiving sensitivity data table;
when a sensitivity retrieval request is received, determining at least one telephone number carried in the sensitivity retrieval request;
retrieving the sensitivity score value corresponding to at least one of the telephone numbers from the sensitivity score table.
5. The method of claim 1, wherein said sorting the centroids of each of said clustered data according to a Euclidean clustering algorithm to determine a corresponding sensitivity score value for each of said phone numbers comprises:
sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm to obtain a first sorting result;
performing secondary KMeans clustering processing on each type of data in the clustered data to obtain secondary clustered data;
sequencing the centroids of each class of data in the secondarily clustered data according to a Euclidean clustering algorithm to obtain a second sequencing result;
and determining the corresponding sensitivity score value for each telephone number according to the basic score determined based on the first sorting result and the second sorting result.
6. The method of any of claims 1-5, wherein prior to acquiring the current occurrence times of the express data in the plurality of first dimensions, the method further comprises:
extracting the plurality of first dimensions using the express data table that stores the express data as a data source; wherein the first dimensions comprise: fuzzy names, fuzzy addresses, frequently changed names, frequently changed addresses, key areas of concern, sending or receiving express outside the number's home location, key items of concern, and key persons of concern.
7. A sensitivity calculation device for express delivery data, comprising:
the first obtaining module is configured to obtain the current occurrence times of express data in a plurality of first dimensions, where the express data comprises: sending data and receiving data, and each piece of express data corresponds to a telephone number;
the second obtaining module is configured to integrate the current occurrence times according to the first dimension to obtain integrated data;
the clustering module is configured to perform KMeans clustering processing on the integrated data to obtain clustered data;
and the determining module is configured to sort the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm and determine a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value is used to indicate the sensitivity coefficient of the user of the telephone number.
8. The apparatus of claim 7, wherein the clustering module comprises:
the first obtaining submodule is configured to perform data format processing on the integrated data according to a preset data range to obtain processed integrated data;
the second obtaining submodule is configured to perform dimensionality reduction on the processed integrated data using a dimensionality reduction algorithm to obtain dimension-reduced data corresponding to a second dimension;
the third obtaining submodule is used for carrying out standardization processing on the data subjected to dimensionality reduction to obtain target format data;
and the first clustering submodule is used for performing KMeans clustering processing on the target format data to obtain clustered data.
9. A non-volatile storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method for sensitivity calculation of courier data according to any of claims 1 to 6.
10. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the sensitivity calculation method for courier data according to any one of claims 1 to 6.
CN202210080660.XA 2022-01-24 2022-01-24 Express delivery data sensitivity calculation method and device, storage medium and equipment Pending CN114706899A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210080660.XA CN114706899A (en) 2022-01-24 2022-01-24 Express delivery data sensitivity calculation method and device, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210080660.XA CN114706899A (en) 2022-01-24 2022-01-24 Express delivery data sensitivity calculation method and device, storage medium and equipment

Publications (1)

Publication Number Publication Date
CN114706899A (en) 2022-07-05

Family

ID=82167463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210080660.XA Pending CN114706899A (en) 2022-01-24 2022-01-24 Express delivery data sensitivity calculation method and device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN114706899A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115544247A (en) * 2022-08-17 2022-12-30 国家***邮政业安全中心 Information processing method, information processing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination