CN114706899A

CN114706899A - Express delivery data sensitivity calculation method and device, storage medium and equipment

Info

Publication number: CN114706899A
Application number: CN202210080660.XA
Authority: CN
Inventors: 谢少飞; 张鹏飞; 喻波; 王志海; 安鹏; 刘旺
Original assignee: Beijing Wondersoft Technology Co Ltd
Current assignee: Beijing Wondersoft Technology Co Ltd
Priority date: 2022-01-24
Filing date: 2022-01-24
Publication date: 2022-07-05

Abstract

The invention discloses a sensitivity calculation method and device for express delivery data, a storage medium, and equipment. The method comprises the following steps: obtaining the current number of occurrences of express delivery data in a plurality of first dimensions, wherein the express delivery data includes sender data and recipient data, and each piece of express delivery data corresponds to a telephone number; integrating the current numbers of occurrences according to the first dimensions to obtain integrated data; performing KMeans clustering on the integrated data to obtain clustered data; and sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm, and determining a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value indicates the sensitivity coefficient of the user of the telephone number. The invention solves the technical problems in the prior art that checking for illegal express items by security-inspecting them one by one consumes a large amount of manpower and material resources and has low checking efficiency.

Description

Express delivery data sensitivity calculation method and device, storage medium and equipment

Technical Field

The invention relates to the technical field of data processing, in particular to a method, a device, a storage medium and equipment for calculating the sensitivity of express delivery data.

Background

With the rapid development of informatization, big data computing has become widespread. The demand for online shopping has also grown rapidly, which allows illegal and criminal activities to be carried out covertly through express delivery and similar channels. How to quickly identify suspicious persons and express delivery information has therefore become an urgent problem. In the prior art, illegal express items are mainly checked by express security inspection, but this method consumes a large amount of manpower and material resources, and its checking efficiency is low.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiments of the invention provide a sensitivity calculation method and device for express delivery data, a storage medium, and equipment, which at least solve the technical problems in the prior art that checking for illegal express items by security-inspecting them one by one consumes a large amount of manpower and material resources and has low checking efficiency.

According to an aspect of an embodiment of the present invention, a method for calculating the sensitivity of express delivery data is provided, including: obtaining the current number of occurrences of express delivery data in a plurality of first dimensions, wherein the express delivery data includes sender data and recipient data, and each piece of express delivery data corresponds to a telephone number; integrating the current numbers of occurrences according to the first dimensions to obtain integrated data; performing KMeans clustering on the integrated data to obtain clustered data; and sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm, and determining a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value indicates the sensitivity coefficient of the user of the telephone number.

Optionally, obtaining the current number of occurrences of the express delivery data in the plurality of first dimensions includes: separately counting a first number of occurrences of the sender data in each first dimension and a second number of occurrences of the recipient data in each first dimension. Integrating the current numbers of occurrences according to the first dimensions to obtain integrated data includes: integrating all the first numbers of occurrences in each first dimension, and integrating all the second numbers of occurrences in each first dimension, to obtain the integrated data.

Optionally, performing KMeans clustering on the integrated data to obtain clustered data includes: performing data format processing on the integrated data according to a predetermined data range to obtain processed integrated data; performing dimensionality reduction on the processed integrated data using a dimensionality reduction algorithm to obtain dimension-reduced data corresponding to a second dimension; standardizing the dimension-reduced data to obtain target format data; and performing KMeans clustering on the target format data to obtain the clustered data.

Optionally, after the centroids of each class of data in the clustered data are sorted according to a Euclidean clustering algorithm and a corresponding sensitivity score value is determined for each telephone number, the method further includes: generating a sensitivity score table from the sensitivity score value determined for each telephone number, wherein the sensitivity score table includes a sender sensitivity data table and a recipient sensitivity data table; when a sensitivity retrieval request is received, determining at least one telephone number carried in the sensitivity retrieval request; and retrieving, from the sensitivity score table, the sensitivity score value corresponding to the at least one telephone number.

Optionally, sorting the centroids of each class of data in the clustered data according to the Euclidean clustering algorithm and determining a corresponding sensitivity score value for each telephone number includes: sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm to obtain a first sorting result; performing secondary KMeans clustering on each class of data in the clustered data to obtain secondarily clustered data; sorting the centroids of each class of data in the secondarily clustered data according to a Euclidean clustering algorithm to obtain a second sorting result; and determining the corresponding sensitivity score value for each telephone number according to the second sorting result and the base scores determined from the first sorting result.

Optionally, before obtaining the current number of occurrences of the express delivery data in the plurality of first dimensions, the method further includes: extracting the plurality of first dimensions using the express delivery data table that stores the express delivery data as a data source; wherein the first dimensions include: ambiguous names, ambiguous addresses, frequently changed names, frequently changed addresses, key areas of concern, sending or receiving express items outside the home location of the telephone number, key objects of concern, and key persons of concern.

According to another aspect of the embodiments of the present invention, there is also provided a sensitivity calculation device for express delivery data, including: a first obtaining module, configured to obtain the current number of occurrences of express delivery data in a plurality of first dimensions, wherein the express delivery data includes sender data and recipient data, and each piece of express delivery data corresponds to a telephone number; a second obtaining module, configured to integrate the current numbers of occurrences according to the first dimensions to obtain integrated data; a clustering module, configured to perform KMeans clustering on the integrated data to obtain clustered data; and a determining module, configured to sort the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm and determine a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value indicates the sensitivity coefficient of the user of the telephone number.

Optionally, the clustering module includes: a first obtaining submodule, configured to perform data format processing on the integrated data according to a predetermined data range to obtain processed integrated data; a second obtaining submodule, configured to perform dimensionality reduction on the processed integrated data using a dimensionality reduction algorithm to obtain dimension-reduced data corresponding to a second dimension; a third obtaining submodule, configured to standardize the dimension-reduced data to obtain target format data; and a first clustering submodule, configured to perform the KMeans clustering on the target format data to obtain the clustered data.

According to another aspect of the embodiments of the present invention, there is also provided a non-volatile storage medium, where the non-volatile storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing any one of the above methods for calculating sensitivity of express delivery data.

According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform any one of the above-mentioned sensitivity calculation methods for express delivery data.

In the embodiments of the present invention, the sensitivity of express delivery data is calculated as follows: the current number of occurrences of the express delivery data in a plurality of first dimensions is obtained, wherein the express delivery data includes sender data and recipient data, and each piece of express delivery data corresponds to a telephone number; the current numbers of occurrences are integrated according to the first dimensions to obtain integrated data; KMeans clustering is performed on the integrated data to obtain clustered data; and the centroids of each class of data in the clustered data are sorted according to a Euclidean clustering algorithm, and a corresponding sensitivity score value is determined for each telephone number, wherein the sensitivity score value indicates the sensitivity coefficient of the user of the telephone number. This achieves the purpose of calculating the sensitivity score value of an express delivery user from express delivery data and quickly identifying possibly illegal express items according to that value, thereby improving the efficiency of checking for illegal express items and reducing labor cost. It thus solves the technical problems in the prior art that checking for illegal express items by security-inspecting them one by one consumes a large amount of manpower and material resources and has low checking efficiency.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a flowchart of a method for calculating sensitivity of express delivery data according to an embodiment of the present invention;

FIG. 2 is a graphical illustration of an alternative sensitivity scoring result according to an embodiment of the present invention;

FIG. 3 is a flowchart of an alternative express delivery data sensitivity calculation method according to an embodiment of the present invention;

FIG. 4 is a flow diagram of an alternative sensitivity scoring query in accordance with embodiments of the present invention;

FIG. 5 is a flow chart of an alternative express delivery data sensitivity calculation method according to an embodiment of the invention;

FIG. 6 is a flow chart of an alternative express delivery data sensitivity calculation method according to an embodiment of the invention;

FIG. 7 is a schematic structural diagram of a sensitivity calculation device for express delivery data according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

First, in order to facilitate understanding of the embodiments of the present invention, some terms or nouns referred to in the present invention will be explained as follows:

PCA (principal component analysis): a method that uses the idea of dimensionality reduction to convert multiple indicators into a few composite indicators.

Clustering: the process of dividing a collection of physical or abstract objects into classes composed of similar objects. A cluster generated by clustering is a set of data objects that are similar to the other objects in the same cluster and dissimilar to the objects in other clusters. As the saying goes, "things of a kind come together, and people divide into groups"; classification problems abound in the natural and social sciences. Cluster analysis is a statistical analysis method for studying the classification of samples or indicators. It originates from taxonomy, but clustering is not the same as classification: in clustering, the classes into which the data is to be divided are unknown in advance. Cluster analysis covers a wide range of methods, including hierarchical clustering, ordered-sample clustering, dynamic clustering, fuzzy clustering, graph-theoretic clustering, and cluster forecasting.

Euclidean clustering: a clustering algorithm based on the Euclidean distance metric. A KD-Tree-based nearest-neighbor query algorithm is an important preprocessing method for accelerating the Euclidean clustering algorithm.

Example 1

In accordance with an embodiment of the present invention, an embodiment of a method for calculating the sensitivity of express delivery data is provided. It is noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that shown herein.

FIG. 1 is a flowchart of a sensitivity calculation method for express delivery data according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

Step S102, obtaining the current number of occurrences of express delivery data in a plurality of first dimensions, wherein the express delivery data includes sender data and recipient data, and each piece of express delivery data corresponds to a telephone number;

Step S104, integrating the current numbers of occurrences according to the first dimensions to obtain integrated data;

Step S106, performing KMeans clustering on the integrated data to obtain clustered data;

Step S108, sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm, and determining a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value indicates the sensitivity coefficient of the user of the telephone number.

In the embodiments of the present invention, the sensitivity of express delivery data is calculated as follows: the current number of occurrences of the express delivery data in a plurality of first dimensions is obtained, wherein the express delivery data includes sender data and recipient data, and each piece of express delivery data corresponds to a telephone number; the current numbers of occurrences are integrated according to the first dimensions to obtain integrated data; KMeans clustering is performed on the integrated data to obtain clustered data; and the centroids of each class of data in the clustered data are sorted according to a Euclidean clustering algorithm, and a corresponding sensitivity score value is determined for each telephone number, wherein the sensitivity score value indicates the sensitivity coefficient of the user of the telephone number. This achieves the purpose of calculating the sensitivity score value of an express delivery user from express delivery data and quickly identifying possibly illegal express items according to that value, thereby improving the efficiency of checking for illegal express items and reducing labor cost. It thus solves the technical problems in the prior art that checking for illegal express items by security-inspecting them one by one consumes a large amount of manpower and material resources and has low checking efficiency.

Optionally, the express delivery data table is used as a data source, and the first dimensions are extracted from it; the first dimensions include: ambiguous names, ambiguous addresses, frequently changed names, frequently changed addresses, key areas of concern, sending or receiving express items outside the home location of the telephone number, key objects of concern, and key persons of concern.

Optionally, the current numbers of occurrences of the sender data and the recipient data in the plurality of first dimensions are counted separately; that is, a first number of occurrences of the sender data in each first dimension and a second number of occurrences of the recipient data in each first dimension are counted separately.

Optionally, with the telephone number as the grouping key, the first numbers of occurrences of all the sender data in each first dimension and the second numbers of occurrences of all the recipient data in each first dimension are integrated to obtain the integrated data.
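As a minimal illustrative sketch (written in Python, which the embodiment itself does not use), the per-dimension counting and the grouping by telephone number described above might look as follows. The record layout, the dimension keys, the sample records, and the function name `integrate_counts` are assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical keys for the eight first dimensions, in a fixed order.
DIMENSIONS = [
    "ambiguous_name", "ambiguous_address",
    "frequently_changed_name", "frequently_changed_address",
    "key_area", "outside_number_home_location",
    "key_object", "key_person",
]


def integrate_counts(records):
    """Group records by telephone number and count hits per first dimension.

    Each record is a (phone, hit_dimensions) pair, where hit_dimensions lists
    the first dimensions matched by one express item (assumed layout).
    """
    counts = defaultdict(lambda: [0] * len(DIMENSIONS))
    for phone, hit_dimensions in records:
        for name in hit_dimensions:
            counts[phone][DIMENSIONS.index(name)] += 1
    return dict(counts)


# Sender data and recipient data would be counted separately;
# only made-up sender records are shown here.
sender_records = [
    ("18998765222", ["ambiguous_name", "ambiguous_address"]),
    ("18998765222", ["ambiguous_address"]),
    ("18998764352", ["key_person"]),
]
sender_vectors = integrate_counts(sender_records)
print(sender_vectors["18998765222"])
```

Each telephone number ends up with one 8-element count vector per role, which is the shape of the row vectors used in the dimensionality reduction step.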

Optionally, the centroid here refers to the distance from the cluster center point to the origin. The sensitivity score value includes at least one of the following: a sender sensitivity score and a recipient sensitivity score. FIG. 2 shows the sensitivity scores of a person surnamed Li; as shown in FIG. 2, Li's sender sensitivity score and recipient sensitivity score are both 42.

Optionally, the higher the sensitivity score value, the more likely it is that the corresponding user is engaged in illegal activities.

As an alternative embodiment, FIG. 3 is a flowchart of an alternative sensitivity calculation method for express delivery data according to an embodiment of the present invention. As shown in FIG. 3, performing KMeans clustering on the integrated data to obtain clustered data includes:

step S302, performing data format processing on the integrated data according to a preset data range to obtain processed integrated data;

step S304, adopting a dimensionality reduction algorithm to perform dimensionality reduction on the processed integrated data to obtain dimensionality reduced data corresponding to a second dimensionality;

step S306, standardizing the data after dimension reduction to obtain target format data;

step S308, performing the KMeans clustering processing on the target format data to obtain the clustered data.

Optionally, the predetermined data range may be, but is not limited to, express delivery data of residents in a target area, such as XX city resident express delivery data, XX provincial resident express delivery data, XX autonomous region resident express delivery data, and the like.

Optionally, the data format processing is performed on the integrated data according to a predetermined data range, and the obtained processed integrated data is shown in table 1.

TABLE 1

Optionally, the dimensionality reduction algorithm may be, but is not limited to, the principal component analysis (PCA) algorithm in MADlib. It should be noted that such a dimensionality reduction algorithm represents as much of the information as possible with as few dimensions as possible, and that variances computed in low dimensions tend to be more stable. Low-dimensional data is therefore easier to compute with while still expressing the meaning of the high-dimensional data.

Optionally, performing dimensionality reduction on the processed integrated data using a dimensionality reduction algorithm (e.g., the PCA algorithm) to obtain dimension-reduced data corresponding to the second dimension includes:

S1, performing decorrelation processing on the processed integrated data (8 dimensions), creating an original dense matrix table (jqxx.jqxx_shpr_zz), and inserting data into it, where the specific implementation code is as follows:

drop table if exists jqxx.jqxx_shpr_zz;
create table jqxx.jqxx_shpr_zz(id integer, row_vec DOUBLE PRECISION[]);
insert into jqxx.jqxx_shpr_zz values
(1,'{1,5,2,0,0,0,0,0}'),
(2,'{0,1,0,0,0,0,0,1}'),
(3,'{0,5,0,0,0,2,0,0}'),
(4,'{0,0,1,0,2,0,0,2}'),
(5,'{1,2,0,1,1,0,0,4}'),
(6,'{1,0,0,1,0,0,1,0}'),
(7,'{1,1,0,0,0,0,3,0}');

S2, calling the PCA training function to train on the inserted data, generating a feature vector matrix, and obtaining a training result, which is shown in Table 2, where the specific implementation code is as follows:

select madlib.pca_train(
'jqxx.jqxx_shpr_zz', -- source table
'jqxx.result_table_shpr_zz', -- output table
'mobile', -- row ID column of the source table
3 -- number of principal components (3, matching the second dimension)
);

S3, calling the PCA projection function to project the data using the training result, finally obtaining the dimension-reduced data corresponding to the second dimension (3 dimensions), which is shown in Table 3, where the specific implementation code is as follows:

select madlib.pca_project(
'jqxx.jqxx_shpr_zz',
'jqxx.result_table_shpr_zz',
'jqxx.out_table_shpr_zz',
'mobile',
'jqxx.residual_table_shpr_zz',
'jqxx.result_summary_table_shpr_zz');

TABLE 2

TABLE 3

Row_id	Row_vec
1	{3.29177676722938,-0.109192661697066,0.65027320246043}
2	{-0.833010395779005,0.0624998438474048,0.496073569262864}
3	{3.45713701219417,-0.0182366253911953,-0.280213936353739}
4	{-2.21222162912753,-1.16316894886941,1.30735249257714}
5	{-1.04652026547193,-2.75432751429412,-1.33946918701219}
6	{-1.67962629587755,1.54985327029896,0.118916372191323}
7	{-0.977535193166934,2.43257263610243,-0.952932513121798}
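For readers without a MADlib installation, the idea behind steps S1 to S3 can be sketched in plain Python: center the 8-dimensional rows, estimate the leading eigenvectors of the covariance matrix, and project onto them. This is only a sketch under stated assumptions; it uses power iteration with deflation rather than MADlib's implementation, the function name `pca_project_rows` is hypothetical, and the signs of the resulting components may differ from those produced by a library.

```python
import math


def pca_project_rows(rows, n_components, iterations=500):
    """Project rows onto their leading principal components.

    Uses power iteration with deflation on the sample covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(n_components):
        vec = [1.0 / math.sqrt(d)] * d  # deterministic start vector
        for _ in range(iterations):
            nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(d))
                         for i in range(d))
        components.append(vec)
        # Deflate: remove the found component before searching for the next.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


# The seven 8-dimensional rows inserted in step S1.
rows = [
    [1, 5, 2, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 1],
    [0, 5, 0, 0, 0, 2, 0, 0],
    [0, 0, 1, 0, 2, 0, 0, 2],
    [1, 2, 0, 1, 1, 0, 0, 4],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 3, 0],
]
projected = pca_project_rows(rows, n_components=3)
for row_id, vec in enumerate(projected, start=1):
    print(row_id, vec)
```

Each input row is reduced from 8 values to 3, and the projected columns carry decreasing amounts of variance, which is why a few components can stand in for the original dimensions.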

Optionally, the standardization may be, but is not limited to, normalization; the dimension-reduced data is normalized to obtain the target format data shown in Table 4.

TABLE 4

Row_id	Row_vec
1	{0.970832636383322,0.5099644828125,0.248252194389862}
2	{0.243274648969286,0.543065660889226,0.306510608391326}
3	{1,0.527500204277838,0.599801052399269}
4	{0,0.306764834349678,0}
5	{0.205614327370166,0,1}
6	{0.0939427838218074,0.829817551869399,0.44900498191862}
7	{0.217782383171432,1,0.853961951093599}
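The normalization step can be illustrated with a short Python sketch of column-wise min-max scaling, which maps each column linearly into the range 0 to 1. The function name `min_max_normalize` is hypothetical, and min-max scaling is assumed here because it is the scaling that matches the values shown; the embodiment does not name the exact formula.

```python
def min_max_normalize(rows):
    """Scale each column of rows linearly into the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Dimension-reduced data from Table 3.
reduced = [
    [3.29177676722938, -0.109192661697066, 0.65027320246043],
    [-0.833010395779005, 0.0624998438474048, 0.496073569262864],
    [3.45713701219417, -0.0182366253911953, -0.280213936353739],
    [-2.21222162912753, -1.16316894886941, 1.30735249257714],
    [-1.04652026547193, -2.75432751429412, -1.33946918701219],
    [-1.67962629587755, 1.54985327029896, 0.118916372191323],
    [-0.977535193166934, 2.43257263610243, -0.952932513121798],
]
normalized = min_max_normalize(reduced)
for row_id, vec in enumerate(normalized, start=1):
    print(row_id, vec)
```

Applied to the first two columns of Table 3, this scaling gives the corresponding columns of Table 4. The third column of Table 4 matches the same scaling applied to the negated component, which would be consistent with the sign of a principal component being arbitrary.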

Optionally, the KMeans clustering is performed on the target format data using the kmeanspp function of MADlib to obtain the clustered data, where the target format data may be clustered into 5 classes.

Optionally, the KMeans clustering is performed on the target format data, for example by clustering the target format data into 5 classes, to obtain the clustered data. That is, the KMeans clustering algorithm selects 5 cluster centers and performs the clustering calculation on the dimension-reduced data: each data point is assigned to the nearest of the 5 cluster centers, the coordinate mean of all points in each cluster is calculated and used as the new cluster center, and the iteration is repeated (up to 20000 times in the code below) until the clustered data is finally obtained. The specific implementation code is as follows:

select madlib.kmeanspp(
'jqxx.t_source_change_nor_cnee_zz', -- source data table name
'row_vec', -- column name containing the data points
5, -- number of center points
'madlib.squared_dist_norm2', -- distance function
'madlib.avg', -- aggregation function
20000, -- number of iterations
0.00000001 -- stop-iteration condition
);
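The clustering loop described above (assign each point to its nearest center, replace each center with the mean of its points, and repeat until few points change class) can be sketched in plain Python. This is an illustrative sketch, not MADlib's implementation: the function name `kmeans_pp`, its parameters, and the toy points are assumptions, chosen to mirror the distance function, mean aggregation, iteration cap, and stop condition passed to kmeanspp.

```python
import random


def squared_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp(points, k, max_iterations=100, min_frac_reassigned=0.001, seed=0):
    """KMeans with k-means++ seeding; returns (centers, labels)."""
    rng = random.Random(seed)
    # k-means++ seeding: later centers are drawn with probability
    # proportional to the squared distance to the nearest chosen center.
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        weights = [min(squared_dist(p, c) for c in centers) for p in points]
        centers.append(list(rng.choices(points, weights=weights)[0]))
    labels = [None] * len(points)
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda i: squared_dist(p, centers[i]))
            for p in points
        ]
        changed = sum(a != b for a, b in zip(labels, new_labels))
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old center if a cluster is empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
        if changed / len(points) <= min_frac_reassigned:
            break
    return centers, labels


# Toy data: three well-separated pairs of points (made up for illustration).
points = [(0.0, 0.0), (0.0, 0.01), (5.0, 5.0), (5.0, 5.01), (10.0, 0.0), (10.0, 0.01)]
centers, labels = kmeans_pp(points, k=3)
print(labels)
```

In the embodiment the same procedure would run with 5 centers on the 3-dimensional target format data.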

As an optional embodiment, after sorting the centroids of each type of data in the clustered data according to the euclidean clustering algorithm and determining the corresponding sensitivity score value for each phone number, the method further includes:

Step S402, generating a sensitivity score table from the sensitivity score value determined for each telephone number, wherein the sensitivity score table includes: a sender sensitivity data table and a recipient sensitivity data table;

step S404, when receiving a sensitivity search request, determining at least one telephone number carried in the sensitivity search request;

step S406, retrieving the sensitivity score value corresponding to at least one of the phone numbers from the sensitivity score table.

Optionally, a sensitivity score table as shown in Table 5 is generated from the sensitivity score value determined for each telephone number. Once the sensitivity score table has been calculated, a telephone number can be entered to retrieve its sensitivity score and details, so that sensitive behavior can be studied and judged intuitively and quickly.

TABLE 5

Telephone number	Sensitivity score value
18998765222	44
18998764352	81
18998765201	44
18998764210	61
18998764153	43
18998763451	21
18998764523	61

As an alternative embodiment, FIG. 4 is a flowchart of an alternative sensitivity score query according to an embodiment of the present invention. As shown in FIG. 4, the process includes: inputting a telephone number, and parsing the request parameters after the telephone number is received; assembling the parsed request parameters into SQL query statements, querying the sensitivity score and details of the sender data and the sensitivity score and details of the recipient data separately, packaging the query results, and returning them in JSON format; and parsing the returned result, and displaying on the console the sensitivity score and details of the sender data and of the recipient data corresponding to the telephone number.
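The query flow above can be illustrated with a small Python sketch that uses an in-memory SQLite database in place of the production database. The table names, column names, sample rows, and the function names `build_demo_db` and `query_sensitivity` are all hypothetical; only the overall flow (parse the request, query the sender and recipient tables, return JSON) follows the description.

```python
import json
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up sample score tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sender_sensitivity (phone TEXT PRIMARY KEY, score INTEGER)")
    conn.execute("CREATE TABLE recipient_sensitivity (phone TEXT PRIMARY KEY, score INTEGER)")
    conn.executemany("INSERT INTO sender_sensitivity VALUES (?, ?)",
                     [("18998765222", 44), ("18998764352", 81)])
    conn.executemany("INSERT INTO recipient_sensitivity VALUES (?, ?)",
                     [("18998765222", 61)])
    return conn


def query_sensitivity(conn, request):
    """Parse the request, query both score tables, and return a JSON string."""
    phone = str(request["phone"]).strip()
    result = {"phone": phone}
    tables = (("sender", "sender_sensitivity"), ("recipient", "recipient_sensitivity"))
    for role, table in tables:
        # The table name is a fixed constant; the phone number is bound as a
        # parameter rather than concatenated into the SQL text.
        row = conn.execute(f"SELECT score FROM {table} WHERE phone = ?", (phone,)).fetchone()
        result[role + "_score"] = row[0] if row else None
    return json.dumps(result)


conn = build_demo_db()
response = query_sensitivity(conn, {"phone": "18998765222"})
print(response)
```

Binding the telephone number as a query parameter keeps user input out of the SQL text, which matters for an interface that accepts arbitrary search strings.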

As an optional embodiment, FIG. 5 is a flowchart of another optional sensitivity calculation method for express delivery data according to an embodiment of the present invention. As shown in FIG. 5, sorting the centroids of each class of data in the clustered data according to the Euclidean clustering algorithm and determining a corresponding sensitivity score value for each telephone number includes:

Step S502, sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm to obtain a first sorting result;

Step S504, performing secondary KMeans clustering on each class of data in the clustered data to obtain secondarily clustered data;

Step S506, sorting the centroids of each class of data in the secondarily clustered data according to a Euclidean clustering algorithm to obtain a second sorting result;

Step S508, determining the corresponding sensitivity score value for each telephone number according to the second sorting result and the base scores determined from the first sorting result.

Optionally, the centroids (i.e., the distance from each cluster center to the origin) of each class of data in the clustered data are sorted according to a Euclidean clustering algorithm to obtain a first sorting result, and the base scores are determined from the first sorting result. For example, the 5 centroids in the clustered data are sorted to obtain the first sorting result, and 5 base scores of 0, 20, 40, 60, and 80 are set based on the first sorting result, in one-to-one correspondence with the 5 sorted centroids.

Optionally, performing secondary KMeans clustering on each class of data in the clustered data to obtain secondarily clustered data includes: normalizing the clustered data, and performing secondary clustering on each class of normalized data. For example, each class of normalized data is clustered into 20 classes, finally giving 100 classes of data, which is the secondarily clustered data.

Optionally, the centroids (i.e., the distance from each cluster center to the origin) of each class of data in the secondarily clustered data (i.e., the 100 classes of data) are sorted according to a Euclidean clustering algorithm to obtain a second sorting result, and a corresponding sensitivity score value is determined for each telephone number based on the second sorting result and the 5 base scores of 0, 20, 40, 60, and 80, where the sensitivity score value ranges from 0 to 100.
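One way to read the two-level scoring is sketched below in Python. The embodiment does not state how the second sorting result is combined with the base scores, so the rule used here (base score plus the rank of the secondary cluster within its first-level class, giving values from 0 to 99) is an assumption, as are the function names and the sample cluster centers.

```python
import math


def rank_by_centroid(centers):
    """Map each cluster index to its rank by distance from the origin (ascending)."""
    def distance_to_origin(index):
        return math.sqrt(sum(x * x for x in centers[index]))

    order = sorted(range(len(centers)), key=distance_to_origin)
    return {cluster: rank for rank, cluster in enumerate(order)}


def sensitivity_score(first_rank, second_rank, base_step=20):
    """Assumed rule: base score (0, 20, ..., 80) plus the second-level rank (0..19)."""
    return first_rank * base_step + second_rank


# Made-up centers of the 5 first-level clusters in the 3-dimensional space.
first_level_centers = [
    [0.2, 0.1, 0.0],
    [0.9, 0.9, 0.9],
    [0.5, 0.4, 0.3],
    [0.0, 0.0, 0.1],
    [0.7, 0.6, 0.6],
]
first_ranks = rank_by_centroid(first_level_centers)
base_scores = {cluster: rank * 20 for cluster, rank in first_ranks.items()}
print(base_scores)

# A phone number in first-level cluster 2 whose secondary cluster ranks 5th from the bottom.
print(sensitivity_score(first_ranks[2], 5))
```

Under this reading, the first-level rank fixes the 20-point band and the second-level rank places the telephone number within that band.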

As an optional embodiment, FIG. 6 is a flowchart of another optional sensitivity calculation method for express delivery data according to an embodiment of the present invention. As shown in FIG. 6, the express delivery data table is used as a data source to obtain the current numbers of occurrences of the sender data and the recipient data in multiple dimensions; the current numbers of occurrences are integrated according to the dimensions to obtain integrated data, and the integrated data is formatted to obtain processed integrated data, which is data in 8 dimensions; dimensionality reduction is performed on the processed integrated data using the PCA algorithm, reducing it from 8 dimensions to 3 dimensions to obtain dimension-reduced data; the dimension-reduced data is standardized to obtain target format data; KMeans clustering is performed on the target format data, clustering it into 5 classes to obtain clustered data; the clustered data is clustered again, with each class further clustered into 20 classes, for 100 classes in total, to obtain secondarily clustered data; the centroids of each class of data in the secondarily clustered data are sorted according to a Euclidean clustering algorithm, and the corresponding sensitivity score value is determined for each telephone number according to the sorting result; and a sensitivity score table is generated from the sensitivity score values, wherein the sensitivity score table records the sensitivity score value and details corresponding to each telephone number.

The embodiment of the invention can achieve at least the following technical effects: with the calculated sensitivity data, a user can search directly by telephone number and is presented with a sensitivity coefficient for reference; case handling personnel can query information on sensitive persons more efficiently; and the range of dimensions that needs to be consulted when intercepting different types of cases is supported, so the method has a wider range of application.

As an optional embodiment, before obtaining the current occurrence times of the express delivery data in the plurality of first dimensions, the method further includes:

Step S602: extracting a plurality of first dimensions by taking the express data table that stores the express delivery data as a data source, wherein the first dimensions comprise: name blurring, address blurring, frequent name changes, frequent address changes, key areas of concern, sending or receiving express items outside the phone number's home location, key items of concern, and key persons of concern.

Optionally, name blurring may include, but is not limited to, the use of a form of address in place of a real name, such as: Mr., student, manager, brother, big brother (Dage), sister, or Miss.

Optionally, address blurring may include, but is not limited to: generic locations such as an intersection, supermarket, shopping mall, square, library, or hotel (not including a room number such as "room XX"), or an address that retains only part of the detailed address, for example an address shorter than 5 characters.

Optionally, frequent name changes may include, but are not limited to: the name of the sender or recipient in the sending or receiving data changes more than a predetermined number of times (e.g., 2 times).

Optionally, frequent address changes may include, but are not limited to: the address of the sender or recipient in the sending or receiving data changes more than a predetermined number of times (e.g., 2 times).

Optionally, the key areas of concern may be understood as express delivery data related to the areas that are of key concern at different times.

Optionally, sending or receiving express items outside the phone number's home location may be understood as the case in which the place where the telephone number of the sender or recipient is registered is not the sending place or the receiving place.

Optionally, the key items of concern may include, but are not limited to: for example, in a special investigation into the manufacture and sale of counterfeit liquor, items such as wine bottles, bottle caps, and labels are of key concern.
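The rules above can be sketched as simple per-record checks. The following Python example is only an illustration under stated assumptions: the keyword lists are English stand-ins for the examples given in the text, the substring matching is deliberately naive, the hotel room-number exception is not modeled, and "changes" are counted as differences between consecutive records. All names (`is_blurred_name`, `count_changes`, and so on) are hypothetical.

```python
# Hypothetical keyword lists based on the examples in the text.
FUZZY_NAME_WORDS = ("mr.", "student", "manager", "brother", "sister", "miss")
FUZZY_ADDRESS_WORDS = ("intersection", "supermarket", "shopping mall", "square", "library", "hotel")
MIN_ADDRESS_LENGTH = 5  # addresses shorter than this count as blurred
CHANGE_THRESHOLD = 2    # the "predetermined number of times" example value


def is_blurred_name(name):
    """True if the name contains a form of address instead of a real name."""
    lowered = name.lower()
    return any(word in lowered for word in FUZZY_NAME_WORDS)


def is_blurred_address(address):
    """True if the address is very short or names only a generic location."""
    lowered = address.lower()
    return len(address) < MIN_ADDRESS_LENGTH or any(word in lowered for word in FUZZY_ADDRESS_WORDS)


def count_changes(values):
    """Number of times consecutive values differ (assumed definition of a change)."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


def changes_frequently(values):
    """True if the value changed more than the predetermined number of times."""
    return count_changes(values) > CHANGE_THRESHOLD


def is_outside_home_location(phone_home_city, place_city):
    """True if the phone number's registration place differs from the sending or receiving place."""
    return phone_home_city != place_city


record = {"name": "Mr. Li", "address": "Central Square", "city": "Beijing"}
flags = {
    "name_blurring": is_blurred_name(record["name"]),
    "address_blurring": is_blurred_address(record["address"]),
    "outside_home_location": is_outside_home_location("Shanghai", record["city"]),
}
```

A production rule set would need proper tokenization and the original-language keyword lists; the point here is only that each first dimension reduces to a yes/no test per record that can later be counted.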

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

In this embodiment, a sensitivity calculation device for express delivery data is further provided, and the device is used to implement the foregoing embodiments and preferred embodiments, and details are not repeated for what has been described. As used hereinafter, the terms "module" and "apparatus" may refer to a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware or a combination of software and hardware is also possible and contemplated.

According to an embodiment of the present invention, an apparatus embodiment for implementing the express data sensitivity calculation method is further provided, and fig. 7 is a schematic structural diagram of an express data sensitivity calculation apparatus according to an embodiment of the present invention, as shown in fig. 7, the express data sensitivity calculation apparatus includes: a first obtaining module 700, a second obtaining module 702, a clustering module 704, and a determining module 706, wherein:

the first obtaining module 700 is configured to obtain current occurrence times of the express delivery data in a plurality of first dimensions, where the express delivery data includes sending data and receiving data, and each piece of express delivery data corresponds to a telephone number;

the second obtaining module 702 is configured to integrate the current occurrence times according to the first dimensions to obtain integrated data;

the clustering module 704 is configured to perform KMeans clustering on the integrated data to obtain clustered data;

the determining module 706 is configured to sort the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm, and determine a corresponding sensitivity score value for each phone number, where the sensitivity score value is used to indicate a sensitivity coefficient of a user of the phone number.

It should be noted that the above modules may be implemented in software or hardware. In the latter case this may be achieved, for example, as follows: the modules are all located in the same processor, or the modules are located in different processors in any combination.

It should be noted here that the first obtaining module 700, the second obtaining module 702, the clustering module 704, and the determining module 706 correspond to steps S102 to S108 in embodiment 1; the examples and application scenarios implemented by the modules are the same as those of the corresponding steps, but are not limited to what is disclosed in embodiment 1. It should be noted that the modules described above may run in a computer terminal as part of an apparatus.

It should be noted that, reference may be made to the relevant description in embodiment 1 for alternative or preferred embodiments of this embodiment, and details are not described here again.

The sensitivity calculation device for express delivery data may further include a processor and a memory, where the first obtaining module 700, the second obtaining module 702, the clustering module 704, the determining module 706, and the like are all stored in the memory as program units, and the processor executes the program units stored in the memory to implement corresponding functions.

In an alternative embodiment, the clustering module 704 includes: a first obtaining submodule, configured to perform data format processing on the integrated data according to a preset data range to obtain processed integrated data; a second obtaining submodule, configured to perform dimensionality reduction on the processed integrated data by using a dimensionality reduction algorithm to obtain dimensionality-reduced data corresponding to a second dimension; a third obtaining submodule, configured to standardize the dimensionality-reduced data to obtain target format data; and a first clustering submodule, configured to perform the KMeans clustering processing on the target format data to obtain the clustered data.

The processor includes a kernel, which retrieves the corresponding program unit from the memory; one or more kernels may be provided. The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.

According to an embodiment of the present application, there is also provided an embodiment of a non-volatile storage medium. Optionally, in this embodiment, the nonvolatile storage medium includes a stored program, and the apparatus where the nonvolatile storage medium is located is controlled to execute the sensitivity calculation method for express delivery data when the program runs.

Optionally, in this embodiment, the nonvolatile storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group, and the nonvolatile storage medium includes a stored program.

Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following functions: obtaining the current occurrence times of express delivery data under a plurality of first dimensions, wherein the express delivery data includes sending data and receiving data, and each piece of express delivery data corresponds to a telephone number; integrating the current occurrence times according to the first dimensions to obtain integrated data; performing KMeans clustering processing on the integrated data to obtain clustered data; and sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm, and determining a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value is used for indicating the sensitivity coefficient of a user of the telephone number.

Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following functions: respectively counting the first occurrence times of the sending data in each first dimension and the second occurrence times of the receiving data in each first dimension.

Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following functions: integrating all the first occurrence times in each first dimension, and integrating all the second occurrence times in each first dimension, to obtain the integrated data.
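The counting and integration steps can be sketched as follows. This is a minimal illustration, assuming each record has already been tagged with the dimensions it hits (the `hits` field), that sending and receiving data are integrated separately, and that only three of the eight dimensions are shown; the record layout, dimension keys, and phone identifiers are hypothetical.

```python
from collections import Counter, defaultdict

# Three of the eight first dimensions, for brevity (hypothetical keys).
DIMENSIONS = ("name_blurring", "address_blurring", "outside_home_location")


def count_occurrences(records):
    """Count, per telephone number, how many records hit each dimension."""
    counts = defaultdict(Counter)
    for record in records:
        phone_counts = counts[record["phone"]]  # ensures numbers with no hits still appear
        for dim in record["hits"]:
            phone_counts[dim] += 1
    return counts


def integrate(counts):
    """Turn per-dimension counts into one fixed-order vector per telephone number."""
    return {phone: [c[dim] for dim in DIMENSIONS] for phone, c in counts.items()}


sending_records = [
    {"phone": "phone-A", "hits": ["name_blurring", "address_blurring"]},
    {"phone": "phone-A", "hits": ["name_blurring"]},
    {"phone": "phone-B", "hits": []},
]
receiving_records = [
    {"phone": "phone-A", "hits": ["outside_home_location"]},
]

# First occurrence times (sending) and second occurrence times (receiving).
sending_integrated = integrate(count_occurrences(sending_records))
receiving_integrated = integrate(count_occurrences(receiving_records))
```

Each resulting vector has one entry per dimension in a fixed order, which is the shape the later formatting, dimensionality reduction, and clustering steps expect.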

Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following functions: performing data format processing on the integrated data according to a preset data range to obtain processed integrated data; performing dimensionality reduction on the processed integrated data by using a dimensionality reduction algorithm to obtain dimensionality-reduced data corresponding to a second dimension; standardizing the dimensionality-reduced data to obtain target format data; and performing KMeans clustering processing on the target format data to obtain the clustered data.
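Two of these preprocessing steps can be sketched with the Python standard library alone. The example below assumes that "data format processing according to a preset data range" means clipping each count into a fixed interval (the bounds 0 and 100 are an assumption) and that "standardization" means a per-column z-score; the dimensionality reduction step is omitted here and would normally be done with a library PCA implementation. Function names are hypothetical.

```python
import statistics


def clip_to_range(rows, low=0, high=100):
    """Clip every value into the preset range [low, high] (assumed bounds)."""
    return [[min(max(v, low), high) for v in row] for row in rows]


def standardize(rows):
    """Z-score each column using the population standard deviation.

    A constant column has zero deviation and is mapped to 0.0 to avoid
    division by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        [(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


rows = [[0, 5], [250, 5], [50, 5]]  # made-up counts; 250 is outside the preset range
clipped = clip_to_range(rows)
standardized = standardize(clipped)
```

After standardization every non-constant column has zero mean and unit variance, so no single dimension dominates the Euclidean distances used by KMeans.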

Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following functions: generating a sensitivity score table according to the sensitivity score value determined for each telephone number, wherein the sensitivity score table comprises: a sending sensitivity data table and a receiving sensitivity data table; when a sensitivity retrieval request is received, determining at least one telephone number carried in the sensitivity retrieval request; and retrieving the sensitivity score value corresponding to the at least one telephone number from the sensitivity score table.
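A minimal sketch of the score table and its lookup is shown below. It assumes the table is held in memory as two dictionaries keyed by telephone number and that a retrieval request is a dictionary carrying a list of numbers; the request layout, the function names, and the score values are hypothetical, and a real deployment would more likely use database tables.

```python
def build_score_table(sending_scores, receiving_scores):
    """Bundle the sending and receiving sensitivity data tables into one score table."""
    return {"sending": dict(sending_scores), "receiving": dict(receiving_scores)}


def retrieve_scores(table, request):
    """Return the sending and receiving scores for every number carried in the request.

    A number with no entry in a table yields None for that table.
    """
    result = {}
    for phone in request["phones"]:
        result[phone] = {
            "sending": table["sending"].get(phone),
            "receiving": table["receiving"].get(phone),
        }
    return result


# Made-up scores for two placeholder numbers.
table = build_score_table({"phone-A": 82, "phone-B": 5}, {"phone-A": 40})
found = retrieve_scores(table, {"phones": ["phone-A", "phone-B"]})
```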

Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following functions: sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm to obtain a first sorting result; performing secondary KMeans clustering processing on each class of data in the clustered data to obtain secondarily clustered data; sorting the centroids of each class of data in the secondarily clustered data according to a Euclidean clustering algorithm to obtain a second sorting result; and determining the corresponding sensitivity score value for each phone number according to the second sorting result and the basic score determined based on the first sorting result.

Optionally, when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following functions: extracting a plurality of first dimensions by taking the express data table that stores the express delivery data as a data source, wherein the first dimensions comprise: name blurring, address blurring, frequent name changes, frequent address changes, key areas of concern, sending or receiving express items outside the phone number's home location, key items of concern, and key persons of concern.

According to an embodiment of the present application, an embodiment of a processor is also provided. Optionally, in this embodiment, the processor is configured to run a program, where the program, when running, executes the sensitivity calculation method for express delivery data.

According to an embodiment of the present application, there is further provided an embodiment of a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the steps of any one of the above sensitivity calculation methods for express delivery data.

Optionally, the computer program product, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: obtaining the current occurrence times of express delivery data under a plurality of first dimensions, wherein the express delivery data includes sending data and receiving data, and each piece of express delivery data corresponds to a telephone number; integrating the current occurrence times according to the first dimensions to obtain integrated data; performing KMeans clustering processing on the integrated data to obtain clustered data; and sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm, and determining a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value is used for indicating the sensitivity coefficient of a user of the telephone number.

According to an embodiment of the present application, there is further provided an embodiment of an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform any one of the above-mentioned sensitivity calculation methods for express delivery data.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable non-volatile storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a non-volatile storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned nonvolatile storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A sensitivity calculation method for express delivery data is characterized by comprising the following steps:

acquiring the current occurrence times of express delivery data under a plurality of first dimensions, wherein the express delivery data comprises sending data and receiving data, and each piece of express delivery data corresponds to a telephone number;

integrating the current occurrence times according to the first dimension to obtain integrated data;

performing KMeans clustering processing on the integrated data to obtain clustered data;

and sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm, and determining a corresponding sensitivity score value for each telephone number, wherein the sensitivity score value is used for indicating the sensitivity coefficient of a user of the telephone number.

2. The method of claim 1, wherein

the obtaining of the current occurrence times of the express delivery data under the plurality of first dimensions comprises: respectively counting the first occurrence times of the sending data under each first dimension and the second occurrence times of the receiving data under each first dimension;

the integrating of the current occurrence times according to the first dimension to obtain integrated data comprises:

integrating all the first occurrence times under each first dimension, and integrating all the second occurrence times under each first dimension, to obtain the integrated data.

3. The method of claim 1, wherein the performing KMeans clustering processing on the integrated data to obtain clustered data comprises:

performing data format processing on the integrated data according to a preset data range to obtain processed integrated data;

performing dimensionality reduction on the processed integrated data by adopting a dimensionality reduction algorithm to obtain dimensionality reduced data corresponding to a second dimension;

carrying out standardization processing on the data subjected to dimension reduction to obtain target format data;

and performing KMeans clustering processing on the target format data to obtain clustered data.

4. The method of claim 1, wherein after sorting the centroids of each of the clustered data according to a Euclidean clustering algorithm and determining a corresponding sensitivity score value for each of the telephone numbers, the method further comprises:

generating a sensitivity score table according to the sensitivity score value determined for each telephone number, wherein the sensitivity score table comprises: a sending sensitivity data table and a receiving sensitivity data table;

when a sensitivity retrieval request is received, determining at least one telephone number carried in the sensitivity retrieval request;

retrieving the sensitivity score value corresponding to at least one of the telephone numbers from the sensitivity score table.

5. The method of claim 1, wherein said sorting the centroids of each of said clustered data according to a Euclidean clustering algorithm to determine a corresponding sensitivity score value for each of said phone numbers comprises:

sorting the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm to obtain a first sorting result;

performing secondary KMeans clustering processing on each class of data in the clustered data to obtain secondarily clustered data;

sorting the centroids of each class of data in the secondarily clustered data according to a Euclidean clustering algorithm to obtain a second sorting result;

and determining the corresponding sensitivity score value for each telephone number according to the second sorting result and the basic score determined based on the first sorting result.

6. The method of any of claims 1-5, wherein prior to obtaining the current occurrence times of the express delivery data under the plurality of first dimensions, the method further comprises:

extracting a plurality of first dimensions by taking the express data table that stores the express delivery data as a data source, wherein the first dimensions comprise: name blurring, address blurring, frequent name changes, frequent address changes, key areas of concern, sending or receiving express items outside the phone number's home location, key items of concern, and key persons of concern.

7. A sensitivity calculation device for express delivery data, comprising:

the first obtaining module is configured to obtain current occurrence times of the express delivery data in a plurality of first dimensions, where the express delivery data includes sending data and receiving data, and each piece of express delivery data corresponds to a telephone number;

the second obtaining module is configured to integrate the current occurrence times according to the first dimension to obtain integrated data;

the clustering module is configured to perform KMeans clustering processing on the integrated data to obtain clustered data;

and the determining module is configured to sort the centroids of each class of data in the clustered data according to a Euclidean clustering algorithm and determine a corresponding sensitivity score value for each telephone number, where the sensitivity score value is used for indicating the sensitivity coefficient of a user of the telephone number.

8. The apparatus of claim 7, wherein the clustering module comprises:

the first obtaining submodule is configured to perform data format processing on the integrated data according to a preset data range to obtain processed integrated data;

the second obtaining submodule is configured to perform dimensionality reduction on the processed integrated data by using a dimensionality reduction algorithm to obtain dimensionality-reduced data corresponding to a second dimension;

the third obtaining submodule is used for carrying out standardization processing on the data subjected to dimensionality reduction to obtain target format data;

and the first clustering submodule is used for performing KMeans clustering processing on the target format data to obtain clustered data.

9. A non-volatile storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the sensitivity calculation method for express delivery data according to any one of claims 1 to 6.

10. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the sensitivity calculation method for express delivery data according to any one of claims 1 to 6.