CN115564156A - Transaction aggregation occurrence early warning method based on machine learning and application thereof - Google Patents

Publication number
CN115564156A
Authority
CN
China
Prior art keywords
data
transaction
text
cluster
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211294234.2A
Other languages
Chinese (zh)
Inventor
朱淑敏
黄宸
曹鹏寅
李斌
田雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCI China Co Ltd
Original Assignee
CCI China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Technology Law (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a machine-learning-based early warning method for aggregated transaction occurrences, and applications thereof. The method comprises the following steps: extracting a feature structure from the original data, the feature structure including at least the transaction text content; extracting text from the transaction text content and dividing it into an address field feature, a name field feature, a transaction number-of-persons field feature and an original text field feature; converting the address field feature into a standard structured address and its longitude and latitude data; selecting any longitude and latitude data as a data group point through a recursive density clustering algorithm, finding all data object points that are density-reachable from that data group point to form a cluster, and classifying the data groups in the same cluster as the same type of data; processing the text fields within the same type of data through a text similarity algorithm, and storing similar transactions within the same type of data in a result table as results; and outputting the result table and issuing an early warning according to the result table. The method and the device improve the usability and accuracy of event extraction.

Description

Transaction aggregation occurrence early warning method based on machine learning and application thereof
Technical Field
The application relates to the technical field of machine learning, in particular to a transaction aggregation occurrence early warning method based on machine learning and application thereof.
Background
At present, early warning of aggregated transaction occurrences can be used to analyze, and to resolve through multiple channels, the types and actual circumstances of disputes arising in different places, such as land acquisition and demolition disputes, resettlement disputes, real estate disputes, intellectual property disputes, campus and neighborhood disputes, ecological and environmental disputes, marriage and family disputes, credit disputes, business disputes, and mountain and forest land disputes.
However, the prior art applies various Chinese NLP similarity algorithms within a specific field, and their accuracy, adaptability to changing scenarios and classification capability cannot meet the requirements of some fine-grained scenarios. Classification assisted by subject information, such as personal names, enterprise names and address names, also exists, but it remains limited by data source privacy requirements and by the lack of structural standardization in data entry, so the same subject cannot be matched accurately. For example, addresses are not entered in a standard format: the standard format resembles "XXX province, XXX city, XXX street, XXXX", but some fields are often omitted on entry. Likewise, when names and locations are restricted by the privacy rules of the data source, not all person-related fields can be extracted. Meanwhile, addresses tend to be spatially aggregated, because of how transactions actually occur or how data is collected and entered, yet traditional clustering methods generally control the final number of clusters only by limiting the minimum number of points in a cluster. As a result, a large proportion of the data may end up in a single category after clustering, which does not meet practical transaction classification requirements.
Therefore, a machine-learning-based early warning method for aggregated transaction occurrences, and an application thereof, that can significantly improve recognition rate, classification capability and extensibility is urgently needed.
Disclosure of Invention
The embodiments of the application provide a machine-learning-based early warning method for aggregated transaction occurrences and an application thereof, aiming to solve problems of the prior art such as low recognition rate and weak classification capability.
The core of the invention is to use machine learning to perform an auxiliary second classification of similar events in a data source, thereby improving the usability and accuracy of the events finally extracted.
In a first aspect, the present application provides a transaction aggregation occurrence warning method based on machine learning, including the following steps:
S00, extracting a feature structure from the original data, wherein the feature structure includes at least the transaction text content;
S10, extracting text from the transaction text content and dividing it into an address field feature, a name field feature, a transaction number-of-persons field feature and an original text field feature, so as to complete the feature structure;
S20, converting the address field feature into a standard structured address and its longitude and latitude data, and storing them separately as new features;
S30, selecting any longitude and latitude data as a data group point through a recursive density clustering algorithm, finding all data object points that are density-reachable from that data group point to form a cluster, and classifying the data groups in the same cluster as the same type of data;
S40, processing the text fields within the same type of data through a text similarity algorithm to further refine the classification result, and storing similar transactions within the same category in a result table as results;
S50, outputting the result table and issuing an early warning according to the result table.
Further, in step S00, the feature structure further includes an identification ID, where the identification ID is used for identifying the transaction.
Further, in step S00, the feature structure further includes a transaction time and a transaction location.
Further, in step S20, the address field feature is converted into a standard structured address and its longitude and latitude data through the open API of mapping software.
Further, in step S30, if the selected data group point is a border point, another data object point is selected, until all data object points have been processed.
Further, in step S30, according to the cluster distribution, density clustering is performed according to the size of the cluster to form a new cluster, and recursion is performed until a set condition is reached to form a recursion result, and the data group points in the same cluster in the recursion result are classified into the same type of data.
Further, in step S30, the data grouping points in the same cluster are classified into the same type of data through the DBSCAN algorithm.
In a second aspect, the present application provides a transaction aggregation occurrence warning device based on machine learning, including:
a data extraction module, used for extracting a feature structure from the original data, the feature structure including at least the transaction text content;
a data completion module, used for extracting text from the transaction text content and dividing it into an address field feature, a name field feature, a transaction number-of-persons field feature and an original text field feature, so as to complete the feature structure;
a data conversion module, used for converting the address field feature into a standard structured address and its longitude and latitude data, which are stored separately as new features;
a recursive density clustering module, used for selecting any longitude and latitude data as a data group point through a recursive density clustering algorithm, finding all data object points that are density-reachable from that data group point to form a cluster, and classifying the data groups in the same cluster as the same type of data;
a text similarity calculation module, used for processing the text fields within the same type of data through a text similarity algorithm to further refine the classification result, and storing similar transactions within the same category in a result table as results;
and an output module, used for outputting the result table and issuing an early warning according to the result table.
In a third aspect, the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to execute the above-mentioned method for warning occurrence of transaction aggregation based on machine learning.
In a fourth aspect, the present application provides a readable storage medium storing a computer program, the computer program comprising program code for controlling a process to execute a procedure, the procedure comprising the above machine-learning-based early warning method for aggregated transaction occurrences.
The main contributions and innovations of the invention are as follows: 1. Traditional text similarity algorithms are limited in their ability to judge text similarity: they are constrained by how well standardized the text entry is, how complete the main elements of the narrative are, and how differently various data sources describe similar events, so their recognition ability fluctuates widely and cannot reach a satisfactory recognition rate. Compared with this prior art, the present method uses machine learning to perform a second classification of similar events in the data source, which improves the usability and accuracy of the events finally extracted;
2. Compared with the prior art, the method adds recursive computation to a traditional clustering method. Parameter adjustment divides regions of higher density more finely, and limits such as a maximum depth are added to keep the computation time from growing without bound, which improves the classification capability and the computation speed of the algorithm. With recursion added, the size of individual clusters that would otherwise be badly distributed can be kept within a reasonable interval, which avoids a large amount of data concentrating in one or a few clusters and harming the usability of the results. In addition, the text similarity parameter and the classification parameters can be adjusted jointly, which improves the extensibility of the algorithm;
3. The method and the device clearly present the characteristics of aggregated similar transactions, such as content, location, classification identifier and time, and ensure that similar transactions are identified at a set density within a given area. Because the density clustering algorithm assists the text similarity recognition, similar transactions that were previously hard to identify are recognized more accurately, and false identification of unrelated transactions is reduced;
4. Owing to the characteristics of the traditional density clustering algorithm, the method is insensitive to abnormal values (missing values and wrongly entered values) and tolerates a certain amount of wrong location data: abnormal values usually appear as outliers, which the algorithm classifies as noise (described later), so a limited number of them does not affect the result. The improvements also solve the problem that the traditional algorithm has no upper limit on the largest cluster, and they limit the maximum depth of the algorithm so that its running time stays within a controllable range, which the traditional algorithm cannot achieve;
5. Through parameter adjustment, the method can set the time span of the transactions to be calculated and define the transaction occurrence frequency. The time span reflects that different business requirements extract data over different periods, for example aggregation within the current month, quarter or year, and it can be applied by filtering on the time feature of the data source. The occurrence frequency is the early warning feature, that is, the specific value that defines aggregation in a business scenario, and it can be adjusted independently as required. Because clusters of the same type are defined by the minimum cluster size, the occurrence frequency is meaningful only when it is larger than the minimum cluster size (the min_sample parameter described below). Results can thus be calculated for different business requirements, so that disputes are discovered in time and the types and actual circumstances of disputes in different places can be analyzed and resolved through multiple channels, including land acquisition and demolition disputes, resettlement disputes, real estate disputes, intellectual property disputes, campus and neighborhood disputes, ecological and environmental disputes, marriage and family disputes, credit disputes, business disputes, and mountain and forest land disputes.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below, so that other features, objects and advantages of the application become clearer and easier to understand.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of a transaction aggregation occurrence early warning method based on machine learning according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the DBSCAN algorithm of the present application;
FIG. 3 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
Traditional text similarity algorithms are limited in their ability to judge text similarity. They are constrained by factors such as how well standardized the text entry is, how complete the main elements of the narrative are, and how differently various data sources describe similar events, so their recognition ability fluctuates widely and cannot reach a satisfactory recognition rate. Meanwhile, addresses tend to be spatially aggregated, because of how transactions actually occur or how data is collected and entered, yet traditional clustering methods generally control the final number of clusters only by limiting the minimum number of points in a cluster. As a result, a large proportion of the data may end up in a single category after clustering, which does not meet practical transaction classification requirements.
Based on this, the present scheme uses machine-learning-assisted text similarity calculation to identify aggregated occurrences of similar transactions, so as to solve the problems of the prior art. In other words, traditional machine learning is modified and improved for the business scenario.
Example one
The application aims to provide a machine-learning-based early warning method for aggregated transaction occurrences, in which machine learning performs an auxiliary second classification of similar events in a data source, improving the usability and accuracy of the events finally extracted.
Specifically, with reference to FIG. 1, an embodiment of the present application provides a machine-learning-based early warning method for aggregated transaction occurrences, the method including:
S00, extracting a feature structure from the original data (from a data source), wherein the feature structure includes at least the transaction text content;
In this step, the feature structure mainly comprises an identification ID, the transaction text content, the transaction time and the transaction location. The transaction text content is the only required feature; the other features can be filled in as needed to assist the algorithm. Preferably, the transaction time is used to filter the data source input to a more precise time window; without it, the full data source is used for the calculation by default. The transaction location may be empty or may be an unstructured address; since a data source is unlikely to provide structured data, the calculation of step S10 is performed by default.
S10, extracting text from the transaction text content and dividing it into an address field feature, a name field feature, a transaction number-of-persons field feature and an original text field feature, so as to complete the feature structure;
In this step, the transaction number-of-persons field feature is extracted as follows: Arabic numerals and Chinese numerals in the text are extracted with regular expressions; values that exceed a certain length or fall outside a certain size range are removed by a judgment rule, and the remainder is taken as the number of persons described in the text; this number is compared with the number of personal names extracted from the text, and the larger value is output as the number of persons involved in the transaction. The extraction rule for this field can be modified through parameters to improve accuracy for different regions and different business requirements.
Because the original data is incomplete, this step aims to derive the other features from the transaction text content alone, so that the features required by the subsequent recursive density clustering are complete.
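The number-of-persons rule described for step S10 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the bounds `MAX_DIGITS` and `MAX_PEOPLE`, the choice of the largest surviving number as the described head count, and the function names are all hypothetical, and the list of personal names is assumed to come from a separate name extractor.

```python
import re

# Hypothetical filter bounds: longer or larger numbers are treated as
# phone numbers, amounts, IDs and so on, not as a number of persons.
MAX_DIGITS = 4
MAX_PEOPLE = 5000

CN_DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
             "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
CN_UNITS = {"十": 10, "百": 100, "千": 1000}

# Runs of Arabic digits, or runs of simple Chinese numerals.
NUMBER_RE = re.compile(r"\d+|[零一二两三四五六七八九十百千]+")


def chinese_to_int(text):
    """Convert a simple Chinese numeral below 10000 (e.g. '三十五') to an int."""
    total, current = 0, 0
    for ch in text:
        if ch in CN_DIGITS:
            current = CN_DIGITS[ch]
        else:  # a unit character: 十, 百 or 千
            total += (current or 1) * CN_UNITS[ch]
            current = 0
    return total + current


def described_head_count(text):
    """Largest plausible number mentioned in the text (assumed selection rule)."""
    candidates = []
    for token in NUMBER_RE.findall(text):
        if token.isdecimal():
            if len(token) > MAX_DIGITS:
                continue  # too long to be a head count
            value = int(token)
        else:
            value = chinese_to_int(token)
        if 0 < value <= MAX_PEOPLE:
            candidates.append(value)
    return max(candidates, default=0)


def head_count(text, names):
    """Larger of the described number and the number of distinct extracted names."""
    return max(described_head_count(text), len(set(names)))
```

For example, `head_count("张三、李四等35人到场，联系电话13800000000", ["张三", "李四"])` discards the 11-digit phone number and compares the described number with the two extracted names.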
S20, converting the address field feature into a standard structured address and its longitude and latitude data, and storing them separately as new features;
In this step, the standard structured address is obtained through the high-resolution open API, for example as data of the form "XXX province, XXX city, XXX district, XXX street, XXX", and the longitude and latitude of the address are stored separately as new features. Other open APIs, such as the Baidu open API, may be used instead of the high-resolution open API.
After this step, all features used for the computation have been extracted: identification ID, original transaction text content, transaction time, structured transaction address, transaction longitude, transaction latitude, number of persons involved in the transaction, and names of the persons involved.
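The completed feature structure listed above could be held in a record like the one below. This is only a sketch: the class name, field names and the helper `in_time_window` are hypothetical, chosen to mirror the features named in the text and the optional time screening described in step S00.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class TransactionRecord:
    record_id: str                       # identification ID
    text: str                            # original transaction text (the only required feature)
    time: Optional[datetime] = None      # transaction time, if the source provides it
    address: Optional[str] = None        # standard structured address from S20
    longitude: Optional[float] = None
    latitude: Optional[float] = None
    head_count: int = 0                  # number of persons involved (S10)
    names: List[str] = field(default_factory=list)  # personal names found in the text

    def is_clusterable(self):
        """Only records with coordinates can enter the density clustering of S30."""
        return self.longitude is not None and self.latitude is not None


def in_time_window(records, start=None, end=None):
    """Keep records inside [start, end]; with no bounds, all data is used."""
    if start is None and end is None:
        return list(records)
    kept = []
    for r in records:
        if r.time is None:
            continue  # cannot be placed in the window
        if (start is None or r.time >= start) and (end is None or r.time <= end):
            kept.append(r)
    return kept
```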
S30, selecting any longitude and latitude data as a data group point through a recursive density clustering algorithm, finding all data object points that are density-reachable from that data group point to form a cluster, and classifying the data groups in the same cluster as the same type of data;
In this embodiment, the specific steps are as follows:
S31, following a traditional density clustering algorithm, selecting any longitude and latitude point in the transaction data source (that is, the data obtained in S20), marking the selected data group point as p, and, for the given parameter Eps and minimum sample count (MinPts in the DBSCAN algorithm), finding all data object points that are density-reachable from p to form a cluster;
S32, checking whether the distribution of the resulting clusters is reasonable for the data source: when too large a share of the samples falls into one cluster, density clustering is applied again to that cluster to form new clusters (the check targets the cluster with the most points, so that the largest class stays below a certain proportion of the total number of samples and a large number of points is not assigned to a single cluster);
S33, treating the above steps as one recursive procedure and setting exit conditions such as the maximum number of clusters, the maximum number of recursion levels and the maximum sample distribution percentage (parameters added to complete the algorithm, which prevent excessive algorithm depth, excessive running time and badly unbalanced cluster distribution);
S34, classifying the data groups in the same cluster as the same type of data;
Preferably, as shown in FIG. 2, the DBSCAN algorithm divides the data points (all points before classification) into three categories, where each cluster in the final result is a maximal density-connected sample set derived from the density-reachability relation:
Core point: if the ε-neighborhood of sample $x_i$ contains at least MinPts samples, i.e. $|N_\varepsilon(x_i)| \ge \mathrm{MinPts}$, then $x_i$ is called a core point;
Border point: if the ε-neighborhood of sample $x_i$ contains fewer than MinPts samples, but $x_i$ lies in the ε-neighborhood of another core point, then $x_i$ is called a border point;
Noise point: a point that is neither a core point nor a border point.
The neighborhood radius Eps (ε) and the minimum number of points MinPts can be set as required.
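The three point categories can be illustrated with a short brute-force sketch. It assumes plain Euclidean distance on the coordinate pairs and counts a point as a member of its own neighborhood; both are assumptions of this example, and the function name `classify_points` is hypothetical.

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise' for the given Eps and MinPts."""
    # Eps-neighborhood of every point (the point itself is included).
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")   # too sparse itself, but near a core point
        else:
            labels.append("noise")
    return labels
```

With the points `(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 1), (5, 5)`, `eps=1.5` and `min_pts=4`, the four corners of the unit square are core points, `(2.2, 1)` is a border point attached to `(1, 1)`, and `(5, 5)` is noise.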
For each cluster, the number of data points gathered around its core points is compared with the total number of data points included in the current calculation. A cluster whose share exceeds a set threshold undergoes a further round of recursive calculation, until its share of the total data points falls to the set threshold. This process generates new clusters, each with its own core points, border points and noise points, and it continues until all points in the data have been classified, which yields the final data result.
The parameters are as follows. Eps: the neighborhood radius. Its size directly affects the number of classification results and, together with the min_sample parameter, the number of border points in a single cluster. The smaller the value, the stricter the density requirement, and the more easily a single point forms a cluster by itself (provided min_sample = 1). It is adjusted according to data quality and specific requirements and is set to 0.1 in this example.
min_sample: the total number of core points and border points in the smallest cluster. It can be adjusted according to data quality and algorithm results, or according to business data. In this example an early warning cannot be triggered until a certain number of transactions have occurred, so min_sample should be larger than 10; however, because the second-stage calculation requires joint adjustment with the text similarity distance, similar transactions cannot be extracted from a cluster that contains too few points, and min_sample was finally set to 100 after testing.
distance: the decision threshold for text similarity, adjusted jointly with the algorithm parameters; it is set to 300 in this example.
ratio: the threshold for the ratio of the total number of core points and border points in a single cluster to the total number of data points; it affects the final number of clusters and the algorithm depth, and is set to 0.25 in this example.
max_deep: the maximum number of recursion levels of the recursive calculation, adjusted according to the performance of the computing hardware and of the algorithm; it is set to 5 in this example.
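A compact sketch of how steps S31 to S34 and the parameters above might fit together is given below. It is an illustration under assumptions rather than the patented implementation: the DBSCAN here is a brute-force pure-Python version with Euclidean distance, and the `shrink` factor that tightens Eps at each recursion level is a hypothetical choice, since the text only says that denser regions are divided more finely through parameter adjustment. Only the ratio and max_deep exit conditions are modeled.

```python
from collections import Counter
from math import dist


def dbscan(points, eps, min_samples):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def region(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = region(i)
        if len(nb) < min_samples:
            labels[i] = -1            # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nb)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # noise reached from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = region(j)
            if len(nb_j) >= min_samples:
                queue.extend(nb_j)    # j is a core point, keep expanding
    return labels


def recursive_dbscan(points, eps, min_samples, ratio, max_deep, shrink=0.5, depth=1):
    """Re-cluster any cluster holding more than `ratio` of the points, up to `max_deep` levels."""
    labels = dbscan(points, eps, min_samples)
    if depth >= max_deep:
        return labels
    sizes = Counter(label for label in labels if label != -1)
    next_label = max(sizes, default=-1) + 1
    for cluster, size in sizes.items():
        if size / len(points) <= ratio:
            continue
        members = [i for i, label in enumerate(labels) if label == cluster]
        sub = recursive_dbscan([points[i] for i in members], eps * shrink,
                               min_samples, ratio, max_deep, shrink, depth + 1)
        if len(set(sub)) <= 1:
            continue                  # no finer split found; keep the cluster as it is
        for i, s in zip(members, sub):
            if s == -1:
                labels[i] = -1
            elif s > 0:               # sub-cluster 0 keeps the parent label
                labels[i] = next_label + s - 1
        next_label += max(sub)
    return labels
```

On a toy data set with two tight groups that a wide Eps merges into one oversized cluster, the recursive call splits them again while leaving a small, distant cluster untouched. The values used in this example of the application (Eps 0.1, min_sample 100, ratio 0.25, max_deep 5) would be passed in the same way.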
S40, processing the text fields within the same type of data through a text similarity algorithm to further refine the classification result, and storing similar transactions within the same category in a result table as results;
In this embodiment, a text similarity calculation interface further processes the text fields of data of the same type, and it is adjusted jointly with the parameters of step S30 (Eps, MinPts, distance, recursion depth, maximum sample percentage) until the result meets the business requirements. Finally, similar transactions of the same type are stored in a result table as the result.
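The second-stage grouping can be sketched as below. The application does not name the text similarity measure behind its calculation interface, so this example substitutes Levenshtein edit distance as an assumption, with `max_distance` playing the role of the distance threshold; the greedy grouping against one representative text per group and the row layout `(record_id, cluster_id, group_id)` are likewise hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (stand-in similarity measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def group_similar(records, max_distance):
    """records: iterable of (record_id, cluster_id, text).

    Within each spatial cluster, a text joins the first group whose
    representative is within max_distance, otherwise it starts a new group.
    Noise records (cluster_id == -1) are skipped.
    """
    representatives = {}   # cluster_id -> list of representative texts
    rows = []
    for record_id, cluster_id, text in records:
        if cluster_id == -1:
            continue
        reps = representatives.setdefault(cluster_id, [])
        for group_id, rep in enumerate(reps):
            if edit_distance(text, rep) <= max_distance:
                break
        else:
            reps.append(text)
            group_id = len(reps) - 1
        rows.append((record_id, cluster_id, group_id))
    return rows
```

Because texts are only compared inside one spatial cluster, two identical complaints from distant places land in different rows of the result, which matches the idea of density clustering assisting the text comparison.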
S50, outputting the result table and issuing an early warning according to the result table.
The early warning is implemented by applying a statistical threshold to the algorithm result, and the threshold can be changed flexibly according to business requirements. For aggregation-based early warning, the threshold is applied to the total number of similar transactions of the same type.
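A threshold check of this kind might look like the following sketch. The row layout `(record_id, cluster_id, group_id)` and the function name are assumptions of this example; the threshold value is a business parameter, not something fixed by the application.

```python
from collections import Counter


def aggregation_warnings(rows, threshold):
    """Return ((cluster_id, group_id), count) for every group at or above the threshold.

    rows: iterable of (record_id, cluster_id, group_id) taken from the result table.
    """
    counts = Counter((cluster_id, group_id) for _, cluster_id, group_id in rows)
    return sorted((key, n) for key, n in counts.items() if n >= threshold)
```

Raising or lowering `threshold` changes how many similar transactions in one area count as an aggregation, without rerunning the clustering.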
In this embodiment, the final data result includes the identification ID, the transaction text content, the transaction time, the standardized transaction address, the transaction classification ID, the number of persons in the transaction text, and the transaction name field.
Records with the same transaction classification ID are similar transactions that occurred in neighboring areas. The number of persons in the transaction text is the number of persons involved according to the transaction description (the larger of the number stated in the description and the total number of extracted personal names). The transaction name field lists the personal names contained in the transaction description, which facilitates joint queries with other result tables.
Example two
Based on the same conception, the application also provides a transaction aggregation occurrence early warning device based on machine learning, which comprises:
the data extraction module is used for extracting a characteristic structure of original data, wherein the characteristic structure at least comprises transaction text content;
the data completion module is used for extracting the text content in the transaction text content and dividing the text content into an address field characteristic, a name field characteristic, a transaction number field characteristic and an original text field characteristic so as to complete the characteristic structure;
the data conversion module is used for converting the address field characteristics into standard structured addresses and longitude and latitude data thereof, and the standard structured addresses and the longitude and latitude data are used as new characteristics to be stored independently;
the recursive density clustering module is used for selecting any longitude and latitude data as a data group point through a recursive density clustering algorithm, finding all data object points that are density-reachable from the data group point to form a cluster, and classifying the data group points in the same cluster into the same type of data;
the text similarity calculation module is used for processing text fields under the same type of data through a text similarity calculation method, further optimizing classification results and storing similar transactions under the same category as results into a result table;
and the output module is used for outputting the result table and giving an early warning according to the result table.
Example three
The present embodiment also provides an electronic device which, referring to FIG. 3, comprises a memory 404 and a processor 402, wherein the memory 404 stores a computer program, and the processor 402 is configured to execute the computer program to perform the steps of any of the above method embodiments.
Specifically, the processor 402 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
The memory 404 may include mass storage for data or instructions. By way of example and not limitation, the memory 404 may include a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory 404 may include removable or non-removable (or fixed) media, where appropriate. The memory 404 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 404 is a non-volatile memory. In particular embodiments, the memory 404 includes read-only memory (ROM) and random-access memory (RAM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory (FLASH), or a combination of two or more of these. Where appropriate, the RAM may be static random-access memory (SRAM) or dynamic random-access memory (DRAM), where the DRAM may be fast page mode DRAM (FPMDRAM), extended data output DRAM (EDODRAM), synchronous DRAM (SDRAM), or the like.
Memory 404 may be used to store or cache various data files for processing and/or communication use, as well as possibly computer program instructions for execution by processor 402.
Processor 402, by reading and executing computer program instructions stored in memory 404, implements any of the machine learning based transaction aggregation occurrence warning methods of the embodiments described above.
Optionally, the electronic device may further include a transmission device 406 and an input/output device 408, both of which are connected to the processor 402.
The transmission device 406 may be used to receive or transmit data via a network. Specific examples of the network may include a wired or wireless network provided by the communication provider of the electronic device. In one example, the transmission device 406 includes a network interface controller (NIC) that can be connected to other network devices through a base station so as to communicate with the internet. In another example, the transmission device 406 may be a radio frequency (RF) module, which communicates with the internet wirelessly.
The input/output device 408 is used to input or output information. In this embodiment, the input information may be the original data or the like, and the output information may be the result table or the like.
Example four
The embodiment also provides a readable storage medium, in which a computer program is stored, where the computer program includes program code for controlling a process to execute the process, and the process includes the method for warning occurrence of transaction aggregation based on machine learning according to the first embodiment.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the invention may be implemented by computer software executable by a data processor of the mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also referred to as program products) including software routines, applets and/or macros can be stored in any device-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may include one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. Further in this regard it should be noted that any block of the logic flow as in the figures may represent a program step, or an interconnected logic circuit, block and function, or a combination of a program step and a logic circuit, block and function. The software may be stored on physical media such as memory chips or memory blocks implemented within the processor, magnetic media such as hard or floppy disks, and optical media such as, for example, DVDs and data variants thereof, CDs. The physical medium is a non-transitory medium.
It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.
The above examples are merely illustrative of several embodiments of the present application, and the description is more specific and detailed, but not to be construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A transaction aggregation occurrence early warning method based on machine learning, characterized by comprising the following steps:
S00, extracting a characteristic structure of original data, wherein the characteristic structure at least comprises transaction text contents;
S10, extracting text contents in the transaction text contents, and dividing the text contents into address field characteristics, name field characteristics, transaction number field characteristics and original text field characteristics to realize the completion of the characteristic structure;
S20, converting the address field characteristics into standard structured addresses and longitude and latitude data thereof to be used as new characteristics to be stored independently;
S30, selecting any longitude and latitude data as a data group point through a recursive density clustering algorithm, finding all data object points that are density-reachable from the data group point to form a cluster, and classifying the data group points in the same cluster into the same type of data;
S40, processing text fields under the same type of data through a text similarity algorithm, further optimizing a classification result and storing similar transactions under the same category as a result into a result table;
and S50, outputting a result table and early warning according to the result table.
2. The machine-learning-based transaction aggregation occurrence warning method according to claim 1, wherein in step S00, the feature structure further includes an identification ID for identifying the transaction.
3. The machine learning-based transaction aggregation occurrence warning method according to claim 1, wherein in step S00, the feature structure further includes a transaction time and a transaction location.
4. The machine learning-based transaction aggregation occurrence warning method of claim 1, wherein in step S10, the address field features are converted into a standard structured address and its longitude and latitude data through an open API of a mapping software.
5. The machine-learning-based transaction aggregation occurrence warning method of claim 1, wherein in step S30, if the selected data group point is an edge point, another data object point is selected until all data object points are processed.
6. The transaction aggregation occurrence early warning method based on machine learning as claimed in claim 1, wherein in step S30, according to cluster distribution, density clustering is performed according to cluster size to form a new cluster, recursion is performed until a set condition is reached to form a recursion result, and data grouping points in the same cluster in the recursion result are classified into the same type of data.
7. The machine learning-based transaction aggregation occurrence warning method according to claim 6, wherein in step S30, the data group points in the same cluster are classified into the same class data through a DBSCAN algorithm.
8. A transaction aggregation occurrence early warning device based on machine learning is characterized by comprising:
the data extraction module is used for extracting a characteristic structure of original data, and the characteristic structure at least comprises transaction text content;
the data completion module is used for extracting the text content in the transaction text content and dividing the text content into an address field characteristic, a name field characteristic, a transaction number field characteristic and an original text field characteristic so as to complete the characteristic structure;
the data conversion module is used for converting the address field characteristics into standard structured addresses and longitude and latitude data thereof so as to be independently stored as new characteristics;
the recursive density clustering module is used for selecting any longitude and latitude data as a data group point through a recursive density clustering algorithm, finding all data object points that are density-reachable from the data group point to form a cluster, and classifying the data group points in the same cluster into the same type of data;
the text similarity calculation module is used for processing text fields under the same type of data through a text similarity calculation method, further optimizing classification results and storing similar transactions under the same category as results into a result table;
and the output module is used for outputting the result table and giving an early warning according to the result table.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for warning occurrence of transaction aggregation based on machine learning according to any one of claims 1 to 7.
10. A readable storage medium having stored therein a computer program comprising program code for controlling a process to execute a process, the process comprising the machine learning based transaction aggregation occurrence warning method according to any one of claims 1 to 7.
CN202211294234.2A 2022-10-21 2022-10-21 Transaction aggregation occurrence early warning method based on machine learning and application thereof Pending CN115564156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211294234.2A CN115564156A (en) 2022-10-21 2022-10-21 Transaction aggregation occurrence early warning method based on machine learning and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211294234.2A CN115564156A (en) 2022-10-21 2022-10-21 Transaction aggregation occurrence early warning method based on machine learning and application thereof

Publications (1)

Publication Number Publication Date
CN115564156A true CN115564156A (en) 2023-01-03

Family

ID=84767353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211294234.2A Pending CN115564156A (en) 2022-10-21 2022-10-21 Transaction aggregation occurrence early warning method based on machine learning and application thereof

Country Status (1)

Country Link
CN (1) CN115564156A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117992856A (en) * 2024-04-03 2024-05-07 国网山东省电力公司营销服务中心(计量中心) User electricity behavior analysis method, system, device, medium and program product

Similar Documents

Publication Publication Date Title
CN109948641B (en) Abnormal group identification method and device
US10621493B2 (en) Multiple record linkage algorithm selector
CN110689368B (en) Method for designing advertisement click rate prediction system in mobile application
US10922337B2 (en) Clustering of data records with hierarchical cluster IDs
CN105634855A (en) Method and device for recognizing network address abnormity
US20210263903A1 (en) Multi-level conflict-free entity clusters
CN105099729A (en) User ID (Identification) recognition method and device
CN115564156A (en) Transaction aggregation occurrence early warning method based on machine learning and application thereof
CN111585851B (en) Method and device for identifying private line user
CN112084761A (en) Hydraulic engineering information management method and device
CN115035347A (en) Picture identification method and device and electronic equipment
CN113434672B (en) Text type intelligent recognition method, device, equipment and medium
CN112347100B (en) Database index optimization method, device, computer equipment and storage medium
CN103226577A (en) News clustering method
CN112199388A (en) Strange call identification method and device, electronic equipment and storage medium
CN107886113B (en) Electromagnetic spectrum noise extraction and filtering method based on chi-square test
CN116091157A (en) Resource pushing method and device, storage medium and computer equipment
CN113992364A (en) Network data packet blocking optimization method and system
CN112100670A (en) Big data based privacy data grading protection method
CN112764839A (en) Big data configuration method and system for management service platform
CN115908998B (en) Training method of water depth data identification model, water depth data identification method and device
CN115953248B (en) Wind control method, device, equipment and medium based on saprolitic additivity interpretation
CN113641911B (en) Advertisement interception rule base establishing method, device, equipment and storage medium
US20230237018A1 (en) System and a method for the classification of sensitive data elements in a file
CN117708350B (en) Enterprise policy information association method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination