CN113268649A - Thread monitoring method and system based on diversified data fusion - Google Patents

Info

Publication number
CN113268649A
Authority
CN
China
Prior art keywords
information
transaction
account
data
wallet address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110240775.6A
Other languages
Chinese (zh)
Other versions
CN113268649B (en)
Inventor
俞海清
王利军
李磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tianrun Foundation Technology Development Co ltd
Original Assignee
Beijing Tianrun Foundation Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tianrun Foundation Technology Development Co ltd
Priority to CN202110240775.6A
Publication of CN113268649A
Application granted
Publication of CN113268649B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the invention relate to the technical field of information processing and disclose a clue monitoring method and system based on diversified data fusion. The method comprises the following steps: extracting account entity information of diversified data from multiple data sources, and performing cloud-service-based multilingual translation on the extracted account entity information; analyzing the content texts corresponding to the account entity information to obtain clues, and labeling the content that contains clues; analyzing and screening transaction data in a blockchain network to obtain illegal transaction information; and performing correlation analysis on the account entity information, the labeled clues and the illegal transaction information to obtain a cyberspace clue, where the cyberspace clue comprises a plurality of transaction items and the association relationships among them. The embodiments discover correlations among information drawn from diversified information carriers, and thereby accomplish tasks such as clue discovery, timeline linking, linking of relationships between persons, and tracking of fund flows.

Description

Clue monitoring method and system based on diversified data fusion
Technical Field
The invention relates to the technical field of information processing, and in particular to a clue monitoring method and system based on diversified data fusion.
Background
Existing internet information monitoring systems are mostly applied to monitoring public opinion on the internet. The main technical approach is to use crawler software to collect content such as domestic official internet media, portal news articles and comments, netizen posts and replies on internet forums, microblogs, WeChat official accounts, Baidu Tieba, notices and the like, foreign social media such as Facebook and Twitter, and personal articles, comments and other content in some mobile internet applications (such as Xiaohongshu), and then to perform natural language processing, semantic analysis, sentiment analysis and sensitive-word discovery on the content texts, thereby achieving internet public opinion discovery and content control. Other applications include discovering harmful information such as pornography, gambling and drugs, discovering users with particular characteristics on microblogs, user profiling and the like.
These technologies and applications are mainly oriented to clear-web information, and lack mature, stable and large-scale analysis and application in areas such as the dark web and mainstream foreign instant messaging tools.
In the area of emerging blockchain networks such as Bitcoin and Ethereum, domestic attention is focused mostly on networking technology, encryption algorithm research, construction of basic network platforms, large-scale production of mining machines, smart contract transactions, and the publication of information about the technologies and applications of emerging-currency blockchains.
The inventors found that the main disadvantages of related internet information collection and analysis systems are as follows:
1. In terms of data sources, they rely mainly on clear-web information alone, or perform data acquisition and business analysis for one specific deep-web or dark-web source. For example, public opinion collection and analysis systems based on clear-web information mainly process clear-web information: data collection focuses on news websites, forum websites, social media websites, e-commerce websites, knowledge Q&A websites and other websites that search engines can retrieve, and the information analysis mainly serves the supervision of online public opinion. Other systems collect information specifically from sources related to illegal and criminal behavior, such as QQ groups, deep-web forums or dark-web forums, and use means such as sensitive-word analysis to identify illegal information (for example advertising and price information) and to profile the publishers. The acquisition and content analysis of specific privatized internet data is rarely involved.
2. In terms of analysis technology, semantic analysis of content dominates: content texts are analyzed with machine learning algorithms such as basic word segmentation, entity word extraction, sensitive-word discovery, classification and clustering. Analysis of illegal transaction information is lacking, as are effective means and concrete applications for finding persons and tracing funds.
3. User profiling is generally done within a single type of information source or a single information field, for example content analysis and account profiling on microblog data, profiling and product recommendation based on personal purchasing preferences on an e-commerce website, or profiling and content recommendation based on the reading direction and points of interest derived from internet reading and access behavior. Comprehensive analysis and association analysis of data content across information sources, media and industries is rarely provided.
4. Owing to limited demand and technical complexity, data from the clear web, the dark web and blockchain networks and the private data of instant messaging tools are rarely or never collected together and analyzed in association.
Disclosure of Invention
The embodiments of the invention aim to provide a clue monitoring method and system based on diversified data fusion that solve the problem of simultaneously collecting, and analyzing in association, data from the clear web, the dark web and blockchain networks and the private data of instant messaging tools.
To solve the above technical problem, in a first aspect, an embodiment of the present invention provides a clue monitoring method based on diversified data fusion, including:
extracting account entity information of diversified data from multiple data sources, and performing cloud-service-based multilingual translation on the extracted account entity information;
analyzing content texts corresponding to the various types of account entity information to obtain clues and labeling the content containing the clues;
analyzing and screening transaction data in the block chain network to obtain illegal transaction information;
performing correlation analysis on the various types of account entity information, the labeled clues and the illegal transaction information to obtain a cyberspace clue; wherein the cyberspace clue comprises a plurality of transaction items and the association relationships among the plurality of transaction items; the plurality of transaction items are: transaction time, transaction mode, transaction funds, and flow of funds.
Additionally, the plurality of data sources includes at least two of: clear-web information, dark-web information, and application data of privatized instant messaging tools; the account entity information includes: publisher account, blockchain address, person name, place name, organization name and article name; the clear-web information comprises: clear-web news, clear-web forums and clear-web social media; the dark-web information comprises: dark-web articles and dark-web forums; the application data of privatized instant messaging tools comprises: Telegram group chats and WhatsApp group chats.
In addition, extracting account entity information of diversified data from multiple data sources includes:
collecting group information from a plurality of data sources and storing the group information in a database;
processing the collected group information, removing interference information, and supplementing the basic data corresponding to the group information; wherein the basic data comprises: source, account, type, and standard time;
setting a timer for each data source according to the data increment of that data source, wherein the timer is used for periodically extracting account entity information from newly collected data;
for each data source, each time an extraction start time is reached, randomly reading a preset number of newly collected items from the database and putting them into a thread pool for processing;
each thread in the thread pool reads one item, reads the regular expressions for the various account types in turn and matches them against the item, and if a match succeeds, obtains and returns the matched account entity information;
verifying the validity of each successfully matched Bitcoin wallet address: checking that an address beginning with bc1 is a valid Bech32 string and that any other address is a valid Base58 string, respectively, and if so, determining that the Bitcoin wallet address is valid account entity information;
for an Ethereum wallet address, determining that the address is valid if, after the leading 0x is removed, it is 40 characters long and can be converted into a decimal number (that is, it parses as a hexadecimal number);
and storing the extracted account entity information in the database.
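As a rough illustration of the matching and validation steps above, the following Python sketch extracts Ethereum-style addresses with a regular expression and validates Ethereum addresses and legacy Base58 Bitcoin addresses. The regular expression, the function names and the use of Base58Check (Base58 with a double-SHA-256 checksum) are assumptions made for illustration; the patent gives neither its patterns nor its exact checks. The Bech32 check for bc1 addresses is omitted for brevity.

```python
import hashlib
import re
import string

# Hypothetical pattern: "0x" followed by 40 hexadecimal characters.
ETH_PATTERN = re.compile(r"\b0x[0-9a-fA-F]{40}\b")
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def extract_eth_candidates(text):
    """Return every substring of text that looks like an Ethereum address."""
    return ETH_PATTERN.findall(text)


def is_valid_eth_address(addr):
    """Drop a leading 0x, then require 40 characters that parse as hexadecimal."""
    body = addr[2:] if addr[:2].lower() == "0x" else addr
    if len(body) != 40 or any(c not in string.hexdigits for c in body):
        return False
    int(body, 16)  # the conversion to a number described in the text
    return True


def is_valid_base58check(addr):
    """Decode a legacy Bitcoin address and verify its 4-byte checksum."""
    num = 0
    for ch in addr:
        idx = BASE58_ALPHABET.find(ch)
        if idx < 0:
            return False  # character outside the Base58 alphabet
        num = num * 58 + idx
    # Each leading '1' encodes one leading zero byte.
    pad = len(addr) - len(addr.lstrip("1"))
    raw = b"\x00" * pad + num.to_bytes((num.bit_length() + 7) // 8, "big")
    if len(raw) != 25:  # 1 version byte + 20-byte hash + 4-byte checksum
        return False
    payload, checksum = raw[:-4], raw[-4:]
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return digest[:4] == checksum
```

Checking the characters before calling `int(body, 16)` keeps the Ethereum test strict, since `int` on its own would also accept signs and underscores.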
In addition, performing cloud-service-based multilingual translation on the extracted account entity information includes:
reading the information to be translated from the database and pushing it to a translation module through a remote procedure call interface, the translation module performing noise filtering and then target-language identification on the information; if the information is judged to be in the target language, the original data is returned directly, and if it is judged not to be in the target language, it is forwarded to a cloud-service translation engine interface for translation to obtain the corresponding target-language translation;
and writing the target-language translation returned by the cloud-service translation engine into the field corresponding to the original text in the database table, and at the same time writing the original text, the target-language translation and other related information into a full-text index library.
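The routing logic of the translation step can be sketched as follows. This is only an illustration under stated assumptions: `detect_language` and `cloud_translate` are hypothetical callables standing in for the language identifier and the cloud translation engine, which the patent does not name, and the noise filter shown here (dropping control characters and collapsing whitespace) is a placeholder for whatever filtering is actually applied.

```python
import re
from typing import Callable, Dict


def filter_noise(text: str) -> str:
    """Assumed noise filter: drop control characters and collapse whitespace."""
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def translate_record(
    text: str,
    target_lang: str,
    detect_language: Callable[[str], str],       # hypothetical language identifier
    cloud_translate: Callable[[str, str], str],  # hypothetical cloud engine client
) -> Dict[str, str]:
    """Return the original text together with its target-language version."""
    cleaned = filter_noise(text)
    if detect_language(cleaned) == target_lang:
        # Already in the target language: return the original data unchanged.
        translation = text
    else:
        translation = cloud_translate(cleaned, target_lang)
    # Both fields would then be written to the database and the full-text index.
    return {"original": text, "translation": translation}
```

Passing the detector and the engine in as parameters keeps the sketch independent of any particular vendor API.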
In addition, the analyzing the content texts corresponding to the various types of account entity information to obtain clues and labeling the content containing the clues specifically includes:
obtaining a clue to be retrieved, and constructing a detection model according to the clue to be retrieved;
reading, by category, the topics, followed accounts and followed groups that a user has set in the database, and retrieving information from the full-text index library according to the topics, followed accounts and followed groups;
for the retrieved information, extracting characteristic information by means of multilingual deep natural language understanding technology; wherein the characteristic information includes: keywords, abstracts, identified topic categories, sentiment, representative content, and representative viewpoints;
constructing entity relationships and an event storyline according to the characteristic information, and performing an overall analysis from multiple dimensions to provide global information about the event; wherein the multiple dimensions comprise time, region and field;
and writing the processed information into a database.
In addition, the blockchain network comprises an Ethereum network;
correspondingly, analyzing and screening the transaction data in the blockchain network to obtain illegal transaction information specifically comprises the following steps:
reading block data from a local Ethereum node in order of block number, reading each transaction in each block in turn, and writing the detailed information of each transaction into a separately stored Ethereum transaction database; wherein the block data is synchronized in advance from public internet nodes to the local node, and the detailed information of each transaction comprises: block number, block hash, transaction index, transaction timestamp, sending wallet address, receiving wallet address, Ether transaction amount, gas, gas price, nonce, and the message attached to the transaction;
writing newly found Ethereum wallet addresses from the entity extraction results into an Ethereum key wallet address table;
taking the Ethereum wallet addresses out of the Ethereum key wallet address table one by one, retrieving new transactions related to each wallet from the Ethereum transaction database, and writing the new transactions into a database table;
the blockchain network comprises a Bitcoin network;
correspondingly, analyzing and screening the transaction data in the blockchain network to obtain illegal transaction information specifically comprises the following steps:
reading block data from a local Bitcoin node in order of block number, reading each transaction in each block in turn, and writing the detailed information of each transaction into a separately stored Bitcoin transaction database; wherein the Bitcoin block data is synchronized in advance from public internet nodes to the local node, and each Bitcoin transaction record comprises: block height, block hash, transaction hash, block timestamp, sequence number n within the transaction, transaction wallet address, transaction direction, previous transaction hash, and Bitcoin transaction amount;
writing newly found Bitcoin wallet addresses from the entity extraction results into a Bitcoin key wallet address table;
and taking the Bitcoin wallet addresses out of the Bitcoin key wallet address table one by one, retrieving new transactions related to each wallet from the Bitcoin transaction database, and writing the new transactions into a database table.
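The last step of each flow above, retrieving new transactions for the wallets in the key wallet address table, can be sketched as a simple filter. The record type, its field names and the `last_seen_block` cursor are hypothetical; the patent only says that new transactions related to each key wallet are retrieved from the transaction database, so an in-memory list stands in for that database here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class EthTx:
    """Hypothetical subset of the per-transaction fields listed in the text."""
    block_number: int
    tx_index: int
    from_addr: str
    to_addr: Optional[str]  # None for contract-creation transactions
    value_wei: int


def new_transactions_for_watchlist(
    txs: Iterable[EthTx], watchlist: Set[str], last_seen_block: int
) -> List[EthTx]:
    """Return transactions after last_seen_block that touch a watched wallet."""
    watched = {addr.lower() for addr in watchlist}
    hits = [
        tx for tx in txs
        if tx.block_number > last_seen_block
        and (tx.from_addr.lower() in watched
             or (tx.to_addr or "").lower() in watched)
    ]
    # Keep chain order: by block number, then by position inside the block.
    return sorted(hits, key=lambda tx: (tx.block_number, tx.tx_index))
```

In a real deployment this would more likely be an indexed database query on the sender and receiver columns than a scan over a list.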
In addition, performing correlation analysis on the various types of account entity information, the labeled clues and the illegal transaction information to obtain a cyberspace clue specifically comprises the following:
propagation traceability analysis: preprocessing the collected information and generating a SimHash code that represents the core content of texts and images and is robust to small partial changes; forming a publication time sequence for information with the same content through SimHash matching and iterative queries, thereby tracing the information source and constructing the propagation path;
account traceability investigation:
for multiple types of accounts, performing association search by combining multiple types of data; wherein the multiple types of accounts include: social media accounts, communication group accounts, dark-web user accounts, and the WhatsApp and Telegram communication accounts, mailboxes and digital token accounts that appear in the information; the multiple types of data include: the internet, social media, and social engineering databases;
single-account profile analysis:
acquiring the historical posts and social relations of the multiple types of accounts; wherein the multiple types of accounts comprise: social media accounts, instant messaging accounts, and dark-web forum accounts;
analyzing activity characteristic information of the multiple types of accounts, wherein the activity characteristic information comprises: possible geographic locations, activity periods, affiliations, topics of interest, and representative utterances;
deducing account characteristics of the multiple types of accounts according to the activity characteristic information; the account characteristics include: identity, political inclination, and risk points;
analyzing the community to which the account belongs, whether it is a leading account of that community, and its main associated accounts and friend accounts;
detecting communities and key people:
for a specified topic or group, acquiring the related group of social media and instant messaging accounts, constructing social association relationships from reposts, shared group membership, following and the like, detecting communities that are hidden in the account group, communicate with one another and lead the topic or group, and detecting the leading key accounts in those communities;
group characteristic profile:
analyzing the profiles of the individual accounts in user-defined groups and detected communities to form the characteristics of each group; the characteristics of a group include possible geographic locations, activity periods, affiliations, topics of interest, representative utterances, possible identities, political inclinations, and risk points;
overall situation perception:
compiling statistics on the collected information sources, the collected information, the monitored clues and the detected communities by time, region, source, clue label and the like to form the overall situation of the social media monitoring target.
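For the propagation traceability analysis, a minimal text-only SimHash sketch is given below. It is an illustration, not the patented implementation: the tokenizer, the use of MD5 as the per-token hash, the 64-bit width and the distance threshold are all assumptions, and the image fingerprinting mentioned in the text is not covered. Posts whose fingerprints lie within a small Hamming distance of a seed text are treated as the same content and ordered by publication time, so the earliest one is the candidate source.

```python
import hashlib
import re
from typing import List, Tuple


def simhash(text: str, bits: int = 64) -> int:
    """Compute a SimHash fingerprint over lower-cased word tokens."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set when more tokens voted 1 than 0 at that position.
    return sum(1 << i for i, w in enumerate(weights) if w > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def propagation_chain(
    posts: List[Tuple[int, str, str]], seed_text: str, max_distance: int = 3
) -> List[Tuple[int, str, str]]:
    """posts holds (timestamp, source, text); return near-duplicates by time."""
    seed = simhash(seed_text)
    matches = [p for p in posts if hamming(simhash(p[2]), seed) <= max_distance]
    return sorted(matches, key=lambda p: p[0])
```

A linear scan is enough for a sketch; at scale, fingerprints are usually indexed by bit blocks so that candidates can be found without comparing every pair.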
In addition, the WEB application for monitoring the cyberspace clue specifically includes one of the following or any combination thereof:
propagation traceability analysis: preprocessing the collected information and generating a SimHash code that represents the core content of texts and images and is robust to small partial changes; forming a publication time sequence for information with the same content through SimHash matching and iterative queries, thereby tracing the information source and constructing the propagation path;
account traceability investigation:
for multiple types of accounts, performing association search by combining multiple types of data; wherein the multiple types of accounts include: social media accounts, communication group accounts, dark-web user accounts, and the WhatsApp and Telegram communication accounts, mailboxes and digital token accounts that appear in the information; the multiple types of data include: the internet, social media, and social engineering databases;
single-account profile analysis:
acquiring the historical posts and social relations of the multiple types of accounts; wherein the multiple types of accounts comprise: social media accounts, instant messaging accounts, and dark-web forum accounts;
analyzing activity characteristic information of the multiple types of accounts, wherein the activity characteristic information comprises: possible geographic locations, activity periods, affiliations, topics of interest, and representative utterances;
deducing account characteristics of the multiple types of accounts according to the activity characteristic information; the account characteristics include: identity, political inclination, and risk points;
analyzing the community to which the account belongs, whether it is a leading account of that community, and its main associated accounts and friend accounts;
detecting communities and key people:
for a specified topic or group, acquiring the related group of social media and instant messaging accounts, constructing social association relationships from reposts, shared group membership, following and the like, detecting communities that are hidden in the account group, communicate with one another and lead the topic or group, and detecting the leading key accounts in those communities;
group characteristic profile:
analyzing the profiles of the individual accounts in user-defined groups and detected communities to form the characteristics of each group; the characteristics of a group include possible geographic locations, activity periods, affiliations, topics of interest, representative utterances, possible identities, political inclinations, and risk points;
overall situation perception:
compiling statistics on the collected information sources, the collected information, the monitored clues and the detected communities by time, region, source, clue label and the like to form the overall situation of the social media monitoring target.
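The community and key-account detection described above can be illustrated with a deliberately simple stand-in. The patent does not say which community detection algorithm it uses; in this sketch, connected components of the association graph play the role of communities and the account with the most associations plays the role of the key account. Both choices, and all names, are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def detect_communities(edges: Iterable[Tuple[str, str]]) -> List[Dict]:
    """edges are (account, account) associations such as reposts or follows."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    seen = set()
    communities = []
    for start in sorted(adj):
        if start in seen:
            continue
        # Depth-first walk collects every account reachable from start.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        # Key account: highest degree; ties go to the first name in sort order.
        key = max(sorted(component), key=lambda n: len(adj[n]))
        communities.append({"members": sorted(component), "key_account": key})
    return communities
```

For example, `detect_communities([("a", "b"), ("a", "c"), ("x", "y")])` yields two communities, with `a` as the key account of the first.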
In a second aspect, an embodiment of the present invention provides a clue monitoring system based on diversified data fusion, including:
the account entity information extraction module is used for extracting account entity information of diversified data from various data sources;
the translation module is used for performing multi-language translation based on cloud service on the extracted various account entity information;
the clue finding module is used for analyzing the content texts corresponding to the various types of account entity information to obtain clues and labeling the content containing the clues;
the block chain transaction analysis module is used for analyzing and screening transaction data in a block chain network to obtain illegal transaction information;
the clue analysis module is used for performing correlation analysis on the various types of account entity information, the labeled clues and the illegal transaction information to obtain a cyberspace clue; wherein the cyberspace clue comprises a plurality of transaction items and the association relationships among the plurality of transaction items; the plurality of transaction items are: transaction time, transaction mode, transaction funds, and flow of funds.
Compared with the prior art, the embodiments of the invention integrate association analysis methods for heterogeneous information from the clear web, the dark web, blockchain networks, privatized non-public networks and the like, and discover correlations among information drawn from diversified information carriers, thereby accomplishing tasks such as clue discovery, timeline linking, linking of relationships between persons, and tracking of fund flows. The invention aims to process data jointly across industries, fields and information sources and to find correlations among seemingly unrelated information, so as to build a behavior-route analysis chart that runs from persons to events and covers time, place, behavior and funds, together with a knowledge base formed from the relationship graph among these factors. It can be applied to business scenarios such as national security, criminal investigation, and financial management and control.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, it is understood that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic flowchart of a clue monitoring method based on diversified data fusion according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of entity information extraction in the clue monitoring method based on diversified data fusion according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of multilingual translation in the clue monitoring method based on diversified data fusion according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the clue discovery process in the clue monitoring method based on diversified data fusion according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the Ethereum transaction data parsing process in the clue monitoring method based on diversified data fusion according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the Bitcoin transaction data parsing process in the clue monitoring method based on diversified data fusion according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the topic-based information monitoring process in the WEB application in the clue monitoring method based on diversified data fusion according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the account-based or group-based information monitoring process in the WEB application in the clue monitoring method based on diversified data fusion according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of the wallet-address-based information monitoring process in the WEB application in the clue monitoring method based on diversified data fusion according to an embodiment of the present invention;
Fig. 10 is a schematic flowchart of a clue monitoring method based on diversified data fusion according to an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a clue monitoring system based on diversified data fusion according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described through embodiments with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a clue monitoring method based on diversified data fusion according to an embodiment of the present invention. The technical solution of this embodiment can be applied to the clue retrieval system shown in fig. 11. The method can be executed by the clue monitoring system based on diversified data fusion provided by the embodiment of the invention.
As shown in fig. 1, the clue monitoring method based on diversified data fusion of this embodiment includes the following steps:
Step 101: extracting various types of account entity information from a plurality of data sources, and performing cloud-service-based multilingual translation on the extracted account entity information.
The plurality of data sources comprises at least two of: clear-web information, dark-web information, and application data of privatized instant messaging tools. The account entity information includes: publisher account, blockchain address, person name, place name, organization name, article name, etc. The clear-web information comprises: clear-web news, clear-web forums, clear-web social media, etc. The dark-web information comprises: dark-web articles, dark-web forums, and the like. The application data of privatized instant messaging tools comprises: Telegram group chats and WhatsApp group chats. It is understood that this embodiment does not specifically limit the data sources or the account entity information.
Optionally, in this embodiment, as shown in Fig. 2, step 101 specifically includes the following substeps:
Substep 1011: collect group information from the multiple data sources and store it in a database.
The manner of collecting clear web information, darknet information, group information of private instant messaging tools, and the like is known to those skilled in the art and is not described again here.
Substep 1012: process the collected group information, remove interference information, and supplement the basic data corresponding to the group information. The basic data may include source, account, type, and standard time. Removing interference information includes, but is not limited to, removing irrelevant information such as advertisements.
Substep 1013: set a timer for each data source according to that source's data increment. The timer triggers periodic extraction of account entity information from newly collected data, so that the processing accommodates data sources with different data increments.
Substep 1014: for each data source, whenever an extraction start time is reached, randomly read a preset number of newly collected items from the database and put them into a thread pool for processing. For example, the extraction period of clear web information is T1, the extraction period of darknet information is T2, and different private instant messaging tools may each have their own extraction period. It can be understood that newly collected information may instead be read in a certain order, such as order of collection time, rather than randomly. This embodiment does not specifically limit the number of newly collected items read each time; the preset number may be set according to processing capability.
The read information is matched against regular expressions for various types of accounts; if a match succeeds, the matched account entity information is obtained and returned. The regular expressions for various types of accounts include, but are not limited to, regular expressions for QQ users, QQ groups, WeChat users, WeChat groups, Ethereum wallets, Bitcoin wallets, WhatsApp accounts, WhatsApp groups, Telegram accounts, Telegram groups, Facebook accounts, Instagram accounts, Twitter accounts, GitHub accounts, email accounts, Exploit.im accounts, Steemit.com accounts, Medium.com accounts, Bat accounts, and the like.
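As a minimal sketch of substep 1014 and the matching that follows it, the Python code below reads a random batch, processes it in a thread pool and matches each text against a few regular expressions. The pattern set, the names `ACCOUNT_PATTERNS`, `extract_accounts`, `read_batch` and `process_batch`, and the worker count are illustrative assumptions; the embodiment does not disclose its actual expressions.

```python
import random
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, deliberately simplified patterns; real ones would be stricter.
ACCOUNT_PATTERNS = {
    "ethereum_wallet": re.compile(r"\b0x[0-9a-fA-F]{40}\b"),
    "bitcoin_wallet": re.compile(
        r"\b(?:bc1[02-9ac-hj-np-z]{11,71}|[13][1-9A-HJ-NP-Za-km-z]{25,34})\b"
    ),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "telegram_account": re.compile(r"(?<![\w@])@[A-Za-z][A-Za-z0-9_]{4,31}\b"),
}


def extract_accounts(text):
    """Return {account_type: [matches]} for every pattern that matched."""
    found = {}
    for account_type, pattern in ACCOUNT_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[account_type] = matches
    return found


def read_batch(pending, batch_size):
    """Randomly pick up to batch_size newly collected items (a list)."""
    return random.sample(pending, min(batch_size, len(pending)))


def process_batch(texts, workers=4):
    """Match every text in a thread pool, keeping the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_accounts, texts))
```

A pattern match only shows that a string looks like an account; wallet addresses found this way would still go through the validity check of substep 1015.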
Substep 1015: verify the validity of the successfully matched Bitcoin and Ethereum wallet addresses.
For a Bitcoin wallet address, the method verifies whether it is a valid Bech32 string (for an address beginning with bc1) or a valid Base58 string; if so, the address is determined to be valid account entity information. For an Ethereum wallet address, if, after the leading 0x is removed, the remainder has a length of 40 and can be converted into a decimal number (that is, it is a valid hexadecimal string), the address is determined to be a valid Ethereum wallet address. Verifying the validity of the accounts reduces the amount of data to be processed and improves processing efficiency.
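The checks in substep 1015 can be sketched in Python as follows. This is a hedged illustration, not the embodiment's own code: the function names are hypothetical, the Base58 check assumes the usual Base58Check layout (25 bytes ending in a 4-byte double-SHA-256 checksum), and the Bech32 check follows the BIP173 checksum, so newer Bech32m addresses would be rejected by this sketch.

```python
import hashlib
import string

_B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
_BECH32 = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
_BECH32_GEN = (0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3)


def is_valid_base58check(address):
    """Check a Base58 Bitcoin address: alphabet, 25 bytes, checksum."""
    number = 0
    for ch in address:
        index = _B58.find(ch)
        if index < 0:
            return False
        number = number * 58 + index
    # Each leading '1' encodes one leading zero byte.
    leading_zeros = len(address) - len(address.lstrip("1"))
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    raw = b"\x00" * leading_zeros + body
    if len(raw) != 25:
        return False
    digest = hashlib.sha256(hashlib.sha256(raw[:-4]).digest()).digest()
    return digest[:4] == raw[-4:]


def _bech32_polymod(values):
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ value
        for i, gen in enumerate(_BECH32_GEN):
            if (top >> i) & 1:
                chk ^= gen
    return chk


def is_valid_bech32(address, hrp="bc"):
    """Check the Bech32 (BIP173) checksum of an address such as bc1q..."""
    if address.lower() != address and address.upper() != address:
        return False  # mixed case is not allowed
    address = address.lower()
    prefix, sep, data_part = address.rpartition("1")
    if not sep or prefix != hrp or len(data_part) < 6:
        return False
    if any(ch not in _BECH32 for ch in data_part):
        return False
    data = [_BECH32.index(ch) for ch in data_part]
    expanded = [ord(c) >> 5 for c in prefix] + [0] + [ord(c) & 31 for c in prefix]
    return _bech32_polymod(expanded + data) == 1


def is_valid_ethereum_address(address):
    """After removing a leading 0x: exactly 40 hexadecimal characters."""
    body = address[2:] if address[:2].lower() == "0x" else address
    return len(body) == 40 and all(ch in string.hexdigits for ch in body)


def is_valid_wallet_address(address):
    """Dispatch on the prefix, mirroring the cases named in substep 1015."""
    if address[:2].lower() == "0x":
        return is_valid_ethereum_address(address)
    if address.lower().startswith("bc1"):
        return is_valid_bech32(address)
    return is_valid_base58check(address)
```

The Ethereum check tests for hexadecimal characters directly rather than calling `int(body, 16)`, because the latter would also accept signs and underscores.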
Substep 1016: store the extracted account entity information in the database.
The account entity information includes, but is not limited to, account numbers, person names, place names, item names, and the like.
Optionally, in this embodiment, a step of translating the raw data of the data sources may further be included before the various types of account entity information are extracted from the multiple data sources. Further, as shown in Fig. 3, performing cloud-service-based multilingual translation on the extracted account entity information specifically includes the following substeps:
Substep 301: read the information to be translated from the database and push it to the translation module through a remote procedure call interface.
Substep 302: the translation module performs noise filtering and then target language identification on the information to be translated. If the information is judged to be in the target language, the original data is returned directly; if it is judged not to be in the target language, it is forwarded to the cloud service translation engine interface for translation to obtain the corresponding target language translation.
Specifically, the target language may be Chinese. In that case, the translation module first performs noise filtering on the original information to remove invalid content such as link addresses, and then performs Chinese language identification. Information judged to be Chinese is returned directly as the original data, and non-Chinese information is forwarded to the cloud service translation engine interface for translation.
Substep 303: write the target language translation returned by the cloud service translation engine into the field corresponding to the original text in the database table, and at the same time write the original text, the target language translation and other related information into a full-text index library.
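The decision logic of substep 302 can be sketched as below, assuming Chinese as the target language. The ratio-based language test, its 0.5 threshold and the `cloud_translate` callable (a stand-in for whatever cloud translation engine interface is used) are assumptions made for illustration only.

```python
import re

_URL = re.compile(r"https?://\S+")
_CJK = re.compile(r"[\u4e00-\u9fff]")


def filter_noise(text):
    """Remove link addresses and collapse runs of whitespace."""
    return " ".join(_URL.sub(" ", text).split())


def is_chinese(text, threshold=0.5):
    """Treat text as Chinese if enough non-space characters are CJK ideographs."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return False
    return len(_CJK.findall(text)) / len(chars) >= threshold


def translate_if_needed(original, cloud_translate):
    """Return the original for Chinese input, otherwise the cloud translation."""
    cleaned = filter_noise(original)
    if not cleaned or is_chinese(cleaned):
        return original
    return cloud_translate(cleaned)
```

Passing the translation engine in as a callable keeps the sketch independent of any particular cloud service and makes the branch logic easy to test with a stub.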
Step 102: analyze the content text corresponding to the various types of account entity information to obtain clues, and label the content containing the clues.
Optionally, as shown in Fig. 4, step 102 specifically includes the following substeps:
Substep 1021: obtain a clue to be retrieved, and construct a detection model according to the clue to be retrieved.
Specifically, the clue to be retrieved may be a user-defined clue of interest. The detection model is constructed from such user-defined clues of interest, and clues are then automatically identified and detected by the detection model. In this embodiment, the types of clues detectable by the detection model include, but are not limited to: 10 categories of intelligence clues; 38 categories of security threat clues, such as explosions and facility collapses; and more than 10 categories of political and economic clues, such as international sanctions and opinions of important persons.
Substep 1022: read, by category, the topics, followed accounts and followed groups set by the user in the database, and retrieve information from the full-text index library according to them.
Specifically, an analysis calling program may read, by category, the topics, followed accounts and followed groups set by the user in the database, and retrieve information from the full-text index library according to the topic keywords or the IDs or names of the accounts and groups.
Substep 1023: extract feature information from the retrieved information.
For example, the feature information may be extracted using multilingual deep natural language understanding techniques. The feature information includes keywords, abstracts, identified topic categories, sentiment, representative content, and representative viewpoints.
Substep 1024: construct entity relationships and event storylines according to the feature information, and perform an overall analysis from multiple dimensions to provide a global view of the event. The multiple dimensions include time, region and field.
Substep 1025: write the processed information into the database. If the information already exists in the database, subsequent processing is skipped.
Step 103: analyze and screen the transaction data in the blockchain network to obtain illegal transaction information.
Optionally, the blockchain network includes the Ethereum network. Accordingly, as shown in Fig. 5, analyzing and screening the transaction data in the blockchain network in step 103 to obtain illegal transaction information specifically includes the following substeps:
Substep 10311: read block data from the local Ethereum node in block-number order, read each transaction in each block in turn, and write the details of each transaction into a separately stored Ethereum transaction database.
The block data is synchronized in advance to the local node from public nodes on the internet. The details of each transaction include, but are not limited to: block number (blocknumber), block hash (blockhash), transaction hash (hash), transaction index, transaction timestamp (timestamp), sending wallet address (txfrom), receiving wallet address (txin), Ether transaction amount (value), gas (gas), gas price (gas price), nonce (nonce), and transaction postscript (input).
Substep 10312: write the Ethereum wallet addresses newly found in the entity extraction results into the Ethereum key wallet address table.
Substep 10313: take the Ethereum wallet addresses out of the Ethereum key wallet address table one by one, retrieve the new transactions related to each address from the Ethereum transaction database, and write them into a database table.
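Substeps 10312 and 10313 can be sketched with Python's built-in `sqlite3` module as follows. The table names `eth_tx`, `key_wallet` and `wallet_tx`, the `last_block` watermark used to decide which transactions count as "new", and the convention of storing addresses in lowercase are all assumptions of this sketch; the column names follow the fields listed above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE eth_tx (
    hash TEXT PRIMARY KEY, blocknumber INTEGER, timestamp INTEGER,
    txfrom TEXT, txin TEXT,
    value TEXT  -- amounts in wei may exceed 64 bits, so they are kept as text
);
CREATE TABLE key_wallet (address TEXT PRIMARY KEY, last_block INTEGER DEFAULT -1);
CREATE TABLE wallet_tx (address TEXT, hash TEXT, PRIMARY KEY (address, hash));
"""


def open_database(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def add_key_wallets(db, addresses):
    """Substep 10312: register newly found wallet addresses (idempotent)."""
    db.executemany(
        "INSERT OR IGNORE INTO key_wallet (address) VALUES (?)",
        [(a.lower(),) for a in addresses],
    )


def collect_new_transactions(db):
    """Substep 10313: copy the not-yet-seen transactions of each key wallet."""
    total = 0
    wallets = db.execute("SELECT address, last_block FROM key_wallet").fetchall()
    for address, last_block in wallets:
        rows = db.execute(
            "SELECT hash, blocknumber FROM eth_tx "
            "WHERE blocknumber > ? AND (txfrom = ? OR txin = ?) "
            "ORDER BY blocknumber",
            (last_block, address, address),
        ).fetchall()
        for tx_hash, block in rows:
            db.execute(
                "INSERT OR IGNORE INTO wallet_tx VALUES (?, ?)", (address, tx_hash)
            )
            last_block = max(last_block, block)
        db.execute(
            "UPDATE key_wallet SET last_block = ? WHERE address = ?",
            (last_block, address),
        )
        total += len(rows)
    return total
```

The watermark only works if a block is written to `eth_tx` completely before the collection runs, which matches reading blocks in block-number order in substep 10311.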
Optionally, the blockchain network may further include the Bitcoin network. Accordingly, as shown in Fig. 6, analyzing and screening the transaction data in the blockchain network in step 103 to obtain illegal transaction information specifically includes the following substeps:
Substep 10321: read block data from the local Bitcoin node in block-number order, read each transaction in each block in turn, and write the details of each transaction into a separately stored Bitcoin transaction database.
The Bitcoin transaction block data is synchronized in advance to the local node from public nodes on the internet. Each Bitcoin transaction record includes, but is not limited to: block height (block_height), block hash (block_hash), transaction hash (hash), block timestamp (block_time), sequence number within the transaction (n), transaction wallet address (address), transaction direction (op), previous transaction hash (hash), and Bitcoin transaction amount (value).
Substep 10322: write the Bitcoin wallet addresses newly found in the entity extraction results into the Bitcoin key wallet address table.
Substep 10323: take the Bitcoin wallet addresses out of the Bitcoin key wallet address table one by one, retrieve the new transactions related to each wallet from the Bitcoin transaction database, and write them into a database table.
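Because each Bitcoin record described above carries a single address and a direction, finding the other side of a transaction means grouping records by transaction hash. The following sketch shows one way to do that; the type `BtcEntry`, the function `counterparties`, and the direction values `"in"` (the address spends funds) and `"out"` (the address receives funds) are assumptions, since the embodiment does not state how `op` is encoded.

```python
from collections import defaultdict
from typing import NamedTuple


class BtcEntry(NamedTuple):
    tx_hash: str     # the transaction hash ("hash" in the field list)
    block_time: int
    n: int           # sequence number within the transaction
    address: str
    op: str          # assumed encoding: "in" spends funds, "out" receives funds
    value: int       # amount, assumed to be in satoshi


def counterparties(entries, wallet):
    """Map each transaction touching `wallet` to the addresses on the other side."""
    by_tx = defaultdict(list)
    for entry in entries:
        by_tx[entry.tx_hash].append(entry)
    result = {}
    for tx_hash, rows in by_tx.items():
        senders = {r.address for r in rows if r.op == "in"}
        receivers = {r.address for r in rows if r.op == "out"}
        if wallet in senders:
            # Outputs back to the wallet itself are change, not counterparties.
            result[tx_hash] = sorted(receivers - {wallet})
        elif wallet in receivers:
            result[tx_hash] = sorted(senders)
    return result
```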
Step 104: perform association analysis on the various types of account entity information, the labeled clues and the illegal transaction information to obtain cyberspace clues.
A cyberspace clue includes a plurality of transaction items and the association relationships among them. The transaction items include, but are not limited to, the time, location, behavior and funds of a transaction. The association relationships include, but are not limited to, a behavior route analysis graph of the funds, a relationship graph among multiple factors, and the like.
In step 104, performing association analysis on the various types of account entity information, the labeled clues and the illegal transaction information to obtain cyberspace clues may specifically include one or any combination of the following, so as to implement diversified data association analysis (also called the diversified data association analysis module).
Propagation traceability analysis: preprocess the collected information and generate a SimHash code that represents the core content of the text and images and is robust to small partial changes. By matching and iteratively querying SimHash codes, a release time sequence of information with the same content is formed, so that the information source is traced and the propagation path is constructed. The propagation traceability analysis of this embodiment supports, without limitation, cross-channel propagation traceability analysis across public social media, monitored instant messaging groups, monitored darknets, and the like.
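The SimHash idea behind propagation traceability analysis can be sketched for text as follows. This is a simplified illustration under stated assumptions: tokens are plain `\w+` runs (Chinese text would first need word segmentation), every token has weight 1, the distance threshold of 3 bits is arbitrary, and the single linear scan stands in for the matching and iterative querying that the embodiment mentions without detailing.

```python
import hashlib
import re

BITS = 64


def simhash(text):
    """Compute a 64-bit SimHash fingerprint over the word tokens of a text."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        value = int.from_bytes(digest, "big")
        for i in range(BITS):
            weights[i] += 1 if (value >> i) & 1 else -1
    # A bit of the fingerprint is set when most tokens voted for it.
    return sum(1 << i for i, w in enumerate(weights) if w > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def trace_propagation(posts, query_text, max_distance=3):
    """Return the posts whose content is close to the query, oldest first.

    posts: iterable of (source, publish_time, text) tuples.
    """
    target = simhash(query_text)
    matches = [p for p in posts if hamming(simhash(p[2]), target) <= max_distance]
    return sorted(matches, key=lambda p: p[1])
```

The first element of the returned list is the earliest known publication of the content, and the remaining elements give the order in which it spread across sources.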
Account traceability investigation: for multiple types of accounts, perform association searches that combine multiple types of data. The multiple types of accounts include, but are not limited to: user accounts of social media, communication groups and darknets, and the WhatsApp and Telegram communication accounts, email addresses and digital token accounts that appear in the information. The multiple types of data include, but are not limited to: the internet, social media, and social engineering databases. This helps with investigating social targets, real-world identification, and the like.
Single account portrait analysis: acquire the historical posts and social relationships of multiple types of accounts, including but not limited to social media, instant messaging and darknet forum accounts; analyze the activity characteristics of these accounts, including but not limited to possible geographic location, active periods, affiliations, topics of interest and representative statements; infer account characteristics from the activity characteristics, including but not limited to identity, political inclination and risk points; and analyze the community to which the account belongs, whether it is a leading account of that community, and its main associated accounts and friend accounts. Community and key person detection: for a specified topic or group, acquire the related social media and instant messaging account groups; construct social association relationships through reposts, shared group membership, Follow relationships and the like; detect the communities hidden in the account groups whose members communicate with each other and dominate the topic or group; and detect the dominant key accounts in those communities. This provides key targets for tracing and handling event clues.
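The embodiment does not name a community detection algorithm. As a deliberately crude stand-in, the sketch below treats each connected component of the interaction graph as a community and the account with the most connections as its key account; both choices, and the function name `detect_communities`, are assumptions made purely for illustration.

```python
from collections import defaultdict


def detect_communities(edges):
    """Group accounts into connected components and pick a key account in each.

    edges: iterable of (account_a, account_b) pairs, one per repost,
    shared group membership or Follow relationship.
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    communities = []
    for start in sorted(graph):
        if start in seen:
            continue
        members = set()
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        # Highest degree wins; ties go to the alphabetically first account.
        key_account = max(sorted(members), key=lambda n: len(graph[n]))
        communities.append({"members": sorted(members), "key_account": key_account})
    return communities
```

A production system would more likely use a modularity-based or label-propagation method and a centrality measure that accounts for the direction and volume of interactions.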
Group characteristic portrait: analyze the individual account portraits of user-defined groups and of detected communities to form the characteristics of the group. The characteristics of the group include possible geographic locations, active periods, affiliations, topics of interest, representative statements, possible identities, political inclinations and risk points. This supports the discovery and monitoring of key target groups.
Overall situation awareness: compile statistics on the collected information sources, the collected information, the monitored clues and the detected communities by time, region, source, clue label and the like, to form the overall situation of the social media monitoring targets.
Optionally, a WEB application for monitoring the cyberspace clues obtained in step 104 may specifically include any one or a combination of topic-based information monitoring, account- or group-based monitoring, and wallet-address-based monitoring.
As shown in Fig. 7, topic-based information monitoring may include the following steps:
Step 701: acquire the retrieval input of a user, where the retrieval input includes a keyword or a combination of keywords.
Step 702: perform information retrieval according to the retrieval input.
Step 703: if the retrieval result meets the condition, generate a monitoring topic, monitor the topic for a preset time, and find, from the originally collected information, the text information that meets the topic's filtering condition together with the source of that information.
Specifically, a single information source or any combination of information sources can be selected, a search time period entered, one or more clue labels selected (optional), and a search scope such as title, body text, link, account or group selected (optional) to perform the retrieval. If the retrieval conditions and the output are satisfactory, a monitoring topic can be set; the topic is then monitored automatically over the long term (the preset time), and the text information meeting the topic's filtering condition, together with its source, is found from the large volume of originally collected information.
Step 704: associate the clue with the extracted account entity information to form an association relationship between the clue and the accounts.
Step 705: if the associated account entity information contains a Bitcoin or Ethereum wallet address, use the time at which the original text information was published as the filtering condition, and obtain from the transaction database all transaction records related to the wallet after that time.
Step 706: form a transaction linked list and display the data as a table or a flow graph. The data includes the monitored wallet address, the transfer amounts and times, the wallet addresses to which the transfers flow, and the transaction chain (which may have multiple levels).
Step 707: associate the wallet address with a specific account by combining the other account entities extracted from the text.
The other account entities include, but are not limited to, email addresses, QQ numbers, Twitter accounts and other accounts. In this way the wallet address is associated with a specific person.
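Steps 705 and 706 can be sketched as a breadth-first walk over transfers that starts at the monitored wallet and only follows funds moved at or after the publication time. The `Transfer` record, the function `build_transaction_chain`, the `max_depth` limit and the choice to follow outgoing transfers only are assumptions of this sketch rather than details given in the embodiment.

```python
from collections import deque
from typing import NamedTuple


class Transfer(NamedTuple):
    tx_hash: str
    time: int
    sender: str
    receiver: str
    amount: int


def build_transaction_chain(transfers, wallet, published_at, max_depth=3):
    """Follow outgoing transfers level by level, starting from `wallet`.

    Returns a list of (level, Transfer) pairs. A transfer is followed only
    if it happened at or after the time the funds could have arrived.
    """
    by_sender = {}
    for t in sorted(transfers, key=lambda t: t.time):
        by_sender.setdefault(t.sender, []).append(t)

    chain = []
    seen = set()
    queue = deque([(wallet, published_at, 1)])
    while queue:
        address, not_before, level = queue.popleft()
        for t in by_sender.get(address, []):
            if t.time < not_before or t.tx_hash in seen:
                continue
            seen.add(t.tx_hash)
            chain.append((level, t))
            if level < max_depth:
                queue.append((t.receiver, t.time, level + 1))
    return chain
```

Each `(level, Transfer)` pair corresponds to one row of the table view or one edge of the flow graph, with the level indicating how many hops the funds are from the monitored wallet.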
As shown in Fig. 8, account- or group-based monitoring specifically includes:
Step 801: acquire the account information or group information input by a user.
Step 802: perform retrieval according to the account information or group information and the various retrieval items.
Step 803: if the retrieval result meets the preset condition, generate a monitoring target account or target group; monitor the target account or target group for a preset time, and filter out, from the information source to which the target account or target group belongs, the information it has published that matches the preset clue labels.
Specifically, the user enters account information or group information, including but not limited to an ID or a description, through the retrieval function. The user may select a single information source or any combination of information sources, select one or more clue labels (optional), and select a search scope such as title, body text, link, account or group (optional) to perform the retrieval. If the retrieval conditions and the output are satisfactory, the account or group can be added as a monitoring target account or target group; it is then monitored automatically over the long term (the preset time), and the information it has published that matches the preset clue labels is filtered out from the information source to which it belongs.
Step 804: associate the target account or group with the extracted account entity information to form an association relationship between the target account or group and other accounts.
Step 805: if the associated account entity information contains a Bitcoin or Ethereum wallet address, use the time at which the original text information was published as the filtering condition, and obtain from the transaction database all transaction records related to the wallet after that time.
Step 806: form a transaction linked list and display the data as a table or a flow graph. The transaction linked list includes the monitored wallet address, the transfer amounts and times, the wallet addresses to which the transfers flow, and the subsequent nodes of the transaction chain.
Step 807: associate the wallet address with a specific account by combining the other account entity information extracted from the text. The other account entities include, but are not limited to, email addresses, QQ numbers, Twitter accounts and other accounts, so that the wallet address is associated with a specific person.
As shown in Fig. 9, wallet-address-based monitoring specifically includes:
Step 901: acquire the Bitcoin or Ethereum wallet address information input by a user, and look it up in the key wallet information table and the full wallet address library.
Step 902: if the retrieval result meets the preset condition, generate a monitoring target wallet, monitor the wallet for a preset time, and associate the wallet with the information in the entity extraction information table, with other account entities and with the information text.
Specifically, the user enters a Bitcoin or Ethereum wallet address through the retrieval function, and the address is looked up in the key wallet information table (a wallet address knowledge base obtained through analysis in the diversified analysis module, see below) and in the full wallet address library. The user may select a single information source or any combination of information sources, select one or more clue labels (optional), and select a search scope such as title, body text, link, account or group (optional) to perform the retrieval. The wallet can then be added as a monitoring target wallet; it is monitored automatically over the long term (the preset time) and associated with the information in the extracted account entity information table, with other account entities and with the information text.
Step 903: obtain all transaction records related to the wallet from the Bitcoin or Ethereum transaction database to which the wallet address belongs.
Step 904: form a transaction linked list and display the data as a table or a flow graph. The transaction linked list includes the monitored wallet address, the transfer amounts and times, the wallet addresses to which the transfers flow, and the subsequent nodes of the transaction chain.
Step 905: associate the wallet address with a specific account by combining the other account entity information extracted from the text. The other account entities include, but are not limited to, email addresses, QQ numbers, Twitter accounts and other accounts, so that the wallet address is associated with a specific person.
Fig. 10 is a flowchart illustrating a clue monitoring method based on diversified data fusion according to an embodiment of the present invention. The clue monitoring method is as follows. Information is collected from multiple data sources, including but not limited to clear web information, darknet information and group information of private instant messaging tools. The collected information is de-duplicated, cleaned and so on, and then stored in a database. The information in the database undergoes multilingual translation, so that information in different languages is translated into the target language (Chinese). Clue analysis is performed on the Chinese information; after polarity analysis (positive or negative), the information that provides clues is labeled (information labeling), and the labeled information is stored in the full-text index library. Entity extraction is performed on the information unified into Chinese to obtain account entity information, including but not limited to account numbers, person names, place names, organizations and items. After the validity of the extracted Bitcoin and Ethereum wallet addresses is verified, the entity information is stored in the database. Illegal transaction information is extracted from the full Bitcoin transaction data and from the full Ethereum transaction data, and association analysis is performed on the extracted entity information, the discovered clues and the analyzed illegal transaction information in the manner of the foregoing embodiment.
By implementing the embodiment of the invention, the following can be achieved. In terms of data diversity, the following data can be fused: clear web data such as news, forums, meta-search, microblogs and WeChat official accounts; article and forum data from three darknets, namely Tor, I2P and ZeroNet; group chat data from instant messaging tools such as WhatsApp and Telegram; hidden information such as bit alliance website information, Ethereum smart contracts and transaction postscripts; and the full transaction data of Bitcoin and Ethereum. The volume of data collected each day can reach the order of ten million. In terms of joint data analysis, a large amount of entity object information can be extracted. In terms of account entity data extraction, accuracy and extraction efficiency are high. In terms of diversified data association analysis, the development, flow and time sequence of events can be described better. Specifically: information retrieval combines keyword combinations with clue labels, from which information publishers, publishing groups and the like are further discovered; combined with the various entity objects published in the information content (e.g. email addresses, telephone numbers, QQ numbers, WeChat IDs, Twitter and Facebook accounts), persons associated with a clue can be effectively discovered and locked onto. Through the Bitcoin and Ethereum wallet addresses published in the information content, transactions can be filtered and tracked according to the information release time, so that for crimes such as fraud and money laundering the process and destination addresses of fund transfers can be effectively found, providing clues and investigative assistance for criminal cases that are difficult to solve by traditional means.
Compared with the prior art, the embodiment of the invention integrates data association analysis methods for heterogeneous information from the clear web, the darknet, blockchain networks, private non-public networks and the like, and discovers correlations among information from diverse information carriers, thereby completing tasks such as clue discovery, timeline linking, person-relationship linking and fund flow tracking. The invention aims to achieve joint data processing across industries, fields and information sources, and to find correlations among seemingly unrelated information, so as to construct a knowledge base consisting of behavior route analysis graphs that run from persons to events and cover time, place, behavior and funds, together with relationship graphs among multiple factors. The invention can be applied to business scenarios such as national security, criminal investigation, and financial management and control.
The embodiment of the invention provides a clue monitoring system based on diversified data fusion. The system comprises: an account entity information extraction module, used for extracting account entity information of diversified data from multiple data sources; a translation module, used for performing cloud-service-based multilingual translation on the extracted various types of account entity information; a clue discovery module, used for analyzing the content text corresponding to the various types of account entity information to obtain clues and labeling the content containing the clues; a blockchain transaction analysis module, used for analyzing and screening the transaction data in a blockchain network to obtain illegal transaction information; and a clue analysis module, used for performing association analysis on the various types of account entity information, the labeled clues and the illegal transaction information to obtain cyberspace clues. A cyberspace clue comprises a plurality of transaction items and the association relationships among them; the plurality of transaction items are respectively: transaction time, transaction mode, transaction funds, and flow of funds.
In practical application, the system may be divided into functional modules, with the structural block diagram shown in Fig. 11. The system functions are described in detail below with reference to Fig. 11. As shown in Fig. 11, the clue monitoring system 110 comprises: a data acquisition subsystem 1101, a blockchain node synchronization subsystem 1102, a transaction data analysis subsystem 1103, a data processing and application subsystem 1104, and a data storage subsystem 1105.
The data acquisition subsystem 1101 is used for collecting meta-information of various kinds of data. By way of example, the data acquisition subsystem 1101 may include the following modules: an internet news data acquisition module, a WeChat official account data acquisition module, an internet forum data acquisition module, an overseas social media data acquisition module, a microblog data acquisition module, an internet meta-search data acquisition module, a darknet data acquisition module, a Namecoin domain name extraction module, a Telegram group data acquisition module, a bit website content acquisition module, a WhatsApp group data acquisition module, and an Ethereum transaction postscript and smart contract acquisition module. These data acquisition modules are respectively used for collecting the information or data of the different information sources in the method embodiment.
The blockchain node synchronization subsystem 1102 may include: a Namecoin node synchronization module, an Ethereum node synchronization module and a Bitcoin node synchronization module.
The transaction data analysis subsystem 1103 may include: an Ethereum transaction data analysis module and a Bitcoin transaction data analysis module.
The data processing and application subsystem 1104 may include: an entity information extraction module, an image text recognition module, a content analysis and clue discovery module, a multilingual text content translation module, a natural language processing module, a diversified data association analysis module and a WEB application module.
The data storage subsystem 1105 may include: a distributed data storage module and a distributed full-text retrieval module. The data storage subsystem 1105 is used to store the information or data collected from the various data sources and the data of the full-text index library described in the foregoing embodiments.
Compared with the prior art, the system of the embodiment of the invention integrates data association analysis methods for heterogeneous information from the clear web, the darknet, blockchain networks, private non-public networks and the like, and discovers correlations among information from diverse information carriers, thereby completing tasks such as clue discovery, timeline linking, person-relationship linking and fund flow tracking. The invention aims to achieve joint data processing across industries, fields and information sources, and to find correlations among seemingly unrelated information, so as to construct a knowledge base consisting of behavior route analysis graphs that run from persons to events and cover time, place, behavior and funds, together with relationship graphs among multiple factors. The invention can be applied to business scenarios such as national security, criminal investigation, and financial management and control.
A third embodiment of the present invention provides a computer-readable storage medium for storing a computer-readable program, where the computer-readable program is used for an apparatus to execute some or all of the above method embodiments.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods in the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (9)

1. A clue monitoring method based on diversified data fusion, characterized by comprising the following steps:
extracting account entity information of diversified data from multiple data sources, and performing cloud-service-based multi-language translation on the extracted account entity information of each type;
analyzing content texts corresponding to the various types of account entity information to obtain clues and labeling the content containing the clues;
analyzing and screening transaction data in the block chain network to obtain illegal transaction information;
performing correlation analysis on the various types of account entity information, the labeled clues, and the illegal transaction information to obtain a cyberspace clue; wherein the cyberspace clue comprises a plurality of transaction items and the association relations among the transaction items; the plurality of transaction items are: transaction time, transaction mode, transaction funds, and flow of funds.
2. The method of claim 1, wherein the plurality of data sources comprises at least two of: clear web information, dark web information, and application data of privatized instant messaging tools; the account entity information includes: publisher account, blockchain address, person name, place name, organization name, and article name; the clear web information comprises: clear web news, clear web forums, and clear web social media; the dark web information comprises: dark web articles and dark web forums; the application data of privatized instant messaging tools comprises: Telegram group chats and WhatsApp group chats.
3. The method of claim 1, wherein the extracting account entity information of the diversified data from the plurality of data sources comprises:
collecting group information of a plurality of data sources and storing the group information in a database;
processing the collected group information, removing interference information, and supplementing basic data corresponding to the group information; wherein the base data comprises: source, account, type, and standard time;
setting a timer for each data source according to the data increment of each data source, wherein the timer is used for periodically extracting account entity information of newly acquired data;
for each data source, when reaching each extraction starting time, randomly reading a preset number of new acquisition information from the database, and putting the new acquisition information into a thread pool for processing;
for each thread in the thread pool, reading one piece of information, applying the regular expressions of the various account types to it in turn, and, if a match succeeds, obtaining and returning the matched account entity information;
verifying the validity of each successfully matched Bitcoin wallet address by checking whether the Bitcoin wallet address beginning with bc1 is a valid Bech32-format string or a valid Base58-format string, and if so, determining that the Bitcoin wallet address is valid account entity information;
for an Ethereum wallet address, if, after the leading 0x characters are removed, the address has a length of 40 and can be converted into decimal data, determining that the Ethereum wallet address is valid;
and warehousing the extracted account entity information.
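The matching and validation steps of claim 3 can be illustrated with a minimal Python sketch. The regular expressions, the function names, and the routing rule (addresses beginning with bc1 are checked as Bech32, all others as Base58Check) are assumptions made for illustration and are not taken from the patent. The Bech32 check follows the public BIP173 checksum and also accepts the later Bech32m constant; it validates the string format only, not the witness program.

```python
import hashlib
import re
import string

BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
BECH32 = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

# Hypothetical candidate patterns; a real deployment would tune these.
PATTERNS = {
    "ethereum": re.compile(r"\b0x[0-9a-fA-F]{40}\b"),
    "bitcoin": re.compile(
        r"\b(?:bc1[0-9a-z]{11,71}|[13][1-9A-HJ-NP-Za-km-z]{25,34})\b"),
}

def is_valid_ethereum(addr):
    """Strip a leading 0x, then require 40 characters that parse as hex."""
    body = addr[2:] if addr[:2].lower() == "0x" else addr
    if len(body) != 40 or any(c not in string.hexdigits for c in body):
        return False
    int(body, 16)  # the "convertible to decimal" condition of the claim
    return True

def is_valid_base58check(addr):
    """Decode Base58 and compare the 4-byte double-SHA256 checksum."""
    num = 0
    for ch in addr:
        idx = BASE58.find(ch)
        if idx < 0:
            return False
        num = num * 58 + idx
    pad = len(addr) - len(addr.lstrip("1"))  # leading '1' = zero byte
    raw = b"\x00" * pad + num.to_bytes((num.bit_length() + 7) // 8, "big")
    if len(raw) != 25:
        return False
    digest = hashlib.sha256(hashlib.sha256(raw[:-4]).digest()).digest()
    return digest[:4] == raw[-4:]

def _polymod(values):
    gen = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1ffffff) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= gen[i]
    return chk

def is_valid_bech32(addr):
    """Check the Bech32 (or Bech32m) checksum of the whole string."""
    if addr != addr.lower() and addr != addr.upper():
        return False  # mixed case is not allowed
    addr = addr.lower()
    pos = addr.rfind("1")
    if pos < 1 or pos + 7 > len(addr) or len(addr) > 90:
        return False
    hrp, data = addr[:pos], addr[pos + 1:]
    if any(c not in BECH32 for c in data):
        return False
    values = ([ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]
              + [BECH32.index(c) for c in data])
    return _polymod(values) in (1, 0x2bc830a3)

def is_valid_bitcoin(addr):
    if addr.lower().startswith("bc1"):
        return is_valid_bech32(addr)
    return is_valid_base58check(addr)

VALIDATORS = {"ethereum": is_valid_ethereum, "bitcoin": is_valid_bitcoin}

def extract_accounts(text):
    """Return (kind, address) pairs that match a pattern and validate."""
    return [(kind, m.group(0))
            for kind, pattern in PATTERNS.items()
            for m in pattern.finditer(text)
            if VALIDATORS[kind](m.group(0))]
```

In a pipeline such as the one claimed, each worker in the thread pool would call `extract_accounts` on the piece of information it has read and store only the validated pairs.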
4. The method of claim 3, wherein performing the cloud service based multi-language translation on the extracted entity information of each type comprises:
reading information to be translated from a database and pushing it to a translation module through a remote procedure call (RPC) interface, the translation module performing noise filtering and target language identification on the information in sequence; if the information is determined to be in the target language, returning the original data directly; if it is determined not to be in the target language, forwarding it to a cloud service translation engine interface for translation to obtain the corresponding target language translation;
and writing the target language translation returned by the cloud service translation engine into a field corresponding to the original text in the database table, and simultaneously writing the original text, the target language translation and other related information into a full-text index library.
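The routing logic of claim 4 (filter noise, identify the language, and call the cloud engine only for non-target-language text) might be sketched as follows. The patent does not name the language detector or the translation engine, so this sketch assumes Chinese as the target language, uses a crude character-ratio heuristic in place of a real detector, and takes the translation engine as a caller-supplied function. All names here are hypothetical.

```python
import re
from typing import Callable, Dict

def strip_noise(text: str) -> str:
    """Assumed noise filter: drop URLs and collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def looks_chinese(text: str, threshold: float = 0.5) -> bool:
    """Heuristic stand-in for target language identification."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return True  # nothing translatable, treat as already done
    cjk = sum("\u4e00" <= c <= "\u9fff" for c in letters)
    return cjk / len(letters) >= threshold

def route_for_translation(text: str,
                          translate: Callable[[str], str]) -> Dict:
    """Return the original text unchanged if it is already in the
    target language; otherwise forward the cleaned text to `translate`
    (a placeholder for the cloud service translation engine)."""
    cleaned = strip_noise(text)
    if looks_chinese(cleaned):
        return {"original": text, "translation": text, "translated": False}
    return {"original": text, "translation": translate(cleaned),
            "translated": True}
```

The returned dictionary carries both the original and the translation, which corresponds to writing the translation next to the original in the database table and the full-text index library.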
5. The method according to claim 1, wherein the analyzing the content texts corresponding to the various types of account entity information to obtain clues and labeling the content containing the clues specifically comprises:
obtaining a clue to be retrieved, and constructing a detection model according to the clue to be retrieved;
reading, by category, the topics, followed accounts, and followed groups set by the user in the database, and retrieving information from a full-text index library according to the topics, the followed accounts, and the followed groups;
for the retrieved information, extracting feature information by using multi-language deep natural language understanding technology; wherein the feature information includes: keywords, abstracts, identified topic categories, sentiment, representative content, and representative viewpoints;
constructing an entity relationship and an event story line according to the characteristic information, and performing overall analysis from multiple dimensions to provide event global information; wherein the multiple dimensions comprise time, regions and fields;
and writing the processed information into a database.
6. The method of claim 1, wherein the blockchain network comprises an Ethereum network,
correspondingly, analyzing and screening the transaction data in the blockchain network to obtain illegal transaction information specifically comprises the following steps:
reading block data from the local Ethereum node in block-number order, reading each transaction in each block in sequence, and writing the details of each transaction into a separately stored Ethereum transaction database; wherein the block data is synchronized in advance to the local node from public Internet nodes, and the details of each transaction comprise: block number, block hash, transaction index, transaction timestamp, output wallet address, input wallet address, Ether transaction amount, gas, gas price, disturbance parameter (nonce), and transaction postscript;
writing each newly found Ethereum wallet address in the entity extraction result into an Ethereum key wallet address table;
taking the Ethereum wallet addresses out of the Ethereum key wallet address table one by one, retrieving new transactions related to each wallet from the Ethereum transaction database, and writing the new transactions into a database table;
the blockchain network comprises a Bitcoin network;
correspondingly, analyzing and screening the transaction data in the blockchain network to obtain illegal transaction information specifically comprises the following steps:
reading block data from the local Bitcoin node in block-number order, reading each transaction in each block in sequence, and writing the details of each transaction into a separately stored Bitcoin transaction database; wherein the Bitcoin block data is synchronized in advance to the local node from public Internet nodes, and each Bitcoin transaction record comprises: block height, block hash, transaction hash, block timestamp, index n within the transaction, transaction wallet address, transaction direction, previous transaction hash, and Bitcoin transaction amount;
writing each newly found Bitcoin wallet address in the entity extraction result into a Bitcoin key wallet address table;
and taking the Bitcoin wallet addresses out of the Bitcoin key wallet address table one by one, retrieving new transactions related to each wallet from the Bitcoin transaction database, and writing the new transactions into a database table.
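The screening step of claim 6 amounts to joining a table of key wallet addresses against the full transaction table. A minimal sketch using SQLite is given below; the table names, the column names, and the reduced set of transaction fields are assumptions for illustration and are not the schema used in the patent.

```python
import sqlite3

# Hypothetical, simplified schema: one transaction table, one key
# wallet table, and one result table keyed by (tx_hash, wallet) so
# that re-running the screening does not duplicate rows.
SCHEMA = """
CREATE TABLE IF NOT EXISTS eth_tx (
    tx_hash TEXT PRIMARY KEY, block_number INTEGER, ts INTEGER,
    from_addr TEXT, to_addr TEXT, amount REAL);
CREATE TABLE IF NOT EXISTS key_wallet (address TEXT PRIMARY KEY);
CREATE TABLE IF NOT EXISTS key_wallet_tx (
    tx_hash TEXT, wallet TEXT, direction TEXT, counterparty TEXT,
    amount REAL, ts INTEGER, PRIMARY KEY (tx_hash, wallet));
"""

def screen_key_wallets(conn):
    """Copy every transaction that touches a key wallet into
    key_wallet_tx and return the screened rows in time order."""
    conn.execute("""
        INSERT OR IGNORE INTO key_wallet_tx
            (tx_hash, wallet, direction, counterparty, amount, ts)
        SELECT t.tx_hash, k.address,
               CASE WHEN t.from_addr = k.address THEN 'out' ELSE 'in' END,
               CASE WHEN t.from_addr = k.address THEN t.to_addr
                    ELSE t.from_addr END,
               t.amount, t.ts
        FROM eth_tx AS t
        JOIN key_wallet AS k
          ON k.address = t.from_addr OR k.address = t.to_addr
    """)
    conn.commit()
    return conn.execute(
        "SELECT wallet, tx_hash, direction, counterparty "
        "FROM key_wallet_tx ORDER BY ts, tx_hash").fetchall()
```

The same shape would apply to the Bitcoin tables, with the transaction direction read from the stored record instead of being derived from sender and receiver columns.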
7. The method according to claim 1, wherein performing correlation analysis on the various types of account entity information, the clue with the tag, and the illegal transaction information to obtain a cyberspace clue specifically comprises:
propagation traceability analysis: preprocessing the acquired information and generating a SimHash code that represents the core content of texts and images and is robust to small partial changes; forming a release time sequence of information with the same content through SimHash matching and iterative query, thereby tracing the information source and constructing a propagation path;
account traceability investigation:
for multiple types of accounts, performing an association search by combining multiple types of data; wherein the multiple types of accounts include: user accounts of social media, communication groups, and the dark web, communication accounts of WhatsApp and Telegram that appear in the information, mailboxes, and digital token accounts; the multiple types of data comprise: the Internet, social media, and social engineering databases;
single account portrait analysis:
acquiring the historical posts and social relations of the multiple types of accounts; the multiple types of accounts comprise: social media accounts, instant messaging accounts, and dark web forum accounts;
analyzing activity characteristic information of the multiple types of accounts, wherein the activity characteristic information comprises: possible geographic locations, activity periods, affiliations, topics of interest, and representative utterances;
deducing account characteristics of the multiple types of accounts according to the activity characteristic information; the account characteristics include: identity, political inclination, and risk points;
analyzing the community to which the account belongs, whether the account is a main account of that community, its main associated accounts, and its friend account information;
detecting communities and key people:
for a specified topic or group, acquiring the related set of social media and instant messaging accounts, constructing social association relations from reposts, shared group membership, follow relationships, and the like, detecting communities hidden in the account set whose members communicate with each other and lead the topic or group, and detecting the leading key accounts in those communities;
group characteristic portrait:
analyzing the feature portraits of the individual accounts in user-defined groups and in the detected communities to form the characteristics of each group; the characteristics of a group include possible geographic locations, activity periods, affiliations, topics of interest, representative utterances, possible identities, political inclinations, and risk points;
overall situation awareness:
and compiling statistics on the collected information sources, the collected information, the monitored clues, and the detected communities by time, region, source, clue label, and the like, to form the overall situation of the social media monitoring target.
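The propagation traceability analysis of claim 7 relies on SimHash fingerprints, which stay close in Hamming distance when content changes only slightly. The sketch below is a generic 64-bit SimHash over word tokens and is not the patent's implementation: the tokenizer, the use of MD5 as the token hash, and the distance threshold of 3 are illustrative assumptions, and image content is not covered.

```python
import hashlib
import re
from collections import Counter

def simhash(text, bits=64):
    """Weighted bitwise vote over token hashes, giving one fingerprint."""
    weights = Counter(re.findall(r"\w+", text.lower()))
    votes = [0] * bits
    for token, weight in weights.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            votes[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i, v in enumerate(votes) if v > 0)

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

def trace_propagation(seed_text, posts, max_distance=3):
    """posts: iterable of (publish_time, source, text) tuples.
    Returns the posts whose fingerprint is within max_distance of the
    seed, ordered by publish time; the first entry is the presumed
    origin and the sequence is the presumed propagation path."""
    seed = simhash(seed_text)
    same = [p for p in posts if hamming(simhash(p[2]), seed) <= max_distance]
    return sorted(same, key=lambda p: p[0])
```

An iterative query, as described in the claim, could re-run `trace_propagation` with each newly matched text as the seed until no further posts are added.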
8. The method according to claim 1, wherein the WEB application for monitoring the cyberspace clue specifically includes any one or any combination of the following:
topic-based information monitoring, account- or group-based monitoring, and wallet-address-based monitoring;
wherein the topic-based information monitoring comprises:
acquiring retrieval input information of a user; wherein the retrieving the input information comprises: a keyword or a combination of keywords;
performing information retrieval according to the retrieval input information;
if the retrieval result meets the condition, generating a monitoring subject, monitoring the monitoring subject within preset time, and finding text information meeting the subject filtering condition and an input source of the information from the original collected information;
associating with the extracted account entity information to form an association relation between a clue and the account;
if the associated account entity information contains a Bitcoin or Ethereum wallet address, taking the time point at which the original text information was released as a filtering condition, and acquiring from a transaction database all transaction records related to the wallet after that time point;
forming a transaction chain table, and displaying the data as a table or a flow chart, wherein the transaction chain table comprises the monitored wallet address, the transfer amount and time, the related wallet addresses in the transfer flow direction, and the transaction chain of subsequent nodes;
associating the wallet address with a specific account in combination with other account entities extracted from the text;
the monitoring based on the account or the group specifically includes:
acquiring account information or group information input by a user;
searching according to the account information or the group information and the various search items;
if the retrieval result meets the preset condition, generating a monitoring target account or a target group; monitoring the target account or the target group within a preset time, and filtering and taking out information which is issued by the target account or the target group and accords with a preset clue label from an information source to which the target account or the target group belongs;
associating with the extracted account entity information to form an association relation between the target account or group and other accounts;
if the associated account entity information contains a Bitcoin or Ethereum wallet address, taking the time point at which the original text information was released as a filtering condition, and acquiring from a transaction database all transaction records related to the wallet after that time point;
forming a transaction chain table, and displaying the data as a table or a flow chart, wherein the transaction chain table comprises the monitored wallet address, the transfer amount and time, the related wallet addresses in the transfer flow direction, and the transaction chain of subsequent nodes;
associating the wallet address with a specific account in combination with other account entity information extracted from the text;
the wallet address-based monitoring specifically comprises:
acquiring Bitcoin wallet address or Ethereum wallet address information input by a user, and retrieving it from a key wallet information table and a full wallet address library;
if the retrieval result meets the preset condition, generating a monitoring target wallet, monitoring the wallet within the preset time, and associating the wallet with the information in the entity extraction information table, other account entities, and the information text;
acquiring all transaction records related to the wallet from the Bitcoin or Ethereum transaction database to which the wallet address belongs;
forming a transaction chain table, and displaying the data as a table or a flow chart, wherein the transaction chain table comprises the monitored wallet address, the transfer amount and time, the related wallet addresses in the transfer flow direction, and the transaction chain of subsequent nodes;
and associating the wallet address with a specific account in combination with other account entity information extracted from the text.
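The transaction chain table described in claim 8 can be pictured as a breadth-first walk over transfers, starting from the monitored wallet and the release time of the original text. The sketch below follows outgoing transfers only and limits the number of hops; both choices, along with the `Transfer` record and the function name, are simplifying assumptions and not the patent's data model.

```python
from collections import deque
from typing import List, NamedTuple, Tuple

class Transfer(NamedTuple):
    ts: int          # transaction time
    sender: str      # wallet the funds leave
    receiver: str    # wallet the funds reach
    amount: float

def build_transaction_chain(transfers: List[Transfer], wallet: str,
                            since: int, max_hops: int = 3
                            ) -> List[Tuple[int, Transfer]]:
    """Follow funds leaving `wallet` at or after `since`. Each entry
    is (hop number, transfer); a transfer is only followed if it is
    not earlier than the transfer that funded its sender."""
    by_sender = {}
    for t in sorted(transfers):
        by_sender.setdefault(t.sender, []).append(t)
    chain, seen = [], set()
    queue = deque([(wallet, since, 0)])
    while queue:
        addr, after, hop = queue.popleft()
        if hop >= max_hops:
            continue
        for t in by_sender.get(addr, []):
            if t.ts < after or t in seen:
                continue
            seen.add(t)
            chain.append((hop + 1, t))
            queue.append((t.receiver, t.ts, hop + 1))
    return chain
```

Each `(hop, transfer)` pair maps naturally to one row of a table view or one edge of a flow chart, with the hop number giving the distance from the monitored wallet.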
9. A clue monitoring system based on diversified data fusion, comprising:
the account entity information extraction module is used for extracting account entity information of diversified data from various data sources;
the translation module is used for performing multi-language translation based on cloud service on the extracted various account entity information;
the clue finding module is used for analyzing the content texts corresponding to the various types of account entity information to obtain clues and labeling the content containing the clues;
the blockchain transaction analysis module is used for analyzing and screening transaction data in a blockchain network to obtain illegal transaction information;
the clue analysis module is used for performing correlation analysis on the various types of account entity information, the labeled clues, and the illegal transaction information to obtain a cyberspace clue; wherein the cyberspace clue comprises a plurality of transaction items and the association relations among the transaction items; the plurality of transaction items are: transaction time, transaction mode, transaction funds, and flow of funds.
CN202110240775.6A 2021-03-04 2021-03-04 Thread monitoring method and system based on diversified data fusion Active CN113268649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110240775.6A CN113268649B (en) 2021-03-04 2021-03-04 Thread monitoring method and system based on diversified data fusion

Publications (2)

Publication Number Publication Date
CN113268649A true CN113268649A (en) 2021-08-17
CN113268649B CN113268649B (en) 2023-12-19

Family

ID=77228232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110240775.6A Active CN113268649B (en) 2021-03-04 2021-03-04 Thread monitoring method and system based on diversified data fusion

Country Status (1)

Country Link
CN (1) CN113268649B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160300227A1 (en) * 2015-04-13 2016-10-13 Ciena Corporation Systems and methods for tracking, predicting, and mitigating advanced persistent threats in networks
US20180165758A1 (en) * 2016-12-09 2018-06-14 Cognitive Scale, Inc. Providing Financial-Related, Blockchain-Associated Cognitive Insights Using Blockchains
CN108804084A (en) * 2018-05-23 2018-11-13 夏文斌 A kind of overall situation block chain link border construction method
CN109087079A (en) * 2018-07-09 2018-12-25 北京知帆科技有限公司 Digital cash Transaction Information analysis method
CN110119469A (en) * 2019-05-22 2019-08-13 北京计算机技术及应用研究所 A kind of data collection and transmission and method towards darknet
US10380594B1 (en) * 2018-08-27 2019-08-13 Beam Solutions, Inc. Systems and methods for monitoring and analyzing financial transactions on public distributed ledgers for suspicious and/or criminal activity
US20190370797A1 (en) * 2018-05-31 2019-12-05 CipherTrace, Inc. Systems and Methods for Crypto Currency Automated Transaction Flow Detection
CN111047448A (en) * 2019-12-30 2020-04-21 国家计算机网络与信息安全管理中心 Analysis method and device for multi-channel data fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
于焕焕: ""基于机器学习的暗网威胁情报分析"", 《中国优秀硕士学位论文全文数据库 信息科技辑》, no. 02, pages 139 - 146 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114390035A (en) * 2022-01-12 2022-04-22 国家计算机网络与信息安全管理中心陕西分中心 Situation perception system for intelligent contract application of Ether house
CN114676243A (en) * 2022-05-25 2022-06-28 成都无糖信息技术有限公司 User portrait analysis method and system for social text
CN114676243B (en) * 2022-05-25 2022-08-19 成都无糖信息技术有限公司 User portrait analysis method and system for social text
CN115238688A (en) * 2022-08-15 2022-10-25 广州市刑事科学技术研究所 Electronic information data association relation analysis method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113268649B (en) 2023-12-19

Similar Documents

Publication Publication Date Title
Bozarth et al. Toward a better performance evaluation framework for fake news classification
CN113268649B (en) Thread monitoring method and system based on diversified data fusion
CN105095211B (en) The acquisition methods and device of multi-medium data
AU2013261007B2 (en) System and method for creating structured event objects
CN106383887A (en) Environment-friendly news data acquisition and recommendation display method and system
CN112165462A (en) Attack prediction method and device based on portrait, electronic equipment and storage medium
CN114915468B (en) Intelligent analysis and detection method for network crime based on knowledge graph
CN114547254B (en) Risk identification method and server based on big data topic analysis
CN110880142A (en) Risk entity acquisition method and device
Bach et al. Big data text mining in the financial sector
CN112925899B (en) Ordering model establishment method, case clue recommendation method, device and medium
US20160358087A1 (en) Generating hypotheses in data sets
Paraschiv et al. A unified graph-based approach to disinformation detection using contextual and semantic relations
CN107948312B (en) Information classification and release method and system with position points as information access ports
CN117390299A (en) Interpretable false news detection method based on graph evidence
CN112363996A (en) Method, system, and medium for building a physical model of a power grid knowledge graph
CN116823410A (en) Data processing method, object processing method, recommending method and computing device
CN116723005A (en) Method and system for tracking malicious code implicit information under polymorphic hiding
Korovesis et al. Leveraging aspect-based sentiment prediction with textual features and document metadata
Ahmad et al. Google maps data analysis of clothing brands in south punjab, pakistan
Khan et al. Exploring Links between Online Activism and Real‐World Events: A Case Study of the# FeesMustFall
CN113220843A (en) Method, device, storage medium and equipment for determining information association relation
Sakib et al. Automated detection of sockpuppet accounts in wikipedia
Kumar et al. Analysis of Deep Learning-Based Approaches for Spam Bots and Cyberbullying Detection in Online Social Networks
Kotevska et al. Automatic Categorization of Social Sensor Data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant