CN111444396B - Big data storage system - Google Patents

Info

Publication number
CN111444396B
Authority
CN
China
Prior art keywords
data
module
submodule
unit
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010213093.1A
Other languages
Chinese (zh)
Other versions
CN111444396A (en)
Inventor
罗颖
陈恭祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhongsheng Ruida Technology Co ltd
Original Assignee
Shenzhen Zhongsheng Ruida Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Zhongsheng Ruida Technology Co ltd
Priority to CN202010213093.1A
Publication of CN111444396A
Application granted
Publication of CN111444396B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Storage Device Security (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a big data storage system. The system performs data state determination on original data captured from an internet server to determine whether the original data is normal state data or abnormal state data. It then screens the original data, repairing or removing the abnormal state data, so that abnormal state data in the original data does not adversely affect subsequent data storage. The system then encrypts, integrates and stores the screened data in sequence, completing storage of the original data. The big data storage system can therefore both distinguish and screen big data effectively and encrypt the data securely, which ensures its normal and stable operation.

Description

Big data storage system
Technical Field
The invention relates to the technical field of data storage equipment, in particular to a big data storage system.
Background
With the development of electronic information technology and cloud data processing technology, the volume of internet data is growing explosively, its data types are multiplying, and its data structures are becoming more complex. To allow internet data to be located, searched and processed quickly and accurately later on, it needs to be stored in a targeted manner. However, because internet data resources are huge, different big data storage systems have to be developed to suit the characteristics of different data resources. Researching, developing and optimizing these systems consumes considerable manpower and material resources and takes a long time, so it cannot keep pace with the speed at which the internet develops and changes. Furthermore, existing big data storage systems focus only on storing big data quickly and at maximum capacity, that is, on improving the utilization efficiency of storage space, and the security of the stored data has not been effectively improved. Big data storage systems in the prior art therefore cannot effectively perform differentiated screening and secure encryption of big data, which seriously affects their normal operation.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a big data storage system comprising a data capturing module, a data state determining module, a data screening module, a data encryption module and a data sorting module. The data capturing module is used for establishing a communication connection with an internet server so as to capture original data from the internet server; the data state determining module is used for determining whether the original data is in a normal state according to the data attribute information of the original data; the data screening module is used for screening the original data according to the normal or abnormal state of the original data so as to obtain pre-screening data; the data encryption module is used for encrypting the pre-screening data so as to obtain encrypted data; and the data sorting module is used for integrating and storing the encrypted data. The big data storage system thus performs data state determination on the original data captured from the internet server to determine whether the original data is normal state data or abnormal state data, and then screens the original data, repairing or removing the abnormal state data, so that abnormal state data in the original data does not adversely affect subsequent data storage. The system also encrypts, integrates and stores the screened data in sequence to complete storage of the original data. It can therefore both distinguish and screen big data effectively and encrypt the data securely, which guarantees its normal and stable operation.
The invention provides a big data storage system, which is characterized in that:
the big data storage system comprises a data capturing module, a data state determining module, a data screening module, a data encryption module and a data sorting module; wherein
the data capturing module is used for establishing a communication connection with an internet server so as to capture original data from the internet server;
the data state determining module is used for determining whether the original data is in a normal state according to the data attribute information of the original data;
the data screening module is used for screening the original data according to the normal or abnormal state of the original data so as to obtain pre-screening data;
the data encryption module is used for encrypting the pre-screening data so as to obtain encrypted data;
the data sorting module is used for integrating and storing the encrypted data;
further, the data capturing module comprises a network joint submodule, a data transmission sensing submodule, a data synchronization submodule and a data extraction submodule; wherein
the network joint submodule is used for being in communication connection with the internet server so as to receive the original data from the internet server;
the data transmission sensing submodule is used for sensing the data transmission attribute information of the original data received by the network joint submodule;
the data synchronization submodule is used for performing clock synchronization processing on the original data according to the type attribute information of the original data;
the data extraction submodule is used for extracting the original data according to the data transmission attribute information and/or the clock synchronization processing result so as to obtain the corresponding original data;
further, the data transmission sensing submodule comprises a data transmission capacity sensing unit and a data transmission rate sensing unit; wherein
the data transmission capacity sensing unit is used for sensing data transmission capacity information of the original data received by the network joint submodule as part of the data transmission attribute information;
the data transmission rate sensing unit is used for sensing data transmission rate information of the original data received by the network joint submodule as part of the data transmission attribute information;
the data synchronization submodule comprises a data structure state determining unit, a data transmission time sequence determining unit and a synchronization execution unit; wherein
the data structure state determining unit is used for determining the data heterogeneity state difference of the original data;
the data transmission time sequence determining unit is used for determining the time sequence information with which the original data is transmitted to the network joint submodule;
the synchronization execution unit is used for performing the clock synchronization processing on the original data according to the data heterogeneity state difference and the time sequence information;
further, the data state determining module comprises a data preprocessing submodule, a data characteristic value operator module and a data normality/abnormality judgment submodule; wherein
the data preprocessing submodule is used for preprocessing the original data according to the data attribute information so as to obtain preprocessed data;
the data characteristic value operator module is used for calculating a characteristic value corresponding to the preprocessed data;
the data normality/abnormality judgment submodule is used for judging whether the corresponding original data is in a normal state or an abnormal state according to the characteristic value;
further, the data preprocessing submodule comprises a data unshelling unit, a data filtering unit and a data dimension reduction unit; wherein
the data unshelling unit is used for performing data hidden shell removal processing on the original data;
the data filtering unit is used for performing Kalman filtering processing on the original data that has undergone the data hidden shell removal processing;
the data dimension reduction unit is used for performing data space dimension reduction processing on the original data that has undergone the Kalman filtering processing;
the data characteristic value operator module comprises a data matrix transformation unit and a matrix eigenvalue calculation unit; wherein
the data matrix transformation unit is used for transforming the preprocessed data to obtain a data matrix related to the data entropy;
the matrix eigenvalue calculation unit is used for solving the matrix eigenvalue equation of the data matrix so as to obtain the characteristic value;
the data normality/abnormality judgment submodule comprises a characteristic value comparison unit and a data state determining unit; wherein
the characteristic value comparison unit is used for comparing the characteristic value with a preset characteristic threshold value;
the data state determining unit is used for determining whether the original data is in a normal state or an abnormal state according to the result of the comparison processing;
further, the data screening module comprises a data distinguishing submodule, a data repair submodule and a data eliminating submodule; wherein
the data distinguishing submodule is used for distinguishing the original data according to the normal or abnormal state of the original data so as to obtain normal state data and abnormal state data;
the data repair submodule is used for performing adaptive data repair processing on the abnormal state data;
the data eliminating submodule is used for eliminating the original data according to the distinguishing processing result and/or the data repair processing result so as to obtain the pre-screening data;
further, the data distinguishing submodule comprises a data registering unit and a data updating unit; wherein
the data registering unit is used for separately registering the normal state data and the abnormal state data obtained by the distinguishing processing;
the data updating unit is used for updating the different data registered by the data registering unit according to the progress of the distinguishing processing;
the data repair submodule comprises a data missing state determining unit and a data supplementing unit; wherein
the data missing state determining unit is used for determining data block missing information of the abnormal state data;
the data supplementing unit is used for performing the data repair processing on the abnormal state data according to the data block missing information;
the data eliminating submodule is used for eliminating the original data according to the distinguishing processing result and/or the data repair processing result so as to obtain the pre-screening data, which specifically comprises
calculating the pre-screening data according to the following formulas (1) and (2):
$$\mu_i = \operatorname{sgn}\left(DF_i \cdot \delta(y)\right) \tag{1}$$
$$X = \{\, x_i \mid \mu_i = 0 \,\} \tag{2}$$
In formulas (1) and (2), $X$ is the data set corresponding to the pre-screening data; $x_i$ is the i-th original data; $DF_i$ is the distinguishing processing result corresponding to the i-th original data, where $DF_i = 0$ when the result is the normal state and $DF_i = 1$ when the result is the abnormal state; and $\delta(y)$ is the repair check result, expressed mathematically as $\delta(y) = \operatorname{sgn}(DF_y \cdot f(y))$, where $DF_y$ is the distinguishing processing result of the abnormal data and $f(y)$ is an abnormal data checking function that checks whether the abnormal data has been repaired; when $\delta(y) = 0$, the abnormal data has been repaired, and otherwise it has not been repaired;
further, the data encryption module comprises a data mining submodule, a data arrangement adjusting submodule and a data encapsulation submodule; wherein
the data mining submodule is used for mining the pre-screening data so as to obtain data hierarchical frame information about the pre-screening data;
the data arrangement adjusting submodule is used for performing encryption adjustment processing on the data hierarchical frame of the pre-screening data according to the data hierarchical frame information;
the data encapsulation submodule is used for encapsulating the pre-screening data that has undergone the encryption adjustment processing;
further, the data sorting module comprises a storage interval determining submodule, a data-storage interval matching submodule and a data compression submodule; wherein
the storage interval determining submodule is used for determining storage margin information corresponding to different storage sectors in the big data storage system;
the data-storage interval matching submodule is used for constructing a storage matching relation between the encrypted data and the storage sectors according to the storage margin information and the data volume information of the encrypted data;
the data compression submodule is used for compressing the encrypted data and storing it into the corresponding storage sector according to the storage matching relation;
further, the data compression submodule comprises a data verification unit and a data simplification unit; wherein
the data verification unit is used for verifying whether the encrypted data and the corresponding storage sector satisfy the storage matching relation;
and the data simplification unit is used for removing data redundancy from the encrypted data to obtain a corresponding compressed data set.
Compared with the prior art, the big data storage system performs data state determination on the original data captured from the internet server to determine whether the original data is normal state data or abnormal state data, and then screens the original data, repairing or removing the abnormal state data, so that abnormal state data in the original data does not adversely affect subsequent data storage. The system also encrypts, integrates and stores the screened data in sequence to complete storage of the original data. It can therefore both distinguish and screen big data effectively and encrypt the data securely, which guarantees its normal and stable operation.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a big data storage system according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic structural diagram of a big data storage system according to an embodiment of the present invention. The big data storage system comprises a data capturing module, a data state determining module, a data screening module, a data encryption module and a data sorting module; wherein
the data capturing module is used for establishing a communication connection with an internet server so as to capture original data from the internet server;
the data state determining module is used for determining whether the original data is in a normal state according to the data attribute information of the original data;
the data screening module is used for screening the original data according to the normal or abnormal state of the original data so as to obtain pre-screening data;
the data encryption module is used for encrypting the pre-screening data so as to obtain encrypted data;
the data sorting module is used for integrating and storing the encrypted data.
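The order in which the five modules hand data to one another can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function `run_pipeline`, its callback parameters and the sample values are hypothetical names introduced here, and the byte reversal used for `encrypt` is a placeholder rather than an encryption method described in the source.

```python
from typing import Callable, Iterable, List, Optional


def run_pipeline(
    raw_items: Iterable[bytes],
    is_abnormal: Callable[[bytes], bool],
    repair: Callable[[bytes], Optional[bytes]],
    encrypt: Callable[[bytes], bytes],
    store: Callable[[bytes], None],
) -> int:
    """Pass captured items through state determination, screening,
    encryption and storage. Returns the number of items stored."""
    stored = 0
    for item in raw_items:                  # output of the capturing stage
        if is_abnormal(item):               # state determination
            repaired = repair(item)         # screening: attempt a repair
            if repaired is None or is_abnormal(repaired):
                continue                    # screening: remove unrepaired data
            item = repaired
        store(encrypt(item))                # encryption, then sorting/storage
        stored += 1
    return stored


# Hypothetical stand-ins for the five modules.
vault: List[bytes] = []
count = run_pipeline(
    raw_items=[b"ok-1", b"", b"bad", b"ok-2"],
    is_abnormal=lambda b: len(b) == 0 or b == b"bad",
    repair=lambda b: b"fixed" if b == b"bad" else None,
    encrypt=lambda b: b[::-1],              # placeholder, not real encryption
    store=vault.append,
)
```

Passing each stage as a callback keeps the sketch independent of how any single module is actually implemented.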
Preferably, the data capturing module comprises a network joint submodule, a data transmission sensing submodule, a data synchronization submodule and a data extraction submodule; wherein
the network joint submodule is used for being in communication connection with the internet server so as to receive the original data from the internet server;
the data transmission sensing submodule is used for sensing the data transmission attribute information of the original data received by the network joint submodule;
the data synchronization submodule is used for performing clock synchronization processing on the original data according to the type attribute information of the original data;
the data extraction submodule is used for extracting the original data according to the data transmission attribute information and/or the clock synchronization processing result so as to obtain the corresponding original data.
Preferably, the data transmission sensing submodule comprises a data transmission capacity sensing unit and a data transmission rate sensing unit; wherein
the data transmission capacity sensing unit is used for sensing data transmission capacity information of the original data received by the network joint submodule as part of the data transmission attribute information;
the data transmission rate sensing unit is used for sensing data transmission rate information of the original data received by the network joint submodule as part of the data transmission attribute information.
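One simple way to obtain the two transmission attributes is to measure them from timestamped chunks as they arrive. The sketch below assumes that each received chunk carries an arrival time in seconds; the function name `sense_transfer` and the sample data are hypothetical.

```python
from typing import List, Tuple


def sense_transfer(chunks: List[Tuple[float, bytes]]) -> Tuple[int, float]:
    """Return (total bytes received, average bytes per second).

    Each chunk is (arrival_time_in_seconds, payload). The rate is a coarse
    average over the interval between the first and last arrival.
    """
    if not chunks:
        return 0, 0.0
    total = sum(len(payload) for _, payload in chunks)
    times = [t for t, _ in chunks]
    elapsed = max(times) - min(times)
    rate = total / elapsed if elapsed > 0 else 0.0
    return total, rate


capacity, rate = sense_transfer(
    [(0.0, b"a" * 100), (1.0, b"b" * 300), (2.0, b"c" * 200)]
)
```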
Preferably, the data synchronization submodule comprises a data structure state determining unit, a data transmission time sequence determining unit and a synchronization execution unit; wherein
the data structure state determining unit is used for determining the data heterogeneity state difference of the original data;
the data transmission time sequence determining unit is used for determining the time sequence information with which the original data is transmitted to the network joint submodule;
the synchronization execution unit is used for performing the clock synchronization processing on the original data according to the data heterogeneity state difference and the time sequence information.
Preferably, the data state determining module comprises a data preprocessing submodule, a data characteristic value operator module and a data normality/abnormality judgment submodule; wherein
the data preprocessing submodule is used for preprocessing the original data according to the data attribute information so as to obtain preprocessed data;
the data characteristic value operator module is used for calculating a characteristic value corresponding to the preprocessed data;
the data normality/abnormality judgment submodule is used for judging whether the corresponding original data is in a normal state or an abnormal state according to the characteristic value.
Preferably, the data preprocessing submodule comprises a data unshelling unit, a data filtering unit and a data dimension reduction unit; wherein
the data unshelling unit is used for performing data hidden shell removal processing on the original data;
the data filtering unit is used for performing Kalman filtering processing on the original data that has undergone the data hidden shell removal processing;
the data dimension reduction unit is used for performing data space dimension reduction processing on the original data that has undergone the Kalman filtering processing.
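The source names Kalman filtering but does not give a state model or noise parameters. As a minimal sketch, a scalar Kalman filter under an assumed random-walk model could smooth a numeric series as follows; `kalman_1d`, `process_var`, `measurement_var` and the sample values are assumptions made for illustration, and the shell removal and dimension reduction steps are not shown.

```python
from typing import List


def kalman_1d(
    measurements: List[float],
    process_var: float = 1e-3,      # assumed process noise variance
    measurement_var: float = 0.5,   # assumed measurement noise variance
) -> List[float]:
    """Scalar Kalman filter for a slowly varying value (random-walk model)."""
    if not measurements:
        return []
    estimate = measurements[0]      # initial state taken from the first sample
    error_var = 1.0                 # initial uncertainty (assumed)
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, so only uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


noisy = [10.0, 10.4, 9.7, 10.2, 14.0, 10.1, 9.9]
filtered = kalman_1d(noisy)
```

With these parameters the outlier at index 4 is pulled most of the way back toward the surrounding values instead of being passed through unchanged.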
Preferably, the data characteristic value operator module comprises a data matrix transformation unit and a matrix eigenvalue calculation unit; wherein
the data matrix transformation unit is used for transforming the preprocessed data to obtain a data matrix related to the data entropy;
the matrix eigenvalue calculation unit is used for solving the matrix eigenvalue equation of the data matrix so as to obtain the characteristic value.
Preferably, the data normality/abnormality judgment submodule comprises a characteristic value comparison unit and a data state determining unit; wherein
the characteristic value comparison unit is used for comparing the characteristic value with a preset characteristic threshold value;
the data state determining unit is used for determining whether the original data is in a normal state or an abnormal state according to the result of the comparison processing.
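The source does not specify how the entropy-related matrix is built, which eigenvalue is used, or in which direction the threshold comparison runs. The sketch below fills those gaps with explicit assumptions: per-block Shannon entropy, a similarity matrix with entries exp(-|h_i - h_j|), the dominant eigenvalue found by power iteration, and "normal" meaning the eigenvalue is at or above the threshold. All function names and the threshold value are hypothetical.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte."""
    if not block:
        return 0.0
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in Counter(block).values())


def entropy_matrix(blocks: List[bytes]) -> List[List[float]]:
    """Symmetric similarity matrix built from block entropies (assumed form)."""
    h = [shannon_entropy(b) for b in blocks]
    return [[math.exp(-abs(hi - hj)) for hj in h] for hi in h]


def dominant_eigenvalue(m: List[List[float]], iterations: int = 200) -> float:
    """Largest eigenvalue of a symmetric positive matrix via power iteration."""
    n = len(m)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        # Rayleigh quotient v^T M v, valid because v has unit length.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam


def is_normal(blocks: List[bytes], threshold: float) -> bool:
    """Compare the characteristic value with a preset threshold."""
    return dominant_eigenvalue(entropy_matrix(blocks)) >= threshold


uniform = [b"abcdabcd"] * 4
mixed = [b"abcdabcd", b"aaaaaaaa", bytes(range(8)), b"abababab"]
```

Under this construction, blocks with similar entropy give a matrix close to all ones, whose dominant eigenvalue approaches the number of blocks, while blocks with widely differing entropy give a smaller value.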
Preferably, the data screening module comprises a data distinguishing submodule, a data repair submodule and a data eliminating submodule; wherein
the data distinguishing submodule is used for distinguishing the original data according to the normal or abnormal state of the original data so as to obtain normal state data and abnormal state data;
the data repair submodule is used for performing adaptive data repair processing on the abnormal state data;
the data eliminating submodule is used for eliminating the original data according to the distinguishing processing result and/or the data repair processing result so as to obtain the pre-screening data.
Preferably, the data distinguishing submodule comprises a data registering unit and a data updating unit; wherein
the data registering unit is used for separately registering the normal state data and the abnormal state data obtained by the distinguishing processing;
the data updating unit is used for updating the different data registered by the data registering unit according to the progress of the distinguishing processing.
Preferably, the data repair submodule comprises a data missing state determining unit and a data supplementing unit; wherein
the data missing state determining unit is used for determining data block missing information of the abnormal state data;
the data supplementing unit is used for performing the data repair processing on the abnormal state data according to the data block missing information.
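The two repair units can be illustrated with numbered data blocks: first list which block indices are missing, then fill them from some other source. The source does not say where supplementary data comes from, so the `refetch` callback, the `backup` dictionary and both function names below are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def find_missing_blocks(blocks: Dict[int, bytes], expected: int) -> List[int]:
    """Indices in range(expected) whose block is absent or empty."""
    return [i for i in range(expected) if not blocks.get(i)]


def supplement_blocks(
    blocks: Dict[int, bytes],
    missing: List[int],
    refetch: Callable[[int], Optional[bytes]],
) -> Dict[int, bytes]:
    """Return a copy of blocks with missing entries filled where possible."""
    repaired = dict(blocks)
    for i in missing:
        data = refetch(i)           # assumed source of replacement data
        if data:
            repaired[i] = data
    return repaired


received = {0: b"he", 2: b"wo", 3: b""}
backup = {1: b"llo ", 3: b"rld"}
missing = find_missing_blocks(received, expected=4)
repaired = supplement_blocks(received, missing, backup.get)
```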
Preferably, the data eliminating submodule is used for eliminating the original data according to the distinguishing processing result and/or the data repair processing result so as to obtain the pre-screening data, which specifically comprises
calculating the pre-screening data according to the following formulas (1) and (2):
$$\mu_i = \operatorname{sgn}\left(DF_i \cdot \delta(y)\right) \tag{1}$$
$$X = \{\, x_i \mid \mu_i = 0 \,\} \tag{2}$$
In formulas (1) and (2), $X$ is the data set corresponding to the pre-screening data; $x_i$ is the i-th original data; $DF_i$ is the distinguishing processing result corresponding to the i-th original data, where $DF_i = 0$ when the result is the normal state and $DF_i = 1$ when the result is the abnormal state; and $\delta(y)$ is the repair check result, expressed mathematically as $\delta(y) = \operatorname{sgn}(DF_y \cdot f(y))$, where $DF_y$ is the distinguishing processing result of the abnormal data and $f(y)$ is an abnormal data checking function that checks whether the abnormal data has been repaired. When $\delta(y) = 0$, the abnormal data has been repaired; otherwise it has not been repaired.
Through the above process, the original data is screened to obtain the pre-screening data, with the distinguishing processing result of the original data serving as the main screening criterion. Abnormal data that has been successfully repaired is not eliminated; data whose distinguishing result is abnormal and whose repair has not been completed is eliminated; all other data passes the pre-screening.
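Formulas (1) and (2) can be traced in a few lines of code. The sketch assumes that the repair check δ(y) is evaluated on the same item whose flag DF_i is being tested, which the source does not state explicitly; the names `prescreen`, `df` and `check` and the sample data are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sgn(v: float) -> int:
    """Sign function: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def prescreen(
    items: Sequence[T],
    df: Sequence[int],
    check: Callable[[T], float],
) -> List[T]:
    """Keep items x_i with mu_i == 0, following formulas (1) and (2).

    df[i] is 0 for normal and 1 for abnormal items; check(x) plays the role
    of f(y) and returns 0 when an abnormal item has been repaired.
    """
    kept = []
    for x, df_i in zip(items, df):
        delta = sgn(df_i * check(x))   # repair check result delta(y)
        mu = sgn(df_i * delta)         # formula (1)
        if mu == 0:                    # formula (2): membership in X
            kept.append(x)
    return kept


items = ["a", "b", "c"]
df = [0, 1, 1]                         # "a" normal, "b" and "c" abnormal
still_broken = {"c"}                   # "b" was repaired, "c" was not
result = prescreen(items, df, lambda x: 1.0 if x in still_broken else 0.0)
```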
Preferably, the data encryption module comprises a data mining submodule, a data arrangement adjusting submodule and a data encapsulation submodule; wherein
the data mining submodule is used for mining the pre-screening data so as to obtain data hierarchical frame information about the pre-screening data;
the data arrangement adjusting submodule is used for performing encryption adjustment processing on the data hierarchical frame of the pre-screening data according to the data hierarchical frame information;
the data encapsulation submodule is used for encapsulating the pre-screening data that has undergone the encryption adjustment processing.
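The source does not define the encryption adjustment of the hierarchical frame. Purely to illustrate the idea of rearranging layers and then encapsulating them, the toy sketch below reorders a list of layers with a key-seeded permutation and wraps the result in a length-prefixed packet. This keyed reordering is a stand-in chosen here, it is not cryptographically secure, and all names are hypothetical.

```python
import random
import struct
from typing import List


def keyed_permutation(n: int, key: bytes) -> List[int]:
    """Deterministic permutation of range(n) derived from key.

    Uses Python's non-cryptographic PRNG: illustration only, not secure.
    """
    order = list(range(n))
    random.Random(key).shuffle(order)
    return order


def arrange_and_encapsulate(layers: List[bytes], key: bytes) -> bytes:
    """Reorder layers by the keyed permutation and length-prefix each one."""
    order = keyed_permutation(len(layers), key)
    out = [struct.pack(">I", len(layers))]
    for idx in order:
        out.append(struct.pack(">I", len(layers[idx])))
        out.append(layers[idx])
    return b"".join(out)


def decapsulate_and_restore(packet: bytes, key: bytes) -> List[bytes]:
    """Parse the packet and undo the keyed reordering."""
    (count,) = struct.unpack_from(">I", packet, 0)
    offset = 4
    shuffled = []
    for _ in range(count):
        (size,) = struct.unpack_from(">I", packet, offset)
        offset += 4
        shuffled.append(packet[offset:offset + size])
        offset += size
    restored = [b""] * count
    for position, idx in enumerate(keyed_permutation(count, key)):
        restored[idx] = shuffled[position]
    return restored


layers = [b"header", b"index", b"body-part-1", b"body-part-2"]
packet = arrange_and_encapsulate(layers, key=b"demo-key")
```

A real deployment would replace the reordering with an established authenticated cipher; the sketch only shows where such a step would sit relative to the encapsulation.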
Preferably, the data sorting module comprises a storage interval determining submodule, a data-storage interval matching submodule and a data compression submodule; wherein
the storage interval determining submodule is used for determining the storage margin information corresponding to different storage sectors in the big data storage system;
the data-storage interval matching submodule is used for constructing a storage matching relation between the encrypted data and the storage sectors according to the storage margin information and the data volume information of the encrypted data;
the data compression submodule is used for compressing the encrypted data and storing it into the corresponding storage sector according to the storage matching relation.
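The storage matching relation pairs each encrypted item with a sector that has enough free space. The source does not name a placement strategy; the sketch below assumes a best-fit rule (largest items first, each placed in the sector with the smallest sufficient margin). The function name, sector names and sizes are hypothetical.

```python
from typing import Dict, Optional


def match_to_sectors(
    sizes: Dict[str, int],
    margins: Dict[str, int],
) -> Dict[str, Optional[str]]:
    """Best-fit placement: map each item name to a sector name, or None.

    sizes gives the data volume of each item; margins gives the free
    space of each sector, in the same unit.
    """
    free = dict(margins)
    plan: Dict[str, Optional[str]] = {}
    # Place the largest items first so they are not starved of space.
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [s for s, m in free.items() if m >= size]
        if not candidates:
            plan[name] = None          # no sector can hold this item
            continue
        best = min(candidates, key=lambda s: free[s])
        plan[name] = best
        free[best] -= size
    return plan


plan = match_to_sectors(
    sizes={"a": 70, "b": 40, "c": 30, "d": 90},
    margins={"s1": 100, "s2": 80, "s3": 50},
)
```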
Preferably, the data compression submodule comprises a data verification unit and a data simplification unit; wherein
the data verification unit is used for verifying whether the encrypted data and the corresponding storage sector satisfy the storage matching relation;
the data simplification unit is used for removing data redundancy from the encrypted data to obtain a corresponding compressed data set.
Preferably, the data compression sub-module may further compress the encrypted data by the following formula (3),
Figure GDA0002981982180000121
in the above formula (3), $s_t$ is the t-th compressed data obtained after compression, $z$ is the number of compressed data, $n$ is the number of encrypted data, and $x_k$ is the k-th encrypted data;
then, the compressed encrypted data is stored to a matched storage sector according to the corresponding storage matching relation;
through this process, the encrypted data can be compressed to between one half and one quarter of its original size, which speeds up storage; and because the compression is lossless, reconstructing the compressed data yields the same data information as before compression.
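The lossless property can be illustrated with a short round trip. Formula (3) is not reproduced in this text, so the sketch below substitutes the standard-library `zlib` (DEFLATE) codec as a stand-in; it is not the compression method of formula (3), and the sample bytes and function names are assumptions chosen only to show the idea.

```python
import zlib


def compress_for_storage(data: bytes) -> bytes:
    """Losslessly compress a byte string before it is written to a sector."""
    return zlib.compress(data, level=9)


def restore(blob: bytes) -> bytes:
    """Reconstruct the original bytes from the compressed form."""
    return zlib.decompress(blob)


# Deliberately redundant sample input; real ratios depend on the data.
sample = b"sector-record;" * 64
packed = compress_for_storage(sample)
ratio = len(packed) / len(sample)
```

Calling `restore(packed)` returns the original bytes exactly, which is what lossless means here; how small `ratio` is depends entirely on how redundant the input is.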
In brief, the big data storage system captures original data from the corresponding internet server, identifies whether the data state of the original data is normal, screens out data in an abnormal state so that only data in a normal state is retained, applies security encryption to the retained normal data to prevent it from being cracked or stolen, and finally stores the encrypted normal data. In general, the big data storage system judges at the data source whether the original data is normal, keeps only the normal data, and encrypts it, thereby ensuring that the data stored in the big data storage system is normal and accurate and that the big data storage system works normally and stably.
As can be seen from the above description of the embodiments, the big data storage system performs data state determination processing on the original data captured from the internet server, thereby determining whether the original data is normal state data or abnormal state data. It then screens the original data, repairing or removing the abnormal state data, which avoids the adverse effect that abnormal state data in the original data would have on subsequent storage. The big data storage system also sequentially performs encryption processing and integration and storage processing on the screened data to complete the storage of the original data. The big data storage system can therefore not only distinguish and screen big data effectively but also encrypt the data securely, which guarantees its normal and stable operation.
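The overall flow summarized above can be sketched as a chain of stages. This is only an architectural illustration: the function `run_pipeline`, its callable parameters, and the toy stages in the usage example are hypothetical, and the XOR transform is a placeholder, not real encryption.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")
E = TypeVar("E")


def run_pipeline(
    raw: Iterable[T],
    is_normal: Callable[[T], bool],
    repair: Callable[[T], Optional[T]],
    encrypt: Callable[[T], E],
    store: Callable[[E], None],
) -> int:
    """Determine state, repair or reject, encrypt, then store; return the stored count."""
    stored = 0
    for item in raw:
        if not is_normal(item):
            repaired = repair(item)
            if repaired is None or not is_normal(repaired):
                continue  # abnormal and not repairable: reject
            item = repaired
        store(encrypt(item))
        stored += 1
    return stored


sink: list = []
count = run_pipeline(
    raw=[1, -2, -50],
    is_normal=lambda x: x >= 0,
    repair=lambda x: -x if x > -10 else None,
    encrypt=lambda x: x ^ 0xFF,  # placeholder transform, NOT real encryption
    store=sink.append,
)
```

Passing the stages in as callables keeps the screening logic independent of any particular state test, repair rule, or cipher.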
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (4)

1. A big data storage system, comprising:
the big data storage system comprises a data capturing module, a data state determining module, a data screening module, a data encryption module and a data sorting module; wherein
the data capturing module is used for the communication connection of an internet server so as to capture and obtain original data from the internet server;
the data state determining module is used for determining whether the original data is in a normal state or not according to the data attribute information of the original data;
the data screening module is used for screening the original data according to the corresponding normal or abnormal state of the original data so as to obtain pre-screening data;
the data encryption module is used for encrypting the pre-screening data so as to obtain encrypted data;
the data sorting module is used for integrating and storing the encrypted data;
the data capturing module comprises a network joint sub-module, a data transmission sensing submodule, a data synchronization submodule and a data extraction submodule; wherein
the network joint sub-module is used for being in communication connection with the internet server so as to receive raw data from the internet server;
the data transmission sensing submodule is used for sensing the data transmission attribute information of the original data received by the network joint submodule;
the data synchronization submodule is used for performing clock synchronization processing on the original data according to the type attribute information of the original data;
the data extraction submodule is used for extracting the original data according to the data transmission attribute information and/or the clock synchronization processing result so as to obtain corresponding original data;
the data encryption module comprises a data mining submodule, a data arrangement adjusting submodule and a data encapsulation submodule; wherein
the data mining submodule is used for mining the pre-screening data so as to obtain data hierarchical frame information about the pre-screening data;
the data arrangement adjusting submodule is used for performing encryption adjustment processing on the data hierarchical frame of the pre-screening data according to the data hierarchical frame information;
the data encapsulation submodule is used for encapsulating the pre-screening data subjected to the encryption adjustment processing;
the data screening module comprises a data distinguishing submodule, a data repair submodule and a data eliminating submodule; wherein
the data distinguishing submodule is used for distinguishing and processing the original data according to the corresponding normal or abnormal state of the original data so as to obtain normal state data and abnormal state data;
the data repair submodule is used for performing adaptive data repair processing on the abnormal state data;
the data eliminating submodule is used for eliminating the original data according to the distinguishing processing result and/or the data repairing processing result so as to obtain the pre-screening data;
the data distinguishing submodule comprises a data registering unit and a data updating unit; wherein
the data registering unit is used for respectively registering the normal state data and the abnormal state data obtained by the distinguishing processing;
the data updating unit is used for updating the different data registered by the data registering unit according to the progress of the distinguishing processing;
the data repair submodule comprises a data missing state determining unit and a data supplementing unit; wherein
the data missing state determining unit is used for determining data block missing information of the abnormal state data;
the data supplementing unit is used for performing data restoration processing on the abnormal state data according to the missing information of the data block;
the data removing sub-module is used for removing the original data according to the result of the distinguishing processing and/or the result of the data repairing processing so as to obtain the pre-screening data,
calculating the pre-screening data according to the following formulas (1) to (2):
$\mu_i = \operatorname{sgn}(DF_i \cdot \delta(y))$ (1)
$X = \{x_i \mid \mu_i = 0\}$ (2)
in the above formulas (1) and (2), $X$ is the data set corresponding to the pre-screening data, $x_i$ is the i-th original data, and $DF_i$ is the discrimination processing result corresponding to the i-th original data: $DF_i$ is 0 when the discrimination processing result is a normal state and 1 when it is an abnormal state; $\delta(y)$ is the repair check result, expressed mathematically as $\delta(y) = \operatorname{sgn}(DF_y \cdot f(y))$, in which $f(y)$ is an abnormal data checking function for checking whether the abnormal data has been repaired; when the value of $\delta(y)$ is 0 the abnormal data has been repaired, and otherwise the abnormal data has not been repaired;
the data sorting module comprises a storage interval determining submodule, a data-storage interval matching submodule and a data compression submodule; wherein
the storage interval determining submodule is used for determining storage margin information corresponding to different storage sectors in the big data storage system;
the data-storage interval matching sub-module is used for constructing a storage matching relation between the encrypted data and the storage sector according to the storage margin information and the data volume information of the encrypted data;
the data compression submodule is used for compressing and storing the encrypted data into a corresponding storage sector according to the storage matching relation;
the data compression submodule comprises a data verification unit and a data simplification unit; wherein
the data verification unit is used for verifying whether the encrypted data and the corresponding storage sector meet the storage matching relationship;
the data simplification unit is used for removing the data redundancy from the encrypted data to obtain a corresponding compressed data set;
the data compression sub-module compresses the encrypted data by the following formula (3),
Figure FDA0002981982170000041
in the above formula (3), $s_t$ is the t-th compressed data obtained after compression, $z$ is the number of compressed data, $n$ is the number of encrypted data, and $x_k$ is the k-th encrypted data;
and then, storing the compressed encrypted data into the matched storage sector according to the corresponding storage matching relation.
2. The big data storage system of claim 1, wherein:
the data transmission sensing submodule comprises a data transmission capacity sensing unit and a data transmission rate sensing unit; wherein
the data transmission capacity sensing unit is used for sensing data transmission capacity information of the original data received by the network joint sub-module to serve as part of the data transmission attribute information;
the data transmission rate sensing unit is used for sensing data transmission rate information of the original data received by the network joint sub-module as part of the data transmission attribute information;
the data synchronization submodule comprises a data structure state determining unit, a data transmission time sequence determining unit and a synchronization executing unit; wherein
the data structure state determining unit is used for determining the data heterogeneity state difference of the original data;
the data transmission time sequence determining unit is used for determining time sequence information corresponding to the original data transmitted to the network joint sub-module;
and the synchronization executing unit is used for performing the clock synchronization processing on the original data according to the data heterogeneity state difference and the time sequence information.
3. The big data storage system of claim 1, wherein:
the data state determining module comprises a data preprocessing submodule, a data characteristic value calculation submodule and a data normal/abnormal judgment submodule; wherein
the data preprocessing submodule is used for preprocessing the original data according to the data attribute information so as to obtain preprocessed data;
the data characteristic value calculation submodule is used for calculating a characteristic value corresponding to the preprocessed data;
and the data normal/abnormal judgment submodule is used for judging whether the corresponding original data is in a normal state or an abnormal state according to the characteristic value.
4. The big data storage system of claim 3, wherein:
the data preprocessing submodule comprises a data unshelling unit, a data filtering unit and a data dimension reduction unit; wherein
the data unshelling unit is used for performing data hidden-shell removal processing on the original data;
the data filtering unit is used for performing Kalman filtering processing on the original data subjected to the data hidden-shell removal processing;
the data dimension reduction unit is used for performing data space dimension reduction processing on the original data subjected to the Kalman filtering processing;
the data characteristic value calculation submodule comprises a data matrix transformation unit and a matrix characteristic value calculation unit; wherein
the data matrix transformation unit is used for transforming to obtain a data matrix related to the data entropy according to the preprocessed data;
the matrix eigenvalue calculation unit is used for performing calculation processing on a matrix eigenvalue equation on the data matrix so as to obtain the eigenvalue;
the data normal/abnormal judgment submodule comprises a characteristic value comparison unit and a data state determination unit; wherein
the characteristic value comparison unit is used for comparing the characteristic value with a preset characteristic threshold value;
the data state determining unit is used for determining whether the original data is in a normal state or an abnormal state according to the comparison processing result.
CN202010213093.1A 2020-03-24 2020-03-24 Big data storage system Active CN111444396B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010213093.1A CN111444396B (en) 2020-03-24 2020-03-24 Big data storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010213093.1A CN111444396B (en) 2020-03-24 2020-03-24 Big data storage system

Publications (2)

Publication Number Publication Date
CN111444396A CN111444396A (en) 2020-07-24
CN111444396B true CN111444396B (en) 2021-06-01

Family

ID=71629501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010213093.1A Active CN111444396B (en) 2020-03-24 2020-03-24 Big data storage system

Country Status (1)

Country Link
CN (1) CN111444396B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114971525A (en) * 2022-04-20 2022-08-30 西华大学 Carbon neutralization management system and method based on block chain
CN114971818B (en) * 2022-08-02 2023-04-28 广东志远科技有限公司 Intelligent restaurant data storage processing method and system
CN115913769B (en) * 2022-12-20 2023-09-08 海口盛通达投资控股有限责任公司 Data security storage method and system based on artificial intelligence

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104461380B (en) * 2014-11-17 2017-11-21 华为技术有限公司 Date storage method and device
CN108491508A (en) * 2018-03-22 2018-09-04 安徽八六物联科技有限公司 A kind of big data cleaning code system
CN109753592A (en) * 2018-12-22 2019-05-14 汤新红 A kind of information flow storage system and its storage method based on big data

Also Published As

Publication number Publication date
CN111444396A (en) 2020-07-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant