CN112905575A - Data acquisition method, system, storage medium and electronic equipment - Google Patents


Info

Publication number
CN112905575A
Authority
CN
China
Prior art keywords
data
bloom filter
data acquisition
unit
encrypted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110210824.1A
Other languages
Chinese (zh)
Inventor
董世永
王世鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chuangsheng Shilian Digital Technology Beijing Co Ltd
Original Assignee
Chuangsheng Shilian Digital Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chuangsheng Shilian Digital Technology Beijing Co Ltd
Publication of CN112905575A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2107 File encryption

Abstract

The invention discloses a data acquisition method, a data acquisition system, a storage medium and electronic equipment. The method comprises the following steps: accessing a data source, and performing deduplication filtering on the acquired data by using a bloom filter; compressing the filtered data and then encrypting it; and storing the encrypted data locally by using a memory mapping technology, or simultaneously sending the locally stored data to a server or an external service interface through a network. The data acquisition system provided by the invention adopts this data acquisition method, so that the collection system remains stable under high traffic while the efficiency of data acquisition and transmission is further improved.

Description

Data acquisition method, system, storage medium and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method, a system, a storage medium, and an electronic device for data acquisition.
Background
With the rapid development of modern business, people pay increasing attention to the business health and the service performance of a service, so the data of the service needs to be monitored. The monitored data can come from sources of various forms, such as active site reporting or passive log collection.
This raises a problem: for a service with even moderately high traffic, as the number of data sources increases, the amount of data produced by aggregating all of this service data at the same time becomes considerable. How to collect data at such volumes is a problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to provide a data acquisition method, a data acquisition system, a storage medium and electronic equipment that keep the collection system stable under high traffic while further improving data acquisition efficiency.
In order to achieve the above purpose, the invention provides the following technical scheme:
a method of data acquisition, comprising:
accessing a data source, and performing duplicate removal and filtration on the acquired data by using a bloom filter;
compressing the filtered data and then encrypting;
and storing the encrypted data to the local by using a memory mapping technology, or simultaneously sending the locally stored data to a server or an external service interface through a network.
Preferably, the bloom filter is implemented based on a bitmap, the elements of the bit array of the bloom filter defaulting to 0.
Further, the method for performing deduplication filtering on the acquired data by using the bloom filter comprises the following steps:
extracting a unique feature identifier in the data;
inputting the feature identifier into a bloom filter, generating different hash values by using a plurality of hash functions in the bloom filter, setting the elements of the bit array corresponding to the hash values to 1, and storing the bit array;
if the bit array obtained after the data passes through the bloom filter already exists, discarding the data;
and if the bit array obtained after the data passes through the bloom filter does not exist, reserving the data.
Preferably, the data is compressed using a Huffman coding compression algorithm.
Preferably, the compressed data is encrypted using a DES symmetric encryption algorithm.
Preferably, the encrypted data is stored locally using MMAP memory mapping techniques.
Preferably, the locally stored data is sent to the server or external service interface via the network based on sendfile zero copy technology.
A data collection system comprises an acquisition interface unit, a filtering unit, a compression encryption unit, a storage unit and a transmission unit, wherein,
the acquisition interface unit is used for accessing a data source;
the filtering unit is used for carrying out duplicate removal filtering on the acquired data by utilizing a bloom filter;
the compression and encryption unit is used for compressing and encrypting the filtered data;
the storage unit stores the encrypted data to the local by using a memory mapping technology;
the transmission unit is used for transmitting the locally stored data to a server or an external service interface through a network.
A computer readable storage medium having computer readable program instructions stored thereon for performing the above-described data collection method.
An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above-described data collection method.
Compared with the prior art, the data acquisition method, the data acquisition system, the storage medium and the electronic equipment have the following beneficial effects:
In the data acquisition method provided by the invention, the acquired data is first deduplicated by a bloom filter. Because the deduplication is carried out with bit operations, the memory consumption of the system is greatly reduced. The filtered data is then compressed and encrypted, which reduces the space occupied by the original file, improves the efficiency of data acquisition and transmission, and improves data security. The encrypted data is stored locally by using a memory mapping technology, which reduces the number of data copies and system calls during local IO operations, so that data transmission and storage become more efficient and system performance remains stable. Finally, the locally stored data is sent to a server or an external service interface through a network.
The data acquisition system provided by the invention adopts this data acquisition method, so that the collection system remains stable under high traffic while the efficiency of data acquisition and transmission is further improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic flow chart of a data acquisition method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating a method for performing deduplication filtering on acquired data according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a method for obtaining a bit array corresponding to acquired data by using a bloom filter according to an embodiment of the present invention;
FIG. 4 is a block diagram of a system for data collection according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, an embodiment of the invention provides a data acquisition method, including:
accessing a data source, and performing duplicate removal and filtration on the acquired data by using a bloom filter;
compressing the filtered data and then encrypting;
and storing the encrypted data to the local by using a memory mapping technology, or simultaneously sending the locally stored data to a server or an external service interface through a network.
In the data acquisition method provided by the invention, the acquired data is first deduplicated by a bloom filter. Because the deduplication is carried out with bit operations, the consumption of system memory is greatly reduced. The data is then compressed and encrypted, which reduces the space occupied by the original file, improves the efficiency of data acquisition and transmission, and improves data security. The acquired data is stored by using a memory mapping technology, which reduces the number of data copies and system calls during local IO operations, so that data transmission and storage become more efficient and system performance remains stable.
The collection system must deduplicate the data coming from its collection sources. Under high traffic, carrying out this deduplication filtering while consuming few system resources (e.g., memory) and maintaining good performance is a difficult problem. Traditional deduplication stores all of the data in memory and compares entries one by one, which consumes a great deal of system resources under high traffic. In this method, a bloom filter (BloomFilter) implemented on a bitmap handles the deduplication filtering task under high traffic, and every element of the bloom filter's bit array defaults to 0 in its initial state. Because the bloom filter performs deduplication with bit operations, the large amount of repeated data caused by improper operation of a reporting source or by malicious batch requests is filtered out, and system memory consumption is greatly reduced, to roughly one hundredth of that required by the traditional approach.
In short, the bloom filter can be regarded as a bit array in which each element is 0 or 1 and occupies only 1 bit, so the memory consumption of the system is greatly reduced. Specifically, referring to fig. 2 or fig. 3, the method for performing deduplication filtering on the acquired data by using a bloom filter includes:
extracting a unique feature identifier in the data;
inputting the feature identifier into a bloom filter, generating different hash values by using a plurality of hash functions in the bloom filter, setting the elements of the bit array corresponding to the hash values to 1, and storing the bit array;
if the bit array obtained after the data passes through the bloom filter already exists, discarding the data;
and if the bit array obtained after the data passes through the bloom filter does not exist, retaining the data.
In a specific implementation, the unique feature identifier may be a field of the data such as its ID or Name. The identifier on which deduplication is based is extracted from each piece of acquired data as a character string. When this string is input into the bloom filter, the plurality of hash functions in the filter map it to different hash values. The elements of the bit array whose subscripts correspond to the obtained hash values are set to 1, while the elements at subscripts for which no hash value was obtained remain in their initial state of 0. The resulting bit array is stored, and identical bit arrays are stored only once. When the next piece of data passes through the bloom filter, if the bit array obtained already exists, that is, the positions of its 1 elements are exactly the same as in a previously stored bit array, the data has appeared before and is discarded, which achieves deduplication. If the bit array obtained does not exist, that is, the positions of its 1 elements do not fully match any previously stored bit array, the data has not appeared before and is retained.
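The deduplication step can be illustrated with a minimal sketch. It uses the standard Bloom filter membership test (a record is treated as a duplicate when every bit selected by its hash functions is already 1), which is an assumption and is not necessarily identical to the stored-bit-array comparison described above. The class name `BloomFilter`, the helper `deduplicate`, the default sizes and the use of SHA-256 with double hashing are all hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Bitmap-backed Bloom filter; every bit starts at 0 (illustrative sketch)."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all zero initially

    def _positions(self, key: str):
        # Derive num_hashes bit indexes from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def seen_before(self, key: str) -> bool:
        """Return True if key was probably added already; otherwise record it."""
        present = True
        for pos in self._positions(key):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                present = False
                self.bits[byte] |= mask  # set the bit for this hash value to 1
        return present


def deduplicate(records, bloom, key_field="id"):
    """Keep only records whose identifier the filter has not seen yet."""
    return [r for r in records if not bloom.seen_before(str(r[key_field]))]
```

A Bloom filter can report false positives, so a small fraction of new records may be discarded as duplicates; the rate depends on the bit array size, the number of hash functions and the number of stored identifiers.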
In a specific implementation, the data is compressed with the Huffman coding compression algorithm. Huffman coding is a well-known lossless, reversible compression algorithm that uses variable-length binary codes: characters that occur frequently in the data are represented by shorter bit sequences, and characters that occur rarely are represented by longer ones, which compresses the data. This compression greatly reduces the space occupied by the original file and improves the efficiency of data acquisition and transmission.
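As a rough illustration of the variable-length coding idea, the following sketch builds a Huffman code table for a byte string and round-trips the data through it. The function names `build_codes`, `encode` and `decode` are hypothetical, and the encoded output is kept as a string of "0" and "1" characters for readability rather than packed into bytes as a real implementation would do.

```python
import heapq
from collections import Counter


def build_codes(data: bytes) -> dict:
    """Map each byte value in data to a prefix-free bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(data: bytes, codes: dict) -> str:
    """Concatenate the code of every byte in data."""
    return "".join(codes[b] for b in data)


def decode(bits: str, codes: dict) -> bytes:
    """Invert encode() by matching prefix-free codes from left to right."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return bytes(out)
```

For input in which one byte dominates, that byte receives the shortest code, so the encoded bit string is much shorter than the 8 bits per byte of the original.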
The compressed data is then encrypted with the DES symmetric encryption algorithm. DES is a typical and relatively efficient symmetric encryption algorithm. Symmetric encryption means that the same key, generally called the Session Key, is used for both encryption and decryption, so whoever holds the key can complete both operations. The compressed file is encrypted, and when the file needs to be used it is decrypted and then decompressed, which improves data security.
To ensure persistent storage of the acquired data, the data needs to be stored locally. To further improve local IO efficiency, this embodiment stores the encrypted data locally by using the MMAP memory mapping technology. MMAP is a method for mapping files into memory: a file or other object is mapped into the address space of a process, establishing a one-to-one mapping between the file's disk addresses and a range of virtual addresses in the process's virtual address space. Once the mapping is established, the process can read and write this memory region through pointers, and the system automatically writes dirty pages back to the corresponding file on disk. Operations on the file are thus completed without calling system calls such as read and write, which reduces the number of data copies and system calls during local IO operations and improves the efficiency of data acquisition and transmission.
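A minimal sketch of storing a block of encrypted bytes through a memory mapping, using Python's standard `mmap` module, is shown here. The function names `store_with_mmap` and `load_with_mmap` are hypothetical, and the sketch writes one fixed-size payload per file, which is an assumption; a real collector would more likely append to a preallocated region.

```python
import mmap


def store_with_mmap(path: str, payload: bytes) -> None:
    """Write payload to the file at path through a memory mapping."""
    if not payload:
        raise ValueError("an empty file cannot be memory-mapped")
    with open(path, "w+b") as f:
        f.truncate(len(payload))  # size the file before mapping it
        with mmap.mmap(f.fileno(), len(payload)) as mapped:
            mapped[:] = payload   # plain memory write, no write() call
            mapped.flush()        # ask the OS to write dirty pages back now


def load_with_mmap(path: str) -> bytes:
    """Read the whole file at path back through a read-only mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[:]
```

The explicit `flush()` is optional for correctness of later reads through the page cache; it is included so the data reaches the file promptly instead of waiting for the operating system's own write-back.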
After the encrypted data has been stored locally by using the memory mapping technology, the locally stored data can be sent to a server or an external service interface through the network based on the sendfile zero-copy technology. Serving data requests from other external services involves network transmission, and to improve its efficiency this embodiment uses sendfile. sendfile is a Linux system call that transfers data between two file descriptors. The transfer takes place entirely within the kernel, which avoids copying data between kernel buffers and user buffers. This is efficient: it reduces the number of system calls and data copies, accelerates network IO from the local machine to the external network, and improves the overall transmission efficiency of data acquisition.
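The transmission step can be sketched with Python's standard `socket.sendfile` method, which uses `os.sendfile` on platforms that provide it and otherwise falls back to ordinary `send()` calls. The function name `send_stored_file` is hypothetical, and the sketch assumes the caller has already connected a blocking stream socket to the server or external service.

```python
import socket


def send_stored_file(path: str, sock: socket.socket) -> int:
    """Send the file at path over a connected stream socket; return bytes sent."""
    with open(path, "rb") as f:
        # On Linux this hands the file descriptor to the kernel via
        # os.sendfile, so the payload is not copied into user space.
        return sock.sendfile(f)
```

Whether the zero-copy path is actually taken depends on the platform and socket type; the return value is the number of bytes sent in either case.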
Example two
Referring to fig. 4, an embodiment of the present invention provides a data collection system, including a collection interface unit, a filtering unit, a compression and encryption unit, a storage unit, and a transmission unit, where the collection interface unit is used to access a data source; the filtering unit is used for carrying out duplicate removal filtering on the acquired data by utilizing a bloom filter; the compression and encryption unit is used for compressing and encrypting the filtered data; the storage unit stores the encrypted data to the local by using a memory mapping technology; the transmission unit is used for transmitting the locally stored data to a server or an external service interface through a network.
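One way to picture how the five units fit together is the following sketch, in which each unit is an injected callable. The class name `CollectionSystem`, its field names and the `run` method are hypothetical and serve only to show the order of the stages; the concrete filter, compressor, cipher, storage and transport are left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class CollectionSystem:
    """Wires the units together; every field is a pluggable stand-in."""

    acquire: Callable[[], Iterable[bytes]]   # acquisition interface unit
    is_duplicate: Callable[[bytes], bool]    # filtering unit
    compress: Callable[[bytes], bytes]       # compression half of the unit
    encrypt: Callable[[bytes], bytes]        # encryption half of the unit
    store: Callable[[bytes], None]           # storage unit
    transmit: Callable[[bytes], None]        # transmission unit

    def run(self) -> int:
        """Process every acquired record; return how many were kept."""
        kept = 0
        for record in self.acquire():
            if self.is_duplicate(record):
                continue  # drop repeated data before any further work
            blob = self.encrypt(self.compress(record))  # compress, then encrypt
            self.store(blob)
            self.transmit(blob)
            kept += 1
        return kept
```

Filtering first means duplicates never reach the more expensive compression, encryption and IO stages.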
By adopting the data collection method provided by the first embodiment of the invention, the data collection system remains stable under high traffic while the efficiency of data collection and transmission is further improved. Compared with the prior art, the beneficial effects of the data collection system provided by this embodiment are the same as those of the data collection method provided by the first embodiment, and the other technical features of the system are the same as those disclosed in the method of the previous embodiment, so they are not described again here.
Example three
Embodiments of the present invention provide a computer-readable storage medium having computer-readable program instructions stored thereon for performing the method of data collection in the first embodiment.
The computer readable storage medium provided by the embodiments of the present invention may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or device, or any combination thereof. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present embodiment, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer-readable storage medium may be embodied in an electronic device; or may be present alone without being incorporated into the electronic device.
The computer readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects the internet protocol addresses from the at least two internet protocol addresses and returns the internet protocol addresses; receiving an internet protocol address returned by the node evaluation equipment; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer readable storage medium carries one or more programs which, when executed by an electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a module does not in some cases constitute a limitation of the unit itself, for example, a filtering unit may also be described as a "unit for deduplicating the acquired data with a bloom filter".
The computer-readable storage medium provided by the invention stores computer-readable program instructions for executing the data collection method, so that the collection system remains stable under high traffic while the efficiency of data collection and transmission is further improved. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by this embodiment are the same as those of the data collection method provided by the first embodiment, and are not described again here.
Example four
An embodiment of the present invention provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method for data collection in the first embodiment.
Referring now to FIG. 5, shown is a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device may include a processing means (e.g., a central processing unit, a graphic processor, etc.) that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) or a program loaded from a storage means into a Random Access Memory (RAM). In the RAM, various programs and data necessary for the operation of the electronic apparatus are also stored. The processing device, the ROM, and the RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.
Generally, the following systems may be connected to the I/O interface: input devices including, for example, touch screens, touch pads, keyboards, mice, image sensors, microphones, accelerometers, gyroscopes, and the like; output devices including, for example, Liquid Crystal Displays (LCDs), speakers, vibrators, and the like; storage devices including, for example, magnetic tape, hard disk, etc.; and a communication device. The communication means may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. While the figures illustrate an electronic device with various systems, it is to be understood that not all illustrated systems are required to be implemented or provided. More or fewer systems may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means, or installed from a storage means, or installed from a ROM. The computer program, when executed by a processing device, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
By adopting the data acquisition method in the first embodiment, the electronic device provided by the invention keeps the collection system stable under high traffic while further improving the efficiency of data acquisition and transmission. Compared with the prior art, the beneficial effects of the electronic device provided by this embodiment are the same as those of the data acquisition method provided by the first embodiment, and the other technical features of the electronic device are the same as those disclosed in the method of the previous embodiment, so they are not repeated here.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the foregoing description of embodiments, the particular features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.
The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A method of data acquisition, comprising:
accessing a data source, and performing deduplication filtering on the acquired data by using a bloom filter;
compressing the filtered data and then encrypting it;
and storing the encrypted data locally by using a memory mapping technique, or additionally sending the locally stored data to a server or an external service interface through a network at the same time.
2. The method of data collection according to claim 1, wherein the bloom filter is implemented based on a bitmap, the elements of the bit array of the bloom filter defaulting to 0.
3. The method of data acquisition as claimed in claim 2, wherein performing deduplication filtering on the acquired data using a bloom filter comprises:
extracting a unique feature identifier in the data;
inputting the feature identifier into the bloom filter, generating different hash values by using a plurality of hash functions in the bloom filter, setting the elements of the bit array corresponding to the hash values to 1, and storing them;
if the bit array positions obtained after the data passes through the bloom filter are all already set to 1, discarding the data;
and if the bit array positions obtained after the data passes through the bloom filter are not all already set to 1, retaining the data.
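The deduplication steps recited in claims 2 and 3 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the class name `BloomFilter`, the helper `deduplicate`, the default sizes, the use of salted SHA-256 as the "plurality of hash functions", and the `"id"` field used as the unique feature identifier are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Bitmap-backed Bloom filter; every bit starts at 0 (hypothetical sketch)."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits default to 0

    def _positions(self, feature_id: str):
        # Assumption: several hash functions are emulated by salting
        # SHA-256 with the index of the hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{feature_id}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def seen_before(self, feature_id: str) -> bool:
        """Set the bits for feature_id; return True if all were already 1."""
        already_set = True
        for pos in self._positions(feature_id):
            byte_index, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte_index] & mask:
                already_set = False
                self.bits[byte_index] |= mask
        return already_set


def deduplicate(records, key=lambda record: record["id"]):
    """Yield only records whose feature identifier has not been seen."""
    bloom = BloomFilter()
    for record in records:
        if not bloom.seen_before(key(record)):
            yield record  # retained; duplicates are discarded
```

For example, `list(deduplicate([{"id": "a"}, {"id": "b"}, {"id": "a"}]))` keeps the first two records and drops the repeated `"a"`. A Bloom filter can report false positives, so a sketch like this may occasionally discard a record that was never seen; the bit array size and hash count control how often that happens.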
4. The method of claim 1, wherein the data is compressed using a Huffman coding compression algorithm.
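As an illustration of the Huffman coding named in claim 4, the following Python sketch builds a prefix-free code table and round-trips a byte string. It is a simplified sketch under stated assumptions: the function names are hypothetical, and the encoded output is kept as a string of `"0"`/`"1"` characters rather than packed bits, which a real compressor would not do.

```python
import heapq
from collections import Counter
from typing import Dict


def build_codes(data: bytes) -> Dict[int, str]:
    """Return a prefix-free bit-string code for each byte value in data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(weight, sym, {sym: ""}) for sym, weight in freq.items()]
    heapq.heapify(heap)
    tie = 256  # byte values use 0..255, so merged nodes start at 256
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(data: bytes, codes: Dict[int, str]) -> str:
    """Concatenate the code of every byte (bit string, not packed)."""
    return "".join(codes[b] for b in data)


def decode(bits: str, codes: Dict[int, str]) -> bytes:
    """Invert encode() by matching prefixes against the code table."""
    reverse = {code: sym for sym, code in codes.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return bytes(out)
```

Frequent byte values receive shorter codes, so skewed input such as `b"abracadabra"` encodes to fewer bits than the 8 bits per byte of the raw data. The code table must be stored or transmitted alongside the bits so the receiver can decode them.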
5. The method of claim 1, wherein the compressed data is encrypted using a DES symmetric encryption algorithm.
6. The method of claim 1, wherein the encrypted data is stored locally using MMAP memory mapping.
7. The method of data collection according to claim 1, wherein the locally stored data is sent to a server or an external service interface via a network based on sendfile zero copy technology.
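The storage and transmission steps of claims 6 and 7 can be sketched with the Python standard library. This is a minimal sketch, not the patented implementation: the function names are hypothetical, the payload is assumed to be already compressed and encrypted (DES is not shown because the standard library does not provide it), and the socket is assumed to be a connected, blocking stream socket.

```python
import mmap
import socket


def store_with_mmap(path: str, payload: bytes) -> None:
    """Write payload to a local file through a memory mapping."""
    if not payload:
        raise ValueError("cannot memory-map an empty payload")
    with open(path, "w+b") as f:
        f.truncate(len(payload))  # size the file before mapping it
        with mmap.mmap(f.fileno(), len(payload)) as mapped:
            mapped[:] = payload   # copy the bytes into the mapped pages
            mapped.flush()        # ask the OS to write the pages back


def send_file(path: str, sock: socket.socket) -> int:
    """Send a stored file over a connected socket; return bytes sent."""
    with open(path, "rb") as f:
        # socket.sendfile() uses os.sendfile() where the platform supports
        # it and falls back to ordinary send() calls otherwise.
        return sock.sendfile(f)
```

With `sendfile`, the kernel moves file data to the socket without first copying it into a user-space buffer, which is the zero-copy behaviour the claim refers to. Whether that path is actually taken depends on the operating system and socket type.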
8. A data collection system is characterized by comprising an acquisition interface unit, a filtering unit, a compression encryption unit, a storage unit and a transmission unit, wherein,
the acquisition interface unit is used for accessing a data source;
the filtering unit is used for carrying out duplicate removal filtering on the acquired data by utilizing a bloom filter;
the compression and encryption unit is used for compressing and encrypting the filtered data;
the storage unit stores the encrypted data to the local by using a memory mapping technology;
the transmission unit is used for transmitting the locally stored data to a server or an external service interface through a network.
9. A computer readable storage medium having computer readable program instructions stored thereon for performing the method of data collection of any of claims 1 to 7.
10. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of data collection of any one of claims 1 to 7.
CN202110210824.1A 2020-12-30 2021-02-25 Data acquisition method, system, storage medium and electronic equipment Pending CN112905575A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011610986 2020-12-30
CN2020116109866 2020-12-30

Publications (1)

Publication Number Publication Date
CN112905575A (en) 2021-06-04

Family

ID=76107159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110210824.1A Pending CN112905575A (en) 2020-12-30 2021-02-25 Data acquisition method, system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112905575A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115422142A (en) * 2022-08-22 2022-12-02 北京羽乐创新科技有限公司 Data compression method and device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101150487A (en) * 2007-11-15 2008-03-26 曙光信息产业(北京)有限公司 A transmission method for zero copy network packet
CN102203773A (en) * 2008-09-19 2011-09-28 甲骨文国际公司 Hash join using collaborative parallel filtering in intelligent storage with offloaded bloom filters
US20140310374A1 (en) * 2011-12-26 2014-10-16 Sk Telecom Co., Ltd. Content transmitting system, method for optimizing network traffic in the system, central control device and local caching device
CN104662556A (en) * 2012-09-28 2015-05-27 阿尔卡特朗讯公司 Secure private database querying with content hiding bloom filters
CN108334520A (en) * 2017-01-19 2018-07-27 北京京东尚科信息技术有限公司 social network data processing method, device, storage medium and electronic equipment
CN109145158A (en) * 2017-06-13 2019-01-04 华为技术有限公司 The processing method and Bloom filter of data in a kind of Bloom filter
CN109445702A (en) * 2018-10-26 2019-03-08 黄淮学院 A kind of piece of grade data deduplication storage
US20190121742A1 (en) * 2017-10-19 2019-04-25 Samsung Electronics Co., Ltd. System and method for identifying hot data and stream in a solid-state drive
CN110941836A (en) * 2019-11-06 2020-03-31 贵州小叮当信息技术有限公司 Distributed vertical crawler method and terminal equipment
CN111709027A (en) * 2020-06-22 2020-09-25 湖南大学 Data storage safety management method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BO MI ET AL.: "Secure Data De-Duplication Based on Threshold Blind Signature and Bloom Filter in Internet of Things", 《INTERNET-OF-THINGS ATTACKS AND DEFENSES:RECENT ADVANCES AND CHALLENGES》 *
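Zhao Xiaoyong (赵晓永): "Research on Key Technologies of Data Storage for Cloud Computing" (《面向云计算的数据存储关键技术研究》), 31 December 2014 *
Rao Wen (饶文) et al.: "Optimization and Application of Massive Data Query Technology Based on Bloom Filters" (基于布隆过滤器的海量数据查询技术的优化与应用), Microcomputer Applications (《微型电脑应用》) *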

Similar Documents

Publication Publication Date Title
JP6521403B2 (en) Efficient data compression and analysis as a service
US10958416B2 (en) Encrypted and compressed data transmission with padding
KR102069940B1 (en) Page-based compressed storage management
CN109597717B (en) Data backup and recovery method and device, electronic equipment and storage medium
CN111857550B (en) Method, apparatus and computer readable medium for data deduplication
CN107302706B (en) Image anti-hotlinking method and device and electronic equipment
CN110609708B (en) Method, apparatus and computer readable medium for data processing
CN110795747A (en) Data encryption storage method, device, equipment and readable storage medium
CN111198859A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN112035529A (en) Caching method and device, electronic equipment and computer readable storage medium
CN116166197A (en) Data storage method, system, storage node and computer readable storage medium
CN113553300A (en) File processing method and device, readable medium and electronic equipment
CN111352957A (en) Remote dictionary service optimization method and related equipment
CN112905575A (en) Data acquisition method, system, storage medium and electronic equipment
CN112436943A (en) Request deduplication method, device, equipment and storage medium based on big data
US11838207B2 (en) Systems for session-based routing
WO2023273564A1 (en) Virtual machine memory management method and apparatus, storage medium, and electronic device
CN110545313A (en) message push control method and device and electronic equipment
US9734154B2 (en) Method and apparatus for storing a data file
KR102574280B1 (en) Patching Memory Efficient Software for Application Updates on Computing Devices
US9654140B1 (en) Multi-dimensional run-length encoding
CN111967001A (en) Decoding and coding safety isolation method based on double containers
CN112650722B (en) File processing method and device based on android application program, electronic equipment and medium
CN112668033B (en) Data processing method and device and electronic equipment
CN110545107A (en) data processing method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210604