CN107665224A - Method, system and device for scanning HDFS cold data - Google Patents

Method, system and device for scanning HDFS cold data

Info

Publication number
CN107665224A
Authority
CN
China
Prior art keywords
metadata
real
time
data
metadata information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610620101.8A
Other languages
Chinese (zh)
Other versions
CN107665224B (en)
Inventor
王永光
王哲涵
唐尚文
张瑜标
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201610620101.8A
Publication of CN107665224A
Application granted
Publication of CN107665224B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/122 File system administration, e.g. details of archiving or snapshots using management policies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183 Provision of network file services by network file servers, e.g. by using NFS, CIFS

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, system and device for scanning HDFS cold data. The method includes: exporting metadata information from a metadata node as basic data; streaming the metadata information in the metadata node in real time to obtain new metadata information incrementally in real time, and merging it with the basic data into real-time metadata information to be scanned; and scanning the real-time metadata information to be scanned according to a predetermined rule, thereby obtaining cold data in real time. The system includes a basic data acquisition module, a real-time data streaming module, a metadata storage module and a real-time computing module. By streaming the metadata, the invention exports metadata information incrementally in real time, which reduces the pressure that metadata export places on the server. In addition, because the metadata is scanned in real time, the timeliness of cold data discovery is greatly improved.

Description

Method, system and device for scanning HDFS cold data
Technical field
The present invention relates to the field of big data processing technology, and in particular to a method, system and device for scanning HDFS (Hadoop Distributed File System) cold data.
Background art
In the distributed file storage system HDFS, the number of files is huge. In general, cold data (rarely used data) can account for more than 70% of all files. The presence of a large amount of cold data puts storage and access pressure on the storage system.
Fig. 1 is a schematic block diagram of a prior-art system for scanning HDFS cold data. In the HDFS metadata server architecture, NameNode (A) is the key node that provides external services. It may also be called a metadata node, and its main function is to manage the metadata information of files. The metadata information includes the directory structure and attribute information of files (folders), as well as mapping information between files and their locations, such as file name, replication count, block data and node data information. To speed up metadata access, NameNode (A) generally keeps the file metadata in memory, but it also saves this information to disk for persistent storage, forming a metadata image file. In addition, modifications to the metadata are recorded in an operation log (EditLog), which is generally stored in a JournalNode (operation log node). NameNode (S) is the standby node of NameNode (A) and safeguards the metadata. NameNode (S) reads the EditLog from the JournalNode and modifies its own metadata accordingly, so that the metadata in NameNode (S) stays consistent with the metadata in NameNode (A).
In the prior art, in order to scan HDFS cold data,
Summary of the invention
The technical problem to be solved by the present invention is, in view of the shortcomings of the prior art, to provide a method, system and device for scanning HDFS cold data that solve the problems of excessive pressure on the metadata server when metadata is exported and poor timeliness in obtaining cold data by scanning metadata.
To solve the above technical problem, the invention provides a method for scanning HDFS cold data, including the following steps:
exporting metadata information from a metadata node as basic data;
streaming the metadata information in the metadata node in real time to obtain new metadata information incrementally in real time, and merging it with the basic data into real-time metadata information to be scanned;
scanning the real-time metadata information to be scanned according to a predetermined rule, thereby obtaining cold data in real time.
Preferably, the step of streaming the metadata information in real time to obtain new metadata information incrementally in real time includes:
obtaining the operation log incrementally in real time through an operation log node;
restoring, by replaying the operation log, a metadata image identical to the metadata in the metadata node.
Preferably, the step of exporting metadata information from the metadata node includes:
parsing the metadata image in the metadata node to obtain the latest metadata information, and exporting the latest metadata information.
Both the metadata information serving as basic data and the new metadata information include last-operation-time information.
Preferably, the step of scanning the real-time metadata information to be scanned according to the predetermined rule, thereby obtaining cold data in real time, includes:
scanning, according to set cold data time period information, the last-operation-time information of the real-time metadata information to be scanned; when the last operation time of real-time metadata to be scanned falls within the set cold data time period, that metadata is determined to be cold data.
Preferably, after the new metadata information is obtained incrementally in real time, the method further includes:
sending the new metadata information to a message queue;
adding the new metadata information in the message queue to the basic data.
The invention also provides a system for scanning HDFS cold data, including:
a basic data acquisition module, configured to export metadata information from a metadata node as basic data;
a real-time data streaming module, configured to obtain new metadata information incrementally in real time;
a metadata storage module, configured to store the basic data, merge it with the new metadata information obtained in real time, and provide real-time metadata information to be scanned; and
a real-time computing module, configured to scan the real-time metadata information to be scanned according to a predetermined rule, thereby obtaining cold data in real time.
Preferably, the real-time data streaming module includes:
an operation log acquisition unit, configured to obtain the operation log from the operation log node incrementally in real time; and
a restoration unit, configured to obtain, by replaying the acquired operation log, a metadata image identical to the metadata in the metadata node.
Preferably, the real-time computing module includes:
a reading unit, configured to read the real-time metadata information to be scanned;
a comparison unit, configured to compare the last-operation-time information of the real-time metadata information to be scanned with predetermined cold data time period information; and
a judging unit, configured to determine, according to the comparison result, that real-time metadata to be scanned is cold data when its last operation time falls within the predetermined cold data time period.
Preferably, the real-time computing module further includes:
a parameter configuration unit, configured to configure the cold data time period information and thereby provide the comparison unit with a basis for comparison.
Preferably, the system further includes a message platform, and the message platform includes a message queue. The real-time data streaming module sends the obtained new metadata information to the message queue in the message platform; either the message queue sends the new metadata information to the metadata storage module, or the metadata storage module reads the new metadata information from the message queue.
The invention also provides a device for scanning HDFS cold data, including a first memory and a first processor, the first memory being used to store data and instructions, and the first processor being configured according to the instructions to:
parse the metadata image in the metadata node to obtain the latest metadata information, and export the latest metadata information;
stream the metadata information in the metadata node in real time to obtain new metadata information incrementally in real time.
Preferably, in the above device for scanning HDFS cold data, when the first processor is configured to stream the metadata information in real time to obtain new metadata information incrementally in real time, the specific configuration includes:
obtaining the operation log incrementally in real time through the operation log node;
restoring, by replaying the operation log, a metadata image identical to the metadata in the metadata node.
The invention also provides another device for scanning HDFS cold data, including a second memory and a second processor, the second memory being used to store data and instructions, and the second processor being configured according to the instructions to:
receive metadata information exported from the metadata node as basic data;
receive new metadata information obtained by streaming the metadata node in real time, and merge it with the basic data into real-time metadata information to be scanned;
scan the real-time metadata information to be scanned according to a predetermined rule, thereby obtaining cold data in real time.
Preferably, in the above device for scanning HDFS cold data, when the second processor is configured to scan the real-time metadata information to be scanned according to the predetermined rule, thereby obtaining cold data in real time, the specific configuration is as follows:
scanning, according to set cold data time period information, the last-operation-time information of the real-time metadata information to be scanned; when the last operation time of real-time metadata to be scanned falls within the set cold data time period, that metadata is determined to be cold data.
When obtaining metadata, the invention streams the metadata and exports metadata information incrementally in real time, which reduces the pressure on the server during metadata export. In addition, the invention feeds the metadata information into a real-time computing system, which greatly improves the timeliness of cold data discovery.
Brief description of the drawings
The above and other objects, features and advantages of the invention will become apparent from the following description of embodiments of the invention with reference to the accompanying drawings, in which:
Fig. 1 is a schematic block diagram of a prior-art system for scanning HDFS cold data;
Fig. 2 is a schematic block diagram of an embodiment of scanning HDFS cold data according to the invention;
Fig. 3 is a flowchart outlining the method for scanning HDFS cold data according to the invention;
Fig. 4 is a structural schematic diagram of the system for scanning HDFS cold data according to the invention;
Fig. 5 is a structural schematic diagram of an embodiment of the system for scanning HDFS cold data according to the invention;
Fig. 6 is a structural schematic diagram of device one for scanning HDFS cold data according to the invention; and
Fig. 7 is a structural schematic diagram of device two for scanning HDFS cold data according to the invention.
Detailed description of the embodiments
The invention is described below on the basis of embodiments, but it is not limited to these embodiments. In the following detailed description, some specific details are described at length; a person skilled in the art can fully understand the invention without them. To avoid obscuring the essence of the invention, well-known methods, processes and flows are not described in detail. In addition, the drawings are not necessarily drawn to scale.
The flowcharts and block diagrams in the drawings illustrate possible architectures, functions and operations of the systems, methods and devices of the embodiments of the invention. A block in a flowchart or block diagram may represent a module, a program segment, or merely a piece of code, each of which is a set of executable instructions implementing a specified logical function. It should also be noted that the executable instructions implementing a specified logical function may be recombined to produce new modules and program segments. The blocks in the drawings and their order therefore serve only to illustrate the processes and steps of the embodiments and should not be taken as limiting the invention itself.
Fig. 2 shows a schematic block diagram of an embodiment of scanning HDFS cold data according to the invention. In this embodiment, the HDFS metadata server architecture is the same as in the prior art and includes the metadata node NameNode (A), the backup metadata node NameNode (S) and the operation log node JournalNode. NameNode (A) stores the operation log EditLog in the JournalNode; NameNode (S) reads the EditLog from the JournalNode and modifies its own metadata accordingly, so that the metadata in NameNode (S) stays consistent with the metadata in NameNode (A). New metadata information is obtained incrementally in real time through the operation log node JournalNode and sent to the message queue in the message platform, from which it is delivered to the real-time computing system, which computes the cold data in real time.
In the present invention, the process of obtaining new metadata information in real time through the operation log node JournalNode is called streaming of the data; that is, newly added metadata information can be obtained continuously and in real time.
The method for scanning HDFS cold data provided by the invention is described below with reference to Fig. 3, a flowchart outlining the method:
Step S1: export metadata information from the metadata node as basic data. Specifically, the metadata image in the metadata node is parsed to obtain the latest metadata information, the latest metadata information is exported, and this metadata is stored to provide basic data for the real-time computing system. This process runs only once; that is, the basic data is obtained in a single pass in this step.
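As a rough illustration of step S1, the sketch below loads an exported metadata image into an in-memory table. The patent does not specify an export format, so the tab-delimited layout, the column names (`Path`, `ModificationTime`, `AccessTime`), the time format, and the choice of the later of the two timestamps as the "last operation time" are all assumptions made for this example.

```python
import csv
import io
from datetime import datetime, timezone

# Assumed time format of the exported image; not taken from the patent.
TIME_FORMAT = "%Y-%m-%d %H:%M"


def load_base_data(export_text):
    """Build the basic data: a dict mapping path -> last operation time (epoch seconds)."""
    base = {}
    reader = csv.DictReader(io.StringIO(export_text), delimiter="\t")
    for row in reader:
        mtime = datetime.strptime(row["ModificationTime"], TIME_FORMAT)
        atime = datetime.strptime(row["AccessTime"], TIME_FORMAT)
        # Assumption: the later of modification and access time is the last operation time.
        last_op = max(mtime, atime).replace(tzinfo=timezone.utc)
        base[row["Path"]] = int(last_op.timestamp())
    return base


# Hypothetical export content, for illustration only.
SAMPLE_EXPORT = (
    "Path\tModificationTime\tAccessTime\n"
    "/logs/2015/a.log\t2015-03-01 10:00\t2015-03-02 08:30\n"
    "/warehouse/t1/part-0\t2016-07-20 12:00\t2016-07-28 09:15\n"
)

base_data = load_base_data(SAMPLE_EXPORT)
```

Because this export runs only once, its cost on the metadata server is paid a single time; later changes arrive through the operation log instead.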
Step S2: stream the metadata information in the metadata node in real time to obtain new metadata information incrementally in real time, and merge it with the basic data into real-time metadata information to be scanned. Specifically, the operation log is obtained incrementally in real time through the operation log node, and a metadata image identical to the metadata in the metadata node is restored by replaying the operation log. Here, replaying means reading the operation log record by record to obtain each operation time and the corresponding operation, i.e. the new metadata information.
In this step, the metadata information is streamed in real time: once all the metadata has been obtained in a single pass in step S1, there is no need to fetch all the metadata information again; only the changed metadata information needs to be obtained. The operation log stored in the operation log node JournalNode records the operations on the metadata, for example which operation was performed at what time. By obtaining the operation log incrementally in real time, it is possible to learn which metadata has changed, and replaying the log then restores a metadata image identical to the metadata in the metadata node. New metadata information can therefore be obtained in real time.
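The replay described in step S2 can be sketched as applying log records, in order, to the table built in step S1. The record shape (`EditOp`), the operation names and the use of a transaction id to skip already-applied records are hypothetical; they stand in for whatever the real operation log carries and are not the actual EditLog format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditOp:
    """One operation-log record (hypothetical shape, for illustration)."""
    txid: int        # increasing transaction id, used to fetch incrementally
    op: str          # "create", "touch", "rename" or "delete" (illustrative set)
    path: str
    timestamp: int   # operation time, epoch seconds
    dst: str = ""    # target path, used by "rename" only


def replay(metadata, ops, last_txid=0):
    """Apply records newer than last_txid to metadata in place.

    Returns the highest transaction id applied, so the next batch
    can resume from there instead of re-reading the whole log.
    """
    for op in sorted(ops, key=lambda o: o.txid):
        if op.txid <= last_txid:
            continue  # already applied in an earlier batch
        if op.op in ("create", "touch"):
            metadata[op.path] = op.timestamp
        elif op.op == "rename":
            metadata.pop(op.path, None)
            metadata[op.dst] = op.timestamp
        elif op.op == "delete":
            metadata.pop(op.path, None)
        # Unknown operation types are skipped in this sketch.
        last_txid = op.txid
    return last_txid
```

For example, replaying a `touch` on `/a` followed by a `delete` of `/a` leaves no entry for `/a`, which matches what the metadata node itself would hold after the same two operations.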
Step S3: scan the real-time metadata information to be scanned according to a predetermined rule, thereby obtaining cold data in real time. Both the metadata information serving as basic data and the new metadata information include last-operation-time information. Accordingly, the last-operation-time information of the real-time metadata information to be scanned is scanned against the set cold data time period information, checking whether the last operation time falls within the set cold data time period; if it does, the metadata is cold data, and the cold data is then extracted. Step S3 is carried out by the real-time computing system in Fig. 2: the new metadata information restored through the JN interface (i.e. the interface of the operation log node JournalNode) is sent to the real-time computing system. In one embodiment, as shown in Fig. 2, the new metadata information may first be sent to the message queue in the message platform; the message platform then sends the new metadata information to the real-time computing system, or the real-time computing system actively queries the message queue in the message platform and fetches the new metadata information.
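The rule in step S3 reduces to an interval test on the last operation time. The sketch below assumes the cold data time period is a closed interval of epoch seconds and that the metadata table maps paths to last operation times; `ColdPeriod`, `older_than` and `scan_cold` are illustrative names, not terms from the patent.

```python
from typing import NamedTuple

SECONDS_PER_DAY = 86400


class ColdPeriod(NamedTuple):
    """Configured cold data time period as a closed interval (epoch seconds)."""
    start: int
    end: int


def older_than(days, now):
    """Helper: a period covering everything last operated on at least `days` ago."""
    return ColdPeriod(start=0, end=now - days * SECONDS_PER_DAY)


def scan_cold(metadata, period):
    """Return the paths whose last operation time falls within the cold period."""
    return sorted(
        path
        for path, last_op in metadata.items()
        if period.start <= last_op <= period.end
    )
```

With a 90-day threshold, a file last touched 100 days ago is reported as cold, while one touched yesterday is not. Note that a file can become cold simply because time passes, so a real deployment would presumably re-run the scan periodically as well as on each update.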
Based on the above principle and method of scanning HDFS cold data, the invention provides a system for scanning HDFS cold data, whose structure is shown in Fig. 4. It includes a basic data acquisition module 1, a real-time data streaming module 2, a metadata storage module 3 and a real-time computing module 4. The basic data acquisition module 1 exports metadata information from the metadata node NameNode (A) as basic data; the real-time data streaming module 2 obtains new metadata information incrementally in real time; the metadata storage module 3 stores the basic data, merges it with the new metadata information obtained in real time, and provides real-time metadata information to be scanned; and the real-time computing module 4 scans the real-time metadata information to be scanned according to a predetermined rule, thereby obtaining cold data in real time.
Specifically, as shown in Fig. 5, the real-time data streaming module 2 includes an operation log acquisition unit 21 and a restoration unit 22. The operation log acquisition unit 21 obtains the operation log from the operation log node incrementally in real time; the restoration unit 22 obtains, by replaying the acquired operation log, a metadata image identical to the metadata in the metadata node.
The real-time computing module 4 includes a reading unit 41, a comparison unit 42 and a judging unit 43. The reading unit 41 reads the real-time metadata information to be scanned from the metadata storage module 3; the comparison unit 42 compares the last-operation-time information of the real-time metadata information to be scanned with the predetermined cold data time period information; and the judging unit 43 determines, according to the comparison result, that real-time metadata to be scanned is cold data when its last operation time falls within the predetermined cold data time period. To configure the cold data time period information, a configuration unit 44 may also be included; the cold data time period information is configured through the configuration unit 44, providing the comparison unit with a basis for comparison.
In addition, in one embodiment, a message platform may also be included, the message platform including a message queue. The real-time data streaming module 2 sends the obtained new metadata information to the message queue in the message platform; either the message queue sends the new metadata information to the metadata storage module 3, or the metadata storage module 3 reads the new metadata information from the message queue.
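The pull variant of this hand-off, in which the storage side reads from the queue, can be sketched with an in-process queue. Python's standard `queue.Queue` stands in for the message platform here purely for illustration; the patent does not name a particular messaging product, and the `(path, timestamp)` update shape and the `None` end-of-stream sentinel are assumptions of this example.

```python
import queue
import threading


def producer(q, updates):
    """Streaming-module side: publish each new metadata update to the queue."""
    for item in updates:
        q.put(item)
    q.put(None)  # sentinel marking the end of this illustrative stream


def consumer(q, store):
    """Storage-module side: pull updates and merge them into the stored data."""
    while True:
        item = q.get()
        if item is None:
            break
        path, timestamp = item
        store[path] = timestamp


def run_pipeline(base, updates):
    """Merge streamed updates into a copy of the basic data via the queue."""
    q = queue.Queue()
    store = dict(base)
    worker = threading.Thread(target=consumer, args=(q, store))
    worker.start()
    producer(q, updates)
    worker.join()
    return store
```

Decoupling the two sides through a queue means the streaming module never waits on the storage module, and a slow consumer only causes the queue to grow rather than stalling log acquisition.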
The invention also provides a device one for scanning HDFS cold data which, as shown in Fig. 6, includes a first memory 100 and a first processor 101. The first memory 100 is used to store data and instructions, and the first processor 101 is configured according to the instructions to:
parse the metadata image in the metadata node to obtain the latest metadata information, and export the latest metadata information;
stream the metadata information in the metadata node in real time to obtain new metadata information incrementally in real time. Specifically, the operation log is obtained incrementally in real time through the operation log node, and a metadata image identical to the metadata in the metadata node is restored by replaying the operation log.
Device one above is located in the HDFS metadata server and is used to obtain the metadata information to be scanned.
The invention also provides a device two for scanning HDFS cold data which, as shown in Fig. 7, includes a second memory 200 and a second processor 201. The second memory 200 is used to store data and instructions, and the second processor 201 is configured according to the instructions to:
receive metadata information exported from the metadata node as basic data;
receive new metadata information obtained by streaming the metadata node in real time, and merge it with the basic data into real-time metadata information to be scanned;
scan the real-time metadata information to be scanned according to a predetermined rule, thereby obtaining cold data in real time. The specific configuration is as follows:
scanning, according to set cold data time period information, the last-operation-time information of the real-time metadata information to be scanned; when the last operation time of real-time metadata to be scanned falls within the set cold data time period, that metadata is determined to be cold data.
Device two above corresponds to the real-time computing system shown in Fig. 2 and is used to complete the metadata scan, thereby obtaining cold data in real time.
When obtaining metadata, the invention streams the metadata and exports metadata information incrementally in real time, which reduces the pressure on the server during metadata export. In addition, the invention feeds the metadata information into a real-time computing system, which greatly improves the timeliness of cold data discovery.
The foregoing are merely preferred embodiments of the invention and are not intended to limit it; for those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (15)

1. A method for scanning HDFS cold data, comprising the following steps:
exporting metadata information from a metadata node as basic data;
streaming the metadata information in the metadata node in real time to obtain new metadata information incrementally in real time, and merging it with the basic data into real-time metadata information to be scanned;
scanning the real-time metadata information to be scanned according to a predetermined rule, thereby obtaining cold data in real time.
2. The method for scanning HDFS cold data as claimed in claim 1, wherein the step of streaming the metadata information in real time to obtain new metadata information incrementally in real time comprises:
obtaining the operation log incrementally in real time through an operation log node;
restoring, by replaying the operation log, a metadata image identical to the metadata in the metadata node.
3. The method for scanning HDFS cold data as claimed in claim 1 or 2, wherein the step of exporting metadata information from the metadata node comprises:
parsing the metadata image in the metadata node to obtain the latest metadata information, and exporting the latest metadata information.
4. The method for scanning HDFS cold data as claimed in claim 3, wherein both the metadata information serving as basic data and the new metadata information include last-operation-time information.
5. The method for scanning HDFS cold data as claimed in claim 4, wherein the step of scanning the real-time metadata information to be scanned according to the predetermined rule, thereby obtaining cold data in real time, comprises:
scanning, according to set cold data time period information, the last-operation-time information of the real-time metadata information to be scanned; when the last operation time of real-time metadata to be scanned falls within the set cold data time period, that metadata is determined to be cold data.
6. The method for scanning HDFS cold data as claimed in claim 1, further comprising, after the new metadata information is obtained incrementally in real time:
sending the new metadata information to a message queue;
adding the new metadata information in the message queue to the basic data.
7. A system for scanning HDFS cold data, comprising:
a basic data acquisition module, configured to export metadata information from a metadata node as basic data;
a real-time data streaming module, configured to obtain new metadata information incrementally in real time;
a metadata storage module, configured to store the basic data, merge it with the new metadata information obtained in real time, and provide real-time metadata information to be scanned; and
a real-time computing module, configured to scan the real-time metadata information to be scanned according to a predetermined rule, thereby obtaining cold data in real time.
8. The system for scanning HDFS cold data as claimed in claim 7, wherein the real-time data streaming module comprises:
an operation log acquisition unit, configured to obtain the operation log from the operation log node incrementally in real time; and
a restoration unit, configured to obtain, by replaying the acquired operation log, a metadata image identical to the metadata in the metadata node.
9. The system for scanning HDFS cold data as claimed in claim 7, wherein the real-time computing module comprises:
a reading unit, configured to read the real-time metadata information to be scanned;
a comparison unit, configured to compare the last-operation-time information of the real-time metadata information to be scanned with predetermined cold data time period information; and
a judging unit, configured to determine, according to the comparison result, that real-time metadata to be scanned is cold data when its last operation time falls within the predetermined cold data time period.
10. The system for scanning HDFS cold data as claimed in claim 9, wherein the real-time computing module further comprises:
a parameter configuration unit, configured to configure the cold data time period information and thereby provide the comparison unit with a basis for comparison.
11. The system for scanning HDFS cold data as claimed in claim 7, further comprising a message platform, the message platform including a message queue; the real-time data streaming module sends the obtained new metadata information to the message queue in the message platform, and either the message queue sends the new metadata information to the metadata storage module, or the metadata storage module reads the new metadata information from the message queue.
12. a kind of device of scanning HDFS cold datas, wherein, including first memory and first processor, first storage Device is used for data storage and instruction, and the first processor configures as follows according to the instruction:
The metadata mirror image in metadata node is parsed, obtains newest metadata information, and export the newest metadata information;
To the metadata information real-time streaming in metadata node, obtain new metadata information real-time incremental.
13. the device of scanning HDFS cold datas as claimed in claim 12, wherein, the first processor is being configured to member Data message real-time streaming, when obtaining new metadata information, concrete configuration includes real-time incremental:
By Operation Log node, obtain Operation Log real-time incremental;
By playing back the Operation Log, restore and the metadata identical metadata mirror image in metadata node.
14. a kind of device of scanning HDFS cold datas, wherein, including second memory and second processor, second storage Device is used for data storage and instruction, and the second processor configures as follows according to the instruction:
Reception metadata information is exported from metadata node, based on data;
The new metadata information of the real-time streaming to metadata node is received, and is merged into real time with the basic data Metadata information to be scanned;
According to the pre-defined rule scanning metadata information to be scanned in real time, so as to draw cold data in real time.
15. The device for scanning HDFS cold data as claimed in claim 14, wherein the second processor, when configured to scan the real-time metadata information to be scanned according to the predetermined rule and thereby obtain cold data in real time, is specifically configured to:
according to set cold-data time period information, scan the last operation time information of the real-time metadata information to be scanned; when the last operation time information of the real-time metadata to be scanned falls within the set cold-data time period, the real-time metadata to be scanned is cold data.
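The rule in claim 15 can be illustrated with a short sketch. It assumes, as one possible reading, that the set cold-data time period means "everything last operated on at least a given age ago"; the claim itself only says the period is set. The function name `find_cold_paths` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def find_cold_paths(metadata, now, cold_after):
    """Return, sorted, the paths whose last operation is at least `cold_after` old.

    `metadata` maps path -> datetime of the last operation.
    """
    cutoff = now - cold_after
    return sorted(path for path, last_op in metadata.items() if last_op <= cutoff)
```

A period with both a start and an end could be handled the same way by testing `start <= last_op <= end` instead of comparing against a single cutoff.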
CN201610620101.8A 2016-07-29 2016-07-29 Method, system and device for scanning HDFS cold data Active CN107665224B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610620101.8A CN107665224B (en) 2016-07-29 2016-07-29 Method, system and device for scanning HDFS cold data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610620101.8A CN107665224B (en) 2016-07-29 2016-07-29 Method, system and device for scanning HDFS cold data

Publications (2)

Publication Number Publication Date
CN107665224A (en) 2018-02-06
CN107665224B (en) 2021-04-30

Family

ID=61122020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610620101.8A Active CN107665224B (en) 2016-07-29 2016-07-29 Method, system and device for scanning HDFS cold data

Country Status (1)

Country Link
CN (1) CN107665224B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050097126A1 (en) * 2000-08-24 2005-05-05 Microsoft Corporation Partial migration of an object to another storage location in a computer system
CN101101563A (en) * 2007-07-23 2008-01-09 清华大学 Migration management based on massive data classified memory system
CN102449975A (en) * 2009-04-09 2012-05-09 诺基亚公司 Systems, methods, and apparatuses for media file streaming
CN103064902A (en) * 2012-12-18 2013-04-24 厦门市美亚柏科信息股份有限公司 Method and device for storing and reading data in hadoop distributed file system (HDFS)
CN104572357A (en) * 2014-12-30 2015-04-29 清华大学 Backup and recovery method for HDFS (Hadoop distributed filesystem)
CN105051696A (en) * 2013-01-10 2015-11-11 网络流逻辑公司 An improved streaming method and system for processing network metadata

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918911A (en) * 2019-03-18 2019-06-21 北京升鑫网络科技有限公司 A kind of scan method and equipment of mirror image installation package information
CN109918911B (en) * 2019-03-18 2020-11-03 北京升鑫网络科技有限公司 Method and equipment for scanning mirror image installation package information
CN113760854A (en) * 2021-09-10 2021-12-07 北京金山云网络技术有限公司 Method for identifying data in HDFS memory and related equipment
CN113760855A (en) * 2021-09-10 2021-12-07 北京金山云网络技术有限公司 Data storage method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN107665224B (en) 2021-04-30

Similar Documents

Publication Publication Date Title
US20170212781A1 (en) Parallel execution of blockchain transactions
CN109034993A (en) Account checking method, equipment, system and computer readable storage medium
US11238312B2 (en) Automatically generating labeled synthetic documents
US20140222766A1 (en) System and method for database migration and validation
US9817564B2 (en) Managing a display of content based on user interaction topic and topic vectors
CN103414759A (en) Network disc file transmission method and network disc file transmission device
CN105446825B (en) Database testing method and device
US20140156603A1 (en) Method and an apparatus for splitting and recovering data in a power system
Gupta et al. Faster as well as early measurements from big data predictive analytics model
CN107665224A (en) Method, system and device for scanning HDFS cold data
CN103491185A (en) Remote sensing data cloud storage method based on image block organization
Silva et al. Integrating big data into the computing curricula
CN105095247A (en) Symbolic data analysis method and system
CN104965835B (en) A kind of file read/write method and device of distributed file system
CN104778182A (en) Data import method and system based on HBase (Hadoop Database)
CN107273462B (en) Full-text index method for building HBase cluster, data reading method and data writing method
US11853284B2 (en) In-place updates with concurrent reads in a decomposed state
CN112084190A (en) Big data based acquired data real-time storage and management system and method
CN110413631A (en) A kind of data query method and device
Lee et al. Implementation and performance of distributed text processing system using hadoop for e-discovery cloud service
CN111125018A (en) File exception tracing method, device, equipment and storage medium
CN106909623B (en) A kind of data set and date storage method for supporting efficient mass data to analyze and retrieve
CN110019056A (en) Container separated from meta-data for cloud layer
WO2014093395A1 (en) Consuming content incrementally
CN106777357B (en) Method for asynchronously constructing HBase full-text index

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant