CN104714983B - Method and device for generating a distributed index - Google Patents
- Publication number
- CN104714983B (application CN201310695615.6A)
- Authority
- CN
- China
- Prior art keywords
- index database
- reduce operation
- file system
- reduce
- index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and a device for generating a distributed index. In the method, the number of map operations in Hadoop is determined according to the data volume of the initial data; the data processed by each map operation is distributed to multiple reduce operations, and an index database corresponding to each reduce operation is generated, wherein the number of reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance; and the index databases corresponding to the reduce operations are merged. The technical solution provided by the invention makes it possible to index mass data efficiently and quickly.
Description
Technical field
The present invention relates to the field of communications, and in particular to a method and a device for generating a distributed index.
Background
With the arrival of the cloud era, big data has attracted more and more attention. The term is usually used to describe the large volumes of unstructured and semi-structured data that a company creates; loading such data into a relational database for analysis costs excessive time and money. Big data analysis is often associated with cloud computing, because real-time analysis of large data sets requires a framework such as MapReduce to distribute the work across tens, hundreds, or even thousands of computers. In the Internet industry, big data generally refers to the user network behavior data that Internet companies generate and accumulate in daily operation. The scale of this data is so large that it can no longer be measured in GB or TB.
How big is big data? In a single day, the content generated on the Internet could fill 168 million DVDs; as many as 294 billion emails are sent; 2 million community posts are published; 378,000 mobile phones are sold ...
By 2012, data volumes had risen from the TB (1 TB = 1024 GB) level to the PB (1 PB = 1024 TB), EB (1 EB = 1024 PB), and even ZB (1 ZB = 1024 EB) level. Research by International Data Corporation (IDC) shows that the world generated 0.49 ZB of data in 2008, 0.8 ZB in 2009, 1.2 ZB in 2010, and as much as 1.82 ZB in 2011, equivalent to more than 200 GB for every person in the world. As of 2012, all printed material ever produced by humanity amounted to 200 PB, and all the words ever spoken by humanity amounted to about 5 EB. Research by IBM shows that 90% of all the data obtained over the whole of human civilization was generated in the past two years. By 2020, the volume of data generated worldwide is expected to reach 44 times today's.
In the big data era, quickly and effectively finding the data a user cares about within big data has become an increasingly important problem. Creating an index efficiently and quickly is the prerequisite for user search. The index-creation schemes usually adopted in the related art are single-threaded and hit a performance bottleneck when faced with mass data: they place high demands on the system and limit its scalability, so they can no longer meet users' need to retrieve data quickly and effectively from mass data.
Summary of the invention
The present invention provides a method and a device for generating a distributed index, to at least solve the problem in the related art that an index cannot be created efficiently and quickly for mass data.
According to one aspect of the present invention, a method for generating a distributed index is provided.
The method for generating a distributed index according to the present invention includes: determining the number of map operations in Hadoop according to the data volume of the initial data; distributing the data processed by each map operation to multiple reduce operations, and generating an index database corresponding to each reduce operation, wherein the number of reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance; and merging the index databases corresponding to the reduce operations.
Preferably, generating the index database corresponding to each reduce operation includes: obtaining the type of the currently supported file system; determining, according to the type of the file system, the generation mode of the index database corresponding to each reduce operation; and generating the index database corresponding to each reduce operation according to the generation mode.
Preferably, generating the index database corresponding to each reduce operation according to the generation mode includes: when the type of the file system is the Hadoop Distributed File System (HDFS), generating the index database corresponding to each reduce operation on the local disk, and then uploading the index database generated on the local disk to HDFS; or, when the type of the file system is another distributed file system (DFS), other than HDFS, that supports sharing, generating the index database corresponding to each reduce operation directly in that shared DFS.
Preferably, merging the index databases corresponding to the reduce operations includes: when the type of the file system is HDFS, downloading the index database corresponding to each reduce operation from HDFS to the local disk; merging the index databases corresponding to the reduce operations on the local disk; and uploading the index database obtained after merging to HDFS, and deleting the index database corresponding to each reduce operation from the local disk.
Preferably, merging the index databases corresponding to the reduce operations includes: when the type of the file system is another shared DFS, merging the index databases corresponding to the reduce operations that were generated in the shared DFS; and deleting the index database corresponding to each reduce operation that was generated in the shared DFS.
According to another aspect of the present invention, a device for generating a distributed index is provided.
The device for generating a distributed index according to the present invention includes: a determining module, configured to determine the number of map operations in Hadoop according to the data volume of the initial data; a generation module, configured to distribute the data processed by each map operation to multiple reduce operations and to generate an index database corresponding to each reduce operation, wherein the number of reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance; and a merging module, configured to merge the index databases corresponding to the reduce operations.
Preferably, the generation module includes: an acquiring unit, configured to obtain the type of the currently supported file system; a determination unit, configured to determine, according to the type of the file system, the generation mode of the index database corresponding to each reduce operation; and a generation unit, configured to generate the index database corresponding to each reduce operation according to the generation mode.
Preferably, the generation unit is configured to, when the type of the file system is the Hadoop Distributed File System (HDFS), generate the index database corresponding to each reduce operation on the local disk and then upload the index database generated on the local disk to HDFS; or, the generation unit is configured to, when the type of the file system is another distributed file system (DFS), other than HDFS, that supports sharing, generate the index database corresponding to each reduce operation directly in that shared DFS.
Preferably, the merging module includes: a download unit, configured to, when the type of the file system is HDFS, download the index database corresponding to each reduce operation from HDFS to the local disk; a first combining unit, configured to merge the index databases corresponding to the reduce operations on the local disk; and a first processing unit, configured to upload the index database obtained after merging to HDFS and to delete the index database corresponding to each reduce operation from the local disk.
Preferably, the merging module includes: a second combining unit, configured to, when the type of the file system is another shared DFS, merge the index databases corresponding to the reduce operations that were generated in the shared DFS; and a second processing unit, configured to delete the index database corresponding to each reduce operation that was generated in the shared DFS.
In the embodiments of the present invention, the number of map operations in Hadoop is determined according to the data volume of the initial data; the data processed by each map operation is distributed to multiple reduce operations, and an index database corresponding to each reduce operation is generated, wherein the number of reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance; and the index databases corresponding to the reduce operations are merged. In other words, the initial data is processed by the map operations and reduce operations in Hadoop to generate an index database corresponding to each reduce operation, and these index databases are then merged. This solves the problem in the related art that an index cannot be created efficiently and quickly for mass data, and thus makes it possible to index mass data efficiently and quickly.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their description are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a method for generating a distributed index according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for generating a distributed index according to a preferred embodiment of the present invention;
Fig. 3 is a structural block diagram of a device for generating a distributed index according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a device for generating a distributed index according to a preferred embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and in combination with the embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments can be combined with each other.
Fig. 1 is a flowchart of a method for generating a distributed index according to an embodiment of the present invention. As shown in Fig. 1, the method may include the following processing steps:
Step S102: determine the number of map operations in Hadoop according to the data volume of the initial data;
Step S104: distribute the data processed by each map operation to multiple reduce operations, and generate an index database corresponding to each reduce operation, wherein the number of reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance;
Step S106: merge the index databases corresponding to the reduce operations.
In the related art, an index cannot be created efficiently and quickly for mass data. With the method shown in Fig. 1, the initial data is processed by the map operations and reduce operations in Hadoop to generate an index database corresponding to each reduce operation, and these index databases are then merged. This solves the problem in the related art that an index cannot be created efficiently and quickly for mass data, and thus makes it possible to index mass data efficiently and quickly.
Preferably, in step S104, generating the index database corresponding to each reduce operation may include the following operations:
Step S1: obtain the type of the currently supported file system;
Step S2: determine, according to the type of the file system, the generation mode of the index database corresponding to each reduce operation;
Step S3: generate the index database corresponding to each reduce operation according to the generation mode.
In a preferred embodiment, the data volume of the initial data to be acquired is determined first, and the data is divided into M parts (M is a positive integer), where each part corresponds to one map operation. The amount of data handled by each map operation can of course be configured dynamically. The map data-processing plug-in is set accordingly. In addition, the set of intermediate key-value pairs generated by each map operation is periodically written to the local disk, and the local disk is divided into N partitions (N is a positive integer), where N is user-defined and each partition corresponds to one reduce operation. Configuring the maximum number of reduce operations improves the efficiency of creating the distributed index, and the reduce data-processing plug-in is set according to the number of reduce operations configured by the user. In this preferred embodiment, index creation can support the Hadoop Distributed File System (HDFS) as well as other distributed file systems (DFS) that support sharing. Therefore, during index creation, the generation mode of the index database corresponding to each reduce operation can be determined according to the type of the supported file system, and the index database corresponding to each reduce operation is then generated according to that generation mode.
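The division into M parts and the user-defined N can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the names `plan_map_operations`, `plan_job`, `bytes_per_map`, and `reduce_count`, and the 64 MiB default, are introduced here for illustration only. The text states only that M follows from the data volume, that the amount of data per map operation is configurable, and that N is set by the user.

```python
DEFAULT_BYTES_PER_MAP = 64 * 1024 * 1024  # assumed default, configurable


def plan_map_operations(total_bytes, bytes_per_map=DEFAULT_BYTES_PER_MAP):
    """Return one (offset, length) split per map operation; M = len(result)."""
    if total_bytes < 0 or bytes_per_map <= 0:
        raise ValueError("total_bytes must be >= 0 and bytes_per_map > 0")
    # Integer ceiling division; at least one map operation even for empty input.
    m = max(1, -(-total_bytes // bytes_per_map))
    splits = []
    for i in range(m):
        offset = i * bytes_per_map
        length = max(0, min(bytes_per_map, total_bytes - offset))
        splits.append((offset, length))
    return splits


def plan_job(total_bytes, reduce_count, bytes_per_map=DEFAULT_BYTES_PER_MAP):
    """Combine the data-driven M with the pre-configured N."""
    if reduce_count < 1:
        raise ValueError("reduce_count (N) must be a positive integer")
    splits = plan_map_operations(total_bytes, bytes_per_map)
    return {"map_count": len(splits), "reduce_count": reduce_count, "splits": splits}
```

Under these assumptions, `plan_job(250, 4, bytes_per_map=100)` yields three map operations and four reduce operations; M changes with the input size while N stays at the configured value.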
Preferably, in step S3, generating the index database corresponding to each reduce operation according to the generation mode may include one of the following steps:
Step S31: when the type of the file system is the Hadoop Distributed File System (HDFS), generate the index database corresponding to each reduce operation on the local disk, and then upload the index database generated on the local disk to HDFS;
Step S32: when the type of the file system is another distributed file system (DFS), other than HDFS, that supports sharing, generate the index database corresponding to each reduce operation directly in that shared DFS.
In a preferred embodiment, if the type of the currently supported file system is HDFS, each reduce operation generates an interim index database in the local file system (i.e. on the local disk); then, in the final cleanup step of the reduce operation, the interim index database generated in the local file system is uploaded to the HDFS file system. If the type of the currently supported file system is another shared DFS, the interim index database can be generated directly in the DFS file system.
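The two generation modes can be sketched as a single branch on the file system type. The Python below is a hedged illustration only: plain directories stand in for HDFS and for the shared DFS, `shutil.copytree` stands in for an upload, and the names `generate_interim_index`, `write_index`, `HDFS`, and `SHARED_DFS`, as well as the JSON file layout, are assumptions made here rather than anything specified in the text.

```python
import json
import shutil
import tempfile
from pathlib import Path

HDFS = "hdfs"              # hypothetical label for the HDFS mode
SHARED_DFS = "shared_dfs"  # hypothetical label for any other shared DFS


def write_index(directory, index):
    """Write an index (term -> list of document ids) as one JSON file."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "index.json").write_text(json.dumps(index, sort_keys=True))


def generate_interim_index(fs_type, reduce_id, index, dfs_root):
    """Generate the interim index database of one reduce operation."""
    target = Path(dfs_root) / f"interim-index-{reduce_id}"
    if fs_type == HDFS:
        # Step S31: build on local disk first, then upload during cleanup.
        with tempfile.TemporaryDirectory() as local_root:
            local_dir = Path(local_root) / target.name
            write_index(local_dir, index)
            shutil.copytree(local_dir, target)  # stand-in for the upload
    elif fs_type == SHARED_DFS:
        # Step S32: write directly into the shared file system.
        write_index(target, index)
    else:
        raise ValueError(f"unsupported file system type: {fs_type!r}")
    return target
```

In both branches the caller ends up with an interim index database at the same path in the distributed store; only the route taken to get it there differs.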
Preferably, in step S106, merging the index databases corresponding to the reduce operations may include the following operations:
Step S4: when the type of the file system is HDFS, download the index database corresponding to each reduce operation from HDFS to the local disk;
Step S5: merge the index databases corresponding to the reduce operations on the local disk;
Step S6: upload the index database obtained after merging to HDFS, and delete the index database corresponding to each reduce operation from the local disk.
In a preferred embodiment, if the type of the currently supported file system is HDFS, the index master node (master) of Hadoop first downloads all interim index databases from the HDFS file system to its local file system. Second, all interim index databases in the local file system are merged on the index master node to generate a complete index database. Third, the complete index database is uploaded from the index master node to the HDFS file system. Then, each interim index database in the local file system is deleted on the index master node. Finally, the index slave nodes (slave) of Hadoop download the complete index database from the HDFS file system to their local file systems for use in retrieval.
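The master-node sequence for the HDFS case (download, merge, upload, delete) can be sketched as below. This is an illustrative Python sketch under stated assumptions: directories stand in for HDFS and for the master's local disk, copies stand in for downloads and uploads, and the names `merge_on_master` and `merge_postings`, the `interim-index-*` and `complete-index` directory names, and the JSON posting-list format are all hypothetical. The final download by slave nodes is not shown.

```python
import json
import shutil
import tempfile
from pathlib import Path


def merge_postings(indexes):
    """Union the posting lists of several term -> doc-id-list indexes."""
    merged = {}
    for index in indexes:
        for term, doc_ids in index.items():
            merged.setdefault(term, set()).update(doc_ids)
    return {term: sorted(ids) for term, ids in sorted(merged.items())}


def merge_on_master(hdfs_root, local_root):
    """Run the download / merge / upload / delete sequence; return the result path."""
    hdfs_root, local_root = Path(hdfs_root), Path(local_root)
    # 1. Download every interim index database to the master's local disk.
    local_interims = []
    for remote in sorted(hdfs_root.glob("interim-index-*")):
        local = local_root / remote.name
        shutil.copytree(remote, local)
        local_interims.append(local)
    # 2. Merge the local copies into one complete index database.
    merged = merge_postings(
        json.loads((d / "index.json").read_text()) for d in local_interims
    )
    complete_local = local_root / "complete-index"
    complete_local.mkdir()
    (complete_local / "index.json").write_text(json.dumps(merged))
    # 3. Upload the complete index database.
    complete_remote = hdfs_root / "complete-index"
    shutil.copytree(complete_local, complete_remote)
    # 4. Delete the interim index databases from the local disk.
    for d in local_interims:
        shutil.rmtree(d)
    return complete_remote
```

Merging on local disk and uploading only the finished result matches the order described above: the distributed store only ever receives whole index databases, never partially merged ones.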
Preferably, in step S106, merging the index databases corresponding to the reduce operations may include the following steps:
Step S7: when the type of the file system is another shared DFS, merge the index databases corresponding to the reduce operations that were generated in the shared DFS;
Step S8: delete the index database corresponding to each reduce operation that was generated in the shared DFS.
In a preferred embodiment, if the type of the currently supported file system is another shared DFS, the index master node of Hadoop first merges the interim index databases in the DFS file system into a complete index database for use in retrieval; each interim index database in the DFS file system is then deleted on the index master node.
The preferred implementation described above is further explained below with reference to the preferred embodiment shown in Fig. 2.
Fig. 2 is a flowchart of a method for generating a distributed index according to a preferred embodiment of the present invention. As shown in Fig. 2, the process may include the following processing stages:
First stage: the data acquisition stage, i.e. the map stage of Hadoop. The data acquisition stage is the preparatory stage that precedes index creation and provides the data needed to create the index. The map stage of Hadoop uses a distributed implementation and can process data in parallel, where the number of map operations is determined dynamically by the volume of data acquired. The map operations of Hadoop acquire text files or database files and process the data, generating the content of each field needed to create the index (i.e. a set of key-value pairs (key, value)), which greatly improves data-processing performance. Because acquisition supports plug-in processing, different processing modes can be customized according to the data volume.
Second stage: the index creation stage, i.e. the reduce stage of Hadoop, which creates the distributed index databases. The number of reduce operations is set to determine reduceNum, the maximum number of reduce operations processed in parallel. The data generated in the data acquisition stage is assigned to a specific reduce operation by HashCode() % reduceNum for indexing, and each reduce operation generates its own interim index database files.
It should be noted that index creation can support the Hadoop Distributed File System (HDFS) as well as other distributed file systems (DFS) that support sharing.
Third stage: the index merging stage. For the interim index databases generated by the reduce operations in the index creation stage, the index master node invokes index merging to merge the interim index databases into one complete index database. During the merge, the interim index databases are read one by one and incorporated into a single master index database; finally, each interim index database is deleted, and the master index database provides the retrieval service.
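The three stages above can be traced end to end with a small in-memory sketch. It is illustrative Python only: the function names, the whitespace tokenizer, and the use of an inverted index (term to document ids) are assumptions made here, since the text does not specify the index format. `java_string_hash` reproduces the formula of Java's `String.hashCode()` for characters in the Basic Multilingual Plane, and the sign bit is masked before taking the remainder so that the result is never negative; that masking is an addition to the `HashCode() % reduceNum` rule as stated, and it mirrors what Hadoop's default `HashPartitioner` does.

```python
from collections import defaultdict


def java_string_hash(text):
    """Signed 32-bit hash using the formula of Java's String.hashCode()."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def partition(key, reduce_num):
    """Pick the reduce operation for a key: hash % reduceNum, kept non-negative."""
    return (java_string_hash(key) & 0x7FFFFFFF) % reduce_num


def map_stage(doc_id, text):
    """First stage: emit (key, value) pairs, here (term, document id)."""
    return [(term.lower(), doc_id) for term in text.split()]


def reduce_stage(pairs):
    """Second stage: one reduce operation builds its interim index database."""
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def merge_stage(interim_indexes):
    """Third stage: read interim indexes one by one into a master index."""
    master = defaultdict(set)
    for interim in interim_indexes:
        for term, ids in interim.items():
            master[term].update(ids)
    return {term: sorted(master[term]) for term in sorted(master)}


def build_index(documents, reduce_num):
    """Run map, partition, reduce, and merge over {doc_id: text}."""
    buckets = [[] for _ in range(reduce_num)]
    for doc_id, text in documents.items():
        for term, value in map_stage(doc_id, text):
            buckets[partition(term, reduce_num)].append((term, value))
    return merge_stage(reduce_stage(bucket) for bucket in buckets)
```

Because every occurrence of a key hashes to the same reduce operation, the interim index databases hold disjoint sets of terms, so the final merge is a simple union and the merged result does not depend on the value chosen for `reduce_num`.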
Fig. 3 is a structural block diagram of a device for generating a distributed index according to an embodiment of the present invention. As shown in Fig. 3, the device may include: a determining module 10, configured to determine the number of map operations in Hadoop according to the data volume of the initial data; a generation module 20, configured to distribute the data processed by each map operation to multiple reduce operations and to generate an index database corresponding to each reduce operation, wherein the number of reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance; and a merging module 30, configured to merge the index databases corresponding to the reduce operations.
The device shown in Fig. 3 solves the problem in the related art that an index cannot be created efficiently and quickly for mass data, and thus makes it possible to index mass data efficiently and quickly.
Preferably, as shown in Fig. 4, the generation module 20 may include: an acquiring unit 200, configured to obtain the type of the currently supported file system; a determination unit 202, configured to determine, according to the type of the file system, the generation mode of the index database corresponding to each reduce operation; and a generation unit 204, configured to generate the index database corresponding to each reduce operation according to the generation mode.
Preferably, as shown in Fig. 4, the generation unit 204 is configured to, when the type of the file system is the Hadoop Distributed File System (HDFS), generate the index database corresponding to each reduce operation on the local disk and then upload the index database generated on the local disk to HDFS; or, the generation unit 204 is configured to, when the type of the file system is another distributed file system (DFS), other than HDFS, that supports sharing, generate the index database corresponding to each reduce operation directly in that shared DFS.
Preferably, as shown in Fig. 4, the merging module 30 may include: a download unit 300, configured to, when the type of the file system is HDFS, download the index database corresponding to each reduce operation from HDFS to the local disk; a first combining unit 302, configured to merge the index databases corresponding to the reduce operations on the local disk; and a first processing unit 304, configured to upload the index database obtained after merging to HDFS and to delete the index database corresponding to each reduce operation from the local disk.
Preferably, as shown in Fig. 4, the merging module 30 may include: a second combining unit 306, configured to, when the type of the file system is another shared DFS, merge the index databases corresponding to the reduce operations that were generated in the shared DFS; and a second processing unit 308, configured to delete the index database corresponding to each reduce operation that was generated in the shared DFS.
As can be seen from the above description, the embodiments achieve the following technical effects (it should be noted that these are effects that certain preferred embodiments can achieve): with the technical solution provided by the embodiments of the present invention, the initial data can be processed with the map-reduce programming model in Hadoop to generate an index database corresponding to each reduce operation, and these index databases are then merged to form one complete index database for use in retrieval. This solves the problem in the related art that an index cannot be created efficiently and quickly for mass data, and thus makes it possible to index mass data efficiently and quickly.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device. In some cases, the steps shown or described can be performed in an order different from the one given here, or they can each be made into an individual integrated circuit module, or multiple modules or steps among them can be made into a single integrated circuit module. The present invention is thus not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
Claims (8)
1. A method for generating a distributed index, characterized by comprising:
determining the number of map operations in Hadoop according to the data volume of initial data;
distributing the data processed by each map operation to multiple reduce operations, and generating an index database corresponding to each reduce operation, wherein the number of the reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance;
merging the index databases corresponding to the reduce operations;
wherein generating the index database corresponding to each reduce operation comprises: obtaining the type of the currently supported file system; determining, according to the type of the file system, a generation mode of the index database corresponding to each reduce operation; and generating the index database corresponding to each reduce operation according to the generation mode.
2. The method according to claim 1, characterized in that generating the index database corresponding to each reduce operation according to the generation mode comprises:
when the type of the file system is the Hadoop Distributed File System (HDFS), generating the index database corresponding to each reduce operation on a local disk, and then uploading the index database generated on the local disk to the HDFS;
or,
when the type of the file system is another distributed file system (DFS), other than the HDFS, that supports sharing, generating the index database corresponding to each reduce operation directly in the shared DFS.
3. The method according to claim 2, characterized in that merging the index databases corresponding to the reduce operations comprises:
when the type of the file system is the HDFS, downloading the index database corresponding to each reduce operation from the HDFS to the local disk;
merging the index databases corresponding to the reduce operations on the local disk;
uploading the index database obtained after merging to the HDFS, and deleting the index database corresponding to each reduce operation from the local disk.
4. The method according to claim 2, characterized in that merging the index databases corresponding to the reduce operations comprises:
when the type of the file system is the shared DFS, merging the index databases corresponding to the reduce operations that were generated in the shared DFS;
deleting the index database corresponding to each reduce operation that was generated in the shared DFS.
5. A device for generating a distributed index, characterized by comprising:
a determining module, configured to determine the number of map operations in Hadoop according to the data volume of initial data;
a generation module, configured to distribute the data processed by each map operation to multiple reduce operations and to generate an index database corresponding to each reduce operation, wherein the number of the reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance;
a merging module, configured to merge the index databases corresponding to the reduce operations;
wherein the generation module comprises: an acquiring unit, configured to obtain the type of the currently supported file system; a determination unit, configured to determine, according to the type of the file system, a generation mode of the index database corresponding to each reduce operation; and a generation unit, configured to generate the index database corresponding to each reduce operation according to the generation mode.
6. The device according to claim 5, characterized in that the generation unit is configured to, when the type of the file system is the Hadoop Distributed File System (HDFS), generate the index database corresponding to each reduce operation on a local disk and then upload the index database generated on the local disk to the HDFS; or, the generation unit is configured to, when the type of the file system is another distributed file system (DFS), other than the HDFS, that supports sharing, generate the index database corresponding to each reduce operation directly in the shared DFS.
7. The device according to claim 6, characterized in that the merging module comprises:
A download unit, configured to, when the type of the file system is the HDFS, download the index database corresponding to each reduce operation from the HDFS to the local disk;
A first merging unit, configured to merge, on the local disk, the index databases corresponding to the reduce operations;
A first processing unit, configured to upload the index database obtained after merging to the HDFS, and to delete the index databases corresponding to the reduce operations from the local disk.
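The download, merge, upload and delete sequence of claim 7 can be sketched as below. The sketch makes the same simplifying assumptions as any local simulation: directories stand in for the HDFS and the local disk, file copies stand in for download and upload, and JSON files stand in for index databases. The function names and the `index-*.json` and `merged.json` file names are hypothetical.

```python
import json
import shutil
import tempfile
from pathlib import Path


def merge_via_local_disk(hdfs_dir: Path, local_dir: Path) -> Path:
    """Merge per-reduce indexes stored in (simulated) HDFS via the local disk."""
    local_dir.mkdir(parents=True, exist_ok=True)
    # 1. Download every per-reduce index database to the local disk.
    parts = [Path(shutil.copy(p, local_dir))
             for p in sorted(hdfs_dir.glob("index-*.json"))]
    # 2. Merge the downloaded index databases locally.
    merged = {}
    for part in parts:
        for term, ids in json.loads(part.read_text()).items():
            merged.setdefault(term, set()).update(ids)
    merged_path = local_dir / "merged.json"
    merged_path.write_text(
        json.dumps({t: sorted(ids) for t, ids in merged.items()}, sort_keys=True))
    # 3. Upload the merged index database back to the HDFS.
    uploaded = Path(shutil.copy(merged_path, hdfs_dir))
    # 4. Delete the per-reduce index databases from the local disk.
    for part in parts:
        part.unlink()
    return uploaded


def demo():
    """Run the four steps on two small per-reduce indexes."""
    with tempfile.TemporaryDirectory() as tmp:
        hdfs, local = Path(tmp) / "hdfs", Path(tmp) / "local"
        hdfs.mkdir()
        (hdfs / "index-0.json").write_text(json.dumps({"a": [1], "b": [1, 2]}))
        (hdfs / "index-1.json").write_text(json.dumps({"a": [3]}))
        uploaded = merge_via_local_disk(hdfs, local)
        return (
            json.loads(uploaded.read_text()),
            sorted(p.name for p in local.iterdir()),
            sorted(p.name for p in hdfs.iterdir()),
        )
```

The claim only requires deleting the per-reduce index databases from the local disk, so the sketch leaves the copies in the simulated HDFS and the local merged file untouched.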
8. The device according to claim 6, characterized in that the merging module comprises:
A second merging unit, configured to, when the type of the file system is the shared DFS other than the HDFS, merge the index databases corresponding to the reduce operations that were generated in that shared DFS;
A second processing unit, configured to delete the index databases corresponding to the reduce operations that were generated in that shared DFS.
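For a shared DFS, claim 8 merges in place and then removes the per-reduce index databases, with no download or upload step. A minimal sketch, assuming a directory stands in for the shared DFS and JSON files for index databases (the function names and file names are hypothetical):

```python
import json
import tempfile
from pathlib import Path


def merge_in_shared_dfs(dfs_dir: Path) -> Path:
    """Merge per-reduce indexes directly inside the (simulated) shared DFS."""
    parts = sorted(dfs_dir.glob("index-*.json"))
    merged = {}
    for part in parts:
        for term, ids in json.loads(part.read_text()).items():
            merged.setdefault(term, set()).update(ids)
    merged_path = dfs_dir / "merged.json"
    merged_path.write_text(
        json.dumps({t: sorted(ids) for t, ids in merged.items()}, sort_keys=True))
    # Delete the per-reduce index databases once the merged one is written.
    for part in parts:
        part.unlink()
    return merged_path


def demo():
    """Merge two small per-reduce indexes and list what remains in the DFS."""
    with tempfile.TemporaryDirectory() as tmp:
        dfs = Path(tmp) / "shared"
        dfs.mkdir()
        (dfs / "index-0.json").write_text(json.dumps({"x": [1]}))
        (dfs / "index-1.json").write_text(json.dumps({"x": [2], "y": [2]}))
        merged_path = merge_in_shared_dfs(dfs)
        return (
            json.loads(merged_path.read_text()),
            sorted(p.name for p in dfs.iterdir()),
        )
```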
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310695615.6A CN104714983B (en) | 2013-12-17 | 2013-12-17 | The generation method and device of distributed index |
PCT/CN2014/078696 WO2014180411A1 (en) | 2013-12-17 | 2014-05-28 | Distributed index generation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310695615.6A CN104714983B (en) | 2013-12-17 | 2013-12-17 | The generation method and device of distributed index |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104714983A (en) | 2015-06-17 |
CN104714983B (en) | 2019-02-19 |
Family
ID=51866791
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310695615.6A Active CN104714983B (en) | 2013-12-17 | 2013-12-17 | The generation method and device of distributed index |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104714983B (en) |
WO (1) | WO2014180411A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105354251B (en) * | 2015-10-19 | 2018-10-30 | 国家电网公司 | Electric power cloud data management indexing means based on Hadoop in electric system |
CN105430078B (en) * | 2015-11-17 | 2019-03-15 | 浪潮(北京)电子信息产业有限公司 | A kind of distributed storage method of mass data |
CN105610899B (en) * | 2015-12-10 | 2019-09-24 | 浪潮(北京)电子信息产业有限公司 | A kind of parallel method for uploading of text file and device |
US11216516B2 (en) | 2018-06-08 | 2022-01-04 | At&T Intellectual Property I, L.P. | Method and system for scalable search using microservice and cloud based search with records indexes |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102426609A (en) * | 2011-12-28 | 2012-04-25 | 厦门市美亚柏科信息股份有限公司 | Index generation method and index generation device based on MapReduce programming architecture |
CN102436491A (en) * | 2011-11-08 | 2012-05-02 | 张三明 | System and method used for searching huge amount of pictures and based on BigBase |
CN102479217A (en) * | 2010-11-23 | 2012-05-30 | 腾讯科技(深圳)有限公司 | Method and device for realizing computation balance in distributed data warehouse |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100162230A1 (en) * | 2008-12-24 | 2010-06-24 | Yahoo! Inc. | Distributed computing system for large-scale data handling |
CN102467570B (en) * | 2010-11-17 | 2014-03-12 | 日电(中国)有限公司 | Connection query system and method for distributed data warehouse |
US9361323B2 (en) * | 2011-10-04 | 2016-06-07 | International Business Machines Corporation | Declarative specification of data integration workflows for execution on parallel processing platforms |
US20130151535A1 (en) * | 2011-12-09 | 2013-06-13 | Canon Kabushiki Kaisha | Distributed indexing of data |
CN103246549B (en) * | 2012-02-07 | 2016-12-14 | 阿里巴巴集团控股有限公司 | A kind of method and system of data conversion storage |
CN103440244A (en) * | 2013-07-12 | 2013-12-11 | 广东电子工业研究院有限公司 | Large-data storage and optimization method |
- 2013-12-17: CN application CN201310695615.6A (CN104714983B), status: Active
- 2014-05-28: WO application PCT/CN2014/078696 (WO2014180411A1), status: Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102479217A (en) * | 2010-11-23 | 2012-05-30 | 腾讯科技(深圳)有限公司 | Method and device for realizing computation balance in distributed data warehouse |
CN102436491A (en) * | 2011-11-08 | 2012-05-02 | 张三明 | System and method used for searching huge amount of pictures and based on BigBase |
CN102426609A (en) * | 2011-12-28 | 2012-04-25 | 厦门市美亚柏科信息股份有限公司 | Index generation method and index generation device based on MapReduce programming architecture |
Also Published As
Publication number | Publication date |
---|---|
WO2014180411A1 (en) | 2014-11-13 |
CN104714983A (en) | 2015-06-17 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11216302B2 (en) | Modifying task dependencies at worker nodes using precompiled libraries | |
CN102521416B (en) | Data correlation query method and data correlation query device | |
Yan et al. | Quegel: A general-purpose query-centric framework for querying big graphs | |
Perez et al. | Ringo: Interactive graph analytics on big-memory machines | |
CN104881466B (en) | The processing of data fragmentation and the delet method of garbage files and device | |
CN105900093B (en) | A kind of update method of the tables of data of KeyValue databases and table data update apparatus | |
CN106030573A (en) | Implementation of semi-structured data as a first-class database element | |
CN106970929B (en) | Data import method and device | |
CN107748752B (en) | Data processing method and device | |
CN102982075A (en) | Heterogeneous data source access supporting system and method thereof | |
CN106611037A (en) | Method and device for distributed diagram calculation | |
JP2017507378A (en) | Incremental and concatenated redistribution to extend online shared nothing database | |
CN104714983B (en) | The generation method and device of distributed index | |
CN103246549B (en) | A kind of method and system of data conversion storage | |
CN104111936A (en) | Method and system for querying data | |
JP2014078085A (en) | Execution control program, execution control method and information processor | |
CN108037967A (en) | A kind of menu loading method and electronic equipment based on more parent-child structures | |
US20180095719A1 (en) | Sorted linked list with a midpoint binary tree | |
Tanase et al. | A highly efficient runtime and graph library for large scale graph analytics | |
Hashem et al. | An Integrative Modeling of BigData Processing. | |
Peng et al. | An analysis platform of road traffic management system log data based on distributed storage and parallel computing techniques | |
Baig et al. | Big Data Tools: Advantages and Disadvantages. | |
Haque et al. | Distributed RDF triple store using hbase and hive | |
CN112817930A (en) | Data migration method and device | |
Ma et al. | Efficient attribute-based data access in astronomy analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |