CN104714983B - The generation method and device of distributed index - Google Patents


Info

Publication number
CN104714983B
Authority
CN
China
Prior art keywords
index database
reduce operation
file system
reduce
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310695615.6A
Other languages
Chinese (zh)
Other versions
CN104714983A (en)
Inventor
韩丙卫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201310695615.6A
Priority to PCT/CN2014/078696 (WO2014180411A1)
Publication of CN104714983A
Application granted
Publication of CN104714983B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for generating a distributed index. In the method, the number of map operations in Hadoop is determined according to the data volume of the raw data; the data processed by each map operation is distributed to multiple reduce operations, and an index database corresponding to each reduce operation is generated, wherein the number of reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance; and the index databases corresponding to the reduce operations are merged. The technical solution provided by the invention makes it possible to index massive data efficiently and quickly.

Description

The generation method and device of distributed index
Technical field
The present invention relates to the field of communications, and in particular to a method and device for generating a distributed index.
Background
With the arrival of the cloud era, big data has attracted more and more attention. The term is usually used to describe the large amounts of unstructured and semi-structured data that a company creates; downloading such data into a relational database for analysis costs excessive time and money. Big data analysis is often tied to cloud computing, because real-time analysis of large data sets needs a framework such as MapReduce to distribute the work across tens, hundreds, or even thousands of computers. In the Internet industry, big data generally refers to the user network behavior data that Internet companies generate and accumulate in daily operation. The scale of this data is so large that it can no longer be measured in GB or TB.
How big is big data, exactly? In a single day, the content generated on the Internet could fill 168 million DVDs; 294 billion emails are sent; 2 million community posts are published; 378,000 mobile phones are sold...
By 2012, data volumes had risen from the TB (1 TB = 1024 GB) level to the PB (1 PB = 1024 TB), EB (1 EB = 1024 PB), and even ZB (1 ZB = 1024 EB) level. Research by International Data Corporation (IDC) shows that the world generated 0.49 ZB of data in 2008, 0.8 ZB in 2009, 1.2 ZB in 2010, and as much as 1.82 ZB in 2011, equivalent to more than 200 GB for every person in the world. As of 2012, all printed material ever produced by humanity amounted to 200 PB of data, and all the words ever spoken by humanity amounted to about 5 EB. Research by IBM shows that 90% of all the data obtained by human civilization was generated in the past two years. By 2020, the volume of data generated worldwide is expected to reach 44 times today's volume.
In the big data era, quickly and effectively finding the data a user cares about within big data has become an increasingly important problem. Creating indexes efficiently and quickly is the prerequisite for user searches. The index creation schemes commonly used in the related art are single-threaded. When faced with massive data they hit a performance bottleneck: they place high demands on the system and offer limited scalability, so they can no longer meet users' need to retrieve data quickly and effectively from massive data.
Summary of the invention
The present invention provides a method and device for generating a distributed index, to solve at least the problem in the related art that an index cannot be created efficiently and quickly for massive data.
According to one aspect of the present invention, a method for generating a distributed index is provided.
The method for generating a distributed index according to the present invention includes: determining the number of map operations in Hadoop according to the data volume of the raw data; distributing the data processed by each map operation to multiple reduce operations, and generating an index database corresponding to each reduce operation, wherein the number of reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance; and merging the index databases corresponding to the reduce operations.
Preferably, generating the index database corresponding to each reduce operation includes: obtaining the type of the currently supported file system; determining, according to the type of the file system, the generation mode of the index database corresponding to each reduce operation; and generating the index database corresponding to each reduce operation according to the generation mode.
Preferably, generating the index database corresponding to each reduce operation according to the generation mode includes: when the type of the file system is the Hadoop Distributed File System (HDFS), generating the index database corresponding to each reduce operation on the local disk, and then uploading the index database generated on the local disk to HDFS; or, when the type of the file system is another distributed file system (DFS), other than HDFS, that supports sharing, generating the index database corresponding to each reduce operation directly in that shared DFS.
Preferably, merging the index databases corresponding to the reduce operations includes: when the type of the file system is HDFS, downloading the index databases corresponding to the reduce operations from HDFS to the local disk; merging the index databases corresponding to the reduce operations on the local disk; and uploading the index database obtained by the merge to HDFS, and deleting the index databases corresponding to the reduce operations from the local disk.
Preferably, merging the index databases corresponding to the reduce operations includes: when the type of the file system is another DFS that supports sharing, merging the index databases corresponding to the reduce operations that were generated in that shared DFS; and deleting the index databases corresponding to the reduce operations that were generated in that shared DFS.
According to another aspect of the present invention, a device for generating a distributed index is provided.
The device for generating a distributed index according to the present invention includes: a determining module, configured to determine the number of map operations in Hadoop according to the data volume of the raw data; a generation module, configured to distribute the data processed by each map operation to multiple reduce operations and to generate an index database corresponding to each reduce operation, wherein the number of reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance; and a merging module, configured to merge the index databases corresponding to the reduce operations.
Preferably, the generation module includes: an acquiring unit, configured to obtain the type of the currently supported file system; a determination unit, configured to determine, according to the type of the file system, the generation mode of the index database corresponding to each reduce operation; and a generation unit, configured to generate the index database corresponding to each reduce operation according to the generation mode.
Preferably, the generation unit is configured to, when the type of the file system is the Hadoop Distributed File System (HDFS), generate the index database corresponding to each reduce operation on the local disk and then upload the index database generated on the local disk to HDFS; or the generation unit is configured to, when the type of the file system is another distributed file system (DFS), other than HDFS, that supports sharing, generate the index database corresponding to each reduce operation directly in that shared DFS.
Preferably, the merging module includes: a download unit, configured to download the index databases corresponding to the reduce operations from HDFS to the local disk when the type of the file system is HDFS; a first merging unit, configured to merge the index databases corresponding to the reduce operations on the local disk; and a first processing unit, configured to upload the index database obtained by the merge to HDFS and to delete the index databases corresponding to the reduce operations from the local disk.
Preferably, the merging module includes: a second merging unit, configured to merge, when the type of the file system is another DFS that supports sharing, the index databases corresponding to the reduce operations that were generated in that shared DFS; and a second processing unit, configured to delete the index databases corresponding to the reduce operations that were generated in that shared DFS.
In the embodiments of the present invention, the number of map operations in Hadoop is determined according to the data volume of the raw data; the data processed by each map operation is distributed to multiple reduce operations, and an index database corresponding to each reduce operation is generated, the number of reduce operations and the correspondence between each reduce operation and one or more map operations being configured in advance; and the index databases corresponding to the reduce operations are merged. In other words, the raw data is processed with Hadoop's map operations and reduce operations to generate one index database per reduce operation, and these index databases are then merged. This solves the problem in the related art that an index cannot be created efficiently and quickly for massive data, and so makes it possible to index massive data efficiently and quickly.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The exemplary embodiments of the present invention and their descriptions explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a method for generating a distributed index according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for generating a distributed index according to a preferred embodiment of the present invention;
Fig. 3 is a structural block diagram of a device for generating a distributed index according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a device for generating a distributed index according to a preferred embodiment of the present invention.
Detailed description
The present invention is described in detail below with reference to the drawings and in combination with the embodiments. Note that, provided they do not conflict, the embodiments of this application and the features in them can be combined with one another.
Fig. 1 is a flowchart of a method for generating a distributed index according to an embodiment of the present invention. As shown in Fig. 1, the method may include the following processing steps:
Step S102: determine the number of map operations in Hadoop according to the data volume of the raw data;
Step S104: distribute the data processed by each map operation to multiple reduce operations, and generate an index database corresponding to each reduce operation, wherein the number of reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance;
Step S106: merge the index databases corresponding to the reduce operations.
In the related art, an index cannot be created efficiently and quickly for massive data. The method shown in Fig. 1 processes the raw data with Hadoop's map operations and reduce operations, generates an index database corresponding to each reduce operation, and then merges these index databases. This solves the problem in the related art and makes it possible to index massive data efficiently and quickly.
Preferably, in step S104, generating the index database corresponding to each reduce operation may include the following operations:
Step S1: obtain the type of the currently supported file system;
Step S2: determine, according to the type of the file system, the generation mode of the index database corresponding to each reduce operation;
Step S3: generate the index database corresponding to each reduce operation according to the generation mode.
In a preferred embodiment, the data volume of the raw data to be acquired is determined first, and the data is divided into M parts (M is a positive integer), each part corresponding to one map operation. The data volume handled by each map operation can be configured dynamically, and a map data-processing plug-in is set accordingly. In addition, the intermediate key-value pair sets produced by each map operation are periodically written to the local disk, which is divided into N partitions (N is a positive integer defined by the user), each partition corresponding to one reduce operation. Configuring the maximum number of reduce operations improves the efficiency of creating the distributed index, and a reduce data-processing plug-in is set according to the number of reduce operations configured by the user. In this preferred embodiment, index creation supports the Hadoop Distributed File System (HDFS) as well as other distributed file systems (DFS) that support sharing. During index creation, therefore, the generation mode of the index database corresponding to each reduce operation is determined according to the type of file system that is supported, and the index database corresponding to each reduce operation is then generated according to that generation mode.
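The split of the raw data into M parts can be sketched as a ceiling division of the total data volume by the configurable volume per map operation. This is a minimal illustration, not the patent's implementation: the function names `plan_map_operations` and `split_ranges` and the byte-based sizing are assumptions introduced here.

```python
from typing import List, Tuple


def plan_map_operations(total_bytes: int, bytes_per_map: int) -> int:
    """Return M, the number of map operations needed to cover total_bytes.

    bytes_per_map stands in for the dynamically configurable data volume
    handled by each map operation (hypothetical parameter name).
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must not be negative")
    if bytes_per_map <= 0:
        raise ValueError("bytes_per_map must be positive")
    # Integer ceiling division; M is kept at 1 or more because the text
    # requires M to be a positive integer.
    return max(1, -(-total_bytes // bytes_per_map))


def split_ranges(total_bytes: int, bytes_per_map: int) -> List[Tuple[int, int]]:
    """Return one (start, end) byte range per map operation."""
    m = plan_map_operations(total_bytes, bytes_per_map)
    return [
        (i * bytes_per_map, min((i + 1) * bytes_per_map, total_bytes))
        for i in range(m)
    ]
```

For example, 1000 bytes at 256 bytes per map operation gives four parts, the last one covering bytes 768 to 1000.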
Preferably, in step S3, generating the index database corresponding to each reduce operation according to the generation mode may include one of the following steps:
Step S31: when the type of the file system is the Hadoop Distributed File System (HDFS), generate the index database corresponding to each reduce operation on the local disk, and then upload the index database generated on the local disk to HDFS;
Step S32: when the type of the file system is another distributed file system (DFS), other than HDFS, that supports sharing, generate the index database corresponding to each reduce operation directly in that shared DFS.
In a preferred embodiment, if the type of the currently supported file system is HDFS, each reduce operation generates a temporary index database in the local file system (i.e. on the local disk); then, in the final cleanup phase of the reduce operation, the temporary index database generated in the local file system is uploaded to the HDFS file system. If the type of the currently supported file system is another DFS that supports sharing, the temporary index database can be generated directly in the DFS file system.
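Steps S1 to S3 amount to a small dispatch on the file system type. The sketch below shows that decision only; the enum names `FsType` and `GenerationMode` and the function `choose_generation_mode` are hypothetical and do not come from the patent or from any Hadoop API.

```python
from enum import Enum


class FsType(Enum):
    """File system types distinguished in the text (names are illustrative)."""
    HDFS = "hdfs"
    SHARED_DFS = "shared_dfs"


class GenerationMode(Enum):
    """How each reduce operation produces its temporary index database."""
    LOCAL_THEN_UPLOAD = "local_then_upload"  # step S31
    DIRECT_ON_DFS = "direct_on_dfs"          # step S32


def choose_generation_mode(fs_type: FsType) -> GenerationMode:
    """Step S2: map the supported file system type to a generation mode."""
    if fs_type is FsType.HDFS:
        # Build the index on the local disk, then upload it to HDFS.
        return GenerationMode.LOCAL_THEN_UPLOAD
    if fs_type is FsType.SHARED_DFS:
        # Build the index directly in the shared DFS.
        return GenerationMode.DIRECT_ON_DFS
    raise ValueError(f"unsupported file system type: {fs_type!r}")
```

Keeping the decision in one function means each reduce operation only needs to know the resulting mode, not the file system details.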
Preferably, in step S106, merging the index databases corresponding to the reduce operations may include the following operations:
Step S4: when the type of the file system is HDFS, download the index databases corresponding to the reduce operations from HDFS to the local disk;
Step S5: merge the index databases corresponding to the reduce operations on the local disk;
Step S6: upload the index database obtained by the merge to HDFS, and delete the index databases corresponding to the reduce operations from the local disk.
In a preferred embodiment, if the type of the currently supported file system is HDFS, the index master node of Hadoop first downloads all temporary index databases from the HDFS file system to its local file system. Second, the index master node merges all temporary index databases in the local file system into a complete index database. Third, the index master node uploads the complete index database to the HDFS file system. The index master node then deletes each temporary index database from the local file system. Finally, the index slave nodes of Hadoop download the complete index database from the HDFS file system to their local file systems for use in retrieval.
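The download, merge, upload, and delete sequence of steps S4 to S6 can be sketched with ordinary directories standing in for HDFS and the master node's local disk. Everything here is an assumption made for illustration: the JSON file format, the `part-*.json` and `index-complete.json` names, and the function `merge_via_local_disk`. A real deployment would use an HDFS client and a real index format instead of `shutil` and JSON.

```python
import json
import shutil
from pathlib import Path


def merge_via_local_disk(hdfs_dir: Path, local_dir: Path) -> Path:
    """Merge temporary index files the way steps S4 to S6 describe.

    hdfs_dir and local_dir are plain directories that simulate HDFS and the
    local disk. Each temporary index is assumed to be a JSON object mapping
    a term to a list of document ids.
    """
    local_dir.mkdir(parents=True, exist_ok=True)

    # Step S4: download every temporary index database to the local disk.
    downloaded = []
    for part in sorted(hdfs_dir.glob("part-*.json")):
        target = local_dir / part.name
        shutil.copy(part, target)
        downloaded.append(target)

    # Step S5: merge the temporary index databases on the local disk.
    merged = {}
    for path in downloaded:
        for term, doc_ids in json.loads(path.read_text()).items():
            merged.setdefault(term, set()).update(doc_ids)
    complete_local = local_dir / "index-complete.json"
    complete_local.write_text(
        json.dumps({term: sorted(ids) for term, ids in sorted(merged.items())})
    )

    # Step S6: upload the complete index, then delete the local temporaries.
    complete_remote = hdfs_dir / complete_local.name
    shutil.copy(complete_local, complete_remote)
    for path in downloaded:
        path.unlink()
    return complete_remote
```

Slave nodes would then fetch the returned file for retrieval, which corresponds to the last step of the preferred embodiment.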
Preferably, in step S106, merging the index databases corresponding to the reduce operations may include the following steps:
Step S7: when the type of the file system is another DFS that supports sharing, merge the index databases corresponding to the reduce operations that were generated in that shared DFS;
Step S8: delete the index databases corresponding to the reduce operations that were generated in that shared DFS.
In a preferred embodiment, if the type of the currently supported file system is another DFS that supports sharing, the index master node of Hadoop first merges the temporary index databases in the DFS file system into a complete index database for use in retrieval; the index master node then deletes each temporary index database from the DFS file system.
The preferred implementation process above is described further below with reference to the preferred embodiment shown in Fig. 2.
Fig. 2 is a flowchart of a method for generating a distributed index according to a preferred embodiment of the present invention. As shown in Fig. 2, the process may include the following processing stages:
First stage: the data acquisition stage, i.e. the map stage of Hadoop. The data acquisition stage is the preparatory stage before the index is built and provides the data needed to create the index. The distributed implementation used in Hadoop's map stage processes data in parallel, and the number of map operations is determined dynamically by the volume of data to be acquired. Hadoop's map operations acquire text files or database files and process the data, generating the content of each field needed to create the index (i.e. a set of key-value pairs (key, value)), which greatly improves data processing performance. Because plug-in processing is supported during acquisition, different processing modes can be customized according to the data volume.
Second stage: the index creation stage, i.e. the reduce stage of Hadoop, which creates the distributed index databases. The number of reduce operations is set to determine reduceNum, the maximum number of reduce operations processed in parallel. The data generated in the data acquisition stage is assigned to the reduce operations by HashCode() % reduceNum to be indexed, and each reduce operation generates its own temporary index database file.
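The HashCode() % reduceNum assignment can be sketched as follows. Two assumptions are made and should be read as such: keys are strings hashed the way Java's String.hashCode does, and the sign bit is masked before the modulo (as Hadoop's default HashPartitioner does) so that a negative hash code cannot yield a negative reduce index. The patent states only the bare modulo, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def java_string_hash_code(text: str) -> int:
    """Reproduce Java's String.hashCode as a signed 32-bit integer."""
    data = text.encode("utf-16-be")  # Java hashes UTF-16 code units
    h = 0
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        h = (31 * h + unit) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def assign_reduce(key: str, reduce_num: int) -> int:
    """Pick the reduce operation for a key: hash code modulo reduceNum."""
    if reduce_num <= 0:
        raise ValueError("reduce_num must be positive")
    # Assumption: clear the sign bit first so the result is never negative.
    return (java_string_hash_code(key) & 0x7FFFFFFF) % reduce_num


def distribute(pairs: Iterable[Tuple[str, str]],
               reduce_num: int) -> List[List[Tuple[str, str]]]:
    """Group (key, value) pairs into one bucket per reduce operation."""
    buckets: List[List[Tuple[str, str]]] = [[] for _ in range(reduce_num)]
    for key, value in pairs:
        buckets[assign_reduce(key, reduce_num)].append((key, value))
    return buckets
```

Because the assignment depends only on the key, all pairs that share a key land in the same reduce operation and therefore in the same temporary index database.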
Note that index creation supports the Hadoop Distributed File System (HDFS) as well as other distributed file systems (DFS) that support sharing.
Third stage: the index merging stage. Each reduce operation in the index creation stage produces its own temporary index database, and the index master node invokes index merging to combine these temporary index databases into one complete index database. During index merging, the temporary index databases are read one by one and merged into a single master index database; finally, each temporary index database is deleted, and the master index database provides the retrieval service.
Fig. 3 is a structural block diagram of a device for generating a distributed index according to an embodiment of the present invention. As shown in Fig. 3, the device may include: a determining module 10, configured to determine the number of map operations in Hadoop according to the data volume of the raw data; a generation module 20, configured to distribute the data processed by each map operation to multiple reduce operations and to generate an index database corresponding to each reduce operation, wherein the number of reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance; and a merging module 30, configured to merge the index databases corresponding to the reduce operations.
The device shown in Fig. 3 solves the problem in the related art that an index cannot be created efficiently and quickly for massive data, and so makes it possible to index massive data efficiently and quickly.
Preferably, as shown in Fig. 4, the generation module 20 may include: an acquiring unit 200, configured to obtain the type of the currently supported file system; a determination unit 202, configured to determine, according to the type of the file system, the generation mode of the index database corresponding to each reduce operation; and a generation unit 204, configured to generate the index database corresponding to each reduce operation according to the generation mode.
Preferably, as shown in Fig. 4, the generation unit 204 is configured to, when the type of the file system is the Hadoop Distributed File System (HDFS), generate the index database corresponding to each reduce operation on the local disk and then upload the index database generated on the local disk to HDFS; or the generation unit 204 is configured to, when the type of the file system is another distributed file system (DFS), other than HDFS, that supports sharing, generate the index database corresponding to each reduce operation directly in that shared DFS.
Preferably, as shown in Fig. 4, the merging module 30 may include: a download unit 300, configured to download the index databases corresponding to the reduce operations from HDFS to the local disk when the type of the file system is HDFS; a first merging unit 302, configured to merge the index databases corresponding to the reduce operations on the local disk; and a first processing unit 304, configured to upload the index database obtained by the merge to HDFS and to delete the index databases corresponding to the reduce operations from the local disk.
Preferably, as shown in Fig. 4, the merging module 30 may include: a second merging unit 306, configured to merge, when the type of the file system is another DFS that supports sharing, the index databases corresponding to the reduce operations that were generated in that shared DFS; and a second processing unit 308, configured to delete the index databases corresponding to the reduce operations that were generated in that shared DFS.
As the description above shows, the embodiments achieve the following technical effects (note that these are effects that certain preferred embodiments can achieve): with the technical solution provided by the embodiments of the present invention, the raw data can be processed with the map-reduce programming model in Hadoop to generate an index database corresponding to each reduce operation, and these index databases are then merged into one complete index database for use in retrieval. This solves the problem in the related art that an index cannot be created efficiently and quickly for massive data, and so makes it possible to index massive data efficiently and quickly.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed across a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device. In some cases, the steps shown or described can be performed in an order different from the one given here, or they can each be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (8)

1. A method for generating a distributed index, characterized by comprising:
determining the number of map operations in Hadoop according to the data volume of the raw data;
distributing the data processed by each map operation to multiple reduce operations, and generating an index database corresponding to each reduce operation, wherein the number of the reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance;
merging the index databases corresponding to the reduce operations;
wherein generating the index database corresponding to each reduce operation comprises: obtaining the type of the currently supported file system; determining, according to the type of the file system, the generation mode of the index database corresponding to each reduce operation; and generating the index database corresponding to each reduce operation according to the generation mode.
2. The method according to claim 1, characterized in that generating the index database corresponding to each reduce operation according to the generation mode comprises:
when the type of the file system is the Hadoop Distributed File System (HDFS), generating the index database corresponding to each reduce operation on the local disk, and then uploading the index database generated on the local disk to the HDFS; or
when the type of the file system is another distributed file system (DFS), other than the HDFS, that supports sharing, generating the index database corresponding to each reduce operation directly in that shared DFS.
3. The method according to claim 2, characterized in that merging the index databases corresponding to the reduce operations comprises:
when the type of the file system is the HDFS, downloading the index databases corresponding to the reduce operations from the HDFS to the local disk;
merging the index databases corresponding to the reduce operations on the local disk;
uploading the index database obtained by the merge to the HDFS, and deleting the index databases corresponding to the reduce operations from the local disk.
4. The method according to claim 2, characterized in that merging the index databases corresponding to the reduce operations comprises:
when the type of the file system is the other DFS that supports sharing, merging the index databases corresponding to the reduce operations that were generated in that shared DFS;
deleting the index databases corresponding to the reduce operations that were generated in that shared DFS.
5. A device for generating a distributed index, characterized by comprising:
a determining module, configured to determine the number of map operations in Hadoop according to the data volume of the raw data;
a generation module, configured to distribute the data processed by each map operation to multiple reduce operations and to generate an index database corresponding to each reduce operation, wherein the number of the reduce operations and the correspondence between each reduce operation and one or more map operations are configured in advance;
a merging module, configured to merge the index databases corresponding to the reduce operations;
wherein the generation module comprises: an acquiring unit, configured to obtain the type of the currently supported file system; a determination unit, configured to determine, according to the type of the file system, the generation mode of the index database corresponding to each reduce operation; and a generation unit, configured to generate the index database corresponding to each reduce operation according to the generation mode.
6. The device according to claim 5, characterized in that the generation unit is configured to, when the type of the file system is the Hadoop Distributed File System (HDFS), generate the index database corresponding to each reduce operation on the local disk and then upload the index database generated on the local disk to the HDFS; or the generation unit is configured to, when the type of the file system is another distributed file system (DFS), other than the HDFS, that supports sharing, generate the index database corresponding to each reduce operation directly in that shared DFS.
7. The device according to claim 6, characterized in that the merging module comprises:
a download unit, configured to, when the type of the file system is the HDFS, download the index databases corresponding to each reduce operation from the HDFS to the local disk;
a first merging unit, configured to merge, on the local disk, the index databases corresponding to each reduce operation;
a first processing unit, configured to upload the merged index database to the HDFS and to delete the index databases corresponding to each reduce operation from the local disk.
8. The device according to claim 6, characterized in that the merging module comprises:
a second merging unit, configured to, when the type of the file system is one of the remaining DFSs that support sharing, merge the index databases corresponding to each reduce operation that were generated in that DFS;
a second processing unit, configured to delete the index databases corresponding to each reduce operation that were generated in the remaining DFS that supports sharing.
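The file-system-dependent behaviour described in claims 5 to 8 can be illustrated with a minimal Python sketch. It is not the patented implementation and uses no Hadoop API: the names `FsType`, `generation_steps`, `merge_steps` and `merge_indexes` are hypothetical, and each per-reduce index database is modelled, purely as an assumption, as a dictionary mapping a term to a list of document ids.

```python
from enum import Enum


class FsType(Enum):
    """File system types distinguished in the claims (names are illustrative)."""
    HDFS = "hdfs"
    SHARED_DFS = "shared_dfs"  # any other DFS that supports sharing


def generation_steps(fs_type):
    """Steps a reduce operation follows to produce its index database."""
    if fs_type is FsType.HDFS:
        # Claim 6, first branch: build locally, then upload.
        return ["generate on local disk", "upload to HDFS"]
    # Claim 6, second branch: build directly in the shared DFS.
    return ["generate directly in shared DFS"]


def merge_steps(fs_type):
    """Steps the merging module follows for the per-reduce index databases."""
    if fs_type is FsType.HDFS:
        # Claim 7: download, merge locally, upload, clean up local copies.
        return [
            "download per-reduce indexes to local disk",
            "merge on local disk",
            "upload merged index to HDFS",
            "delete per-reduce indexes on local disk",
        ]
    # Claim 8: merge in place, then delete the per-reduce indexes.
    return [
        "merge per-reduce indexes in shared DFS",
        "delete per-reduce indexes in shared DFS",
    ]


def merge_indexes(shards):
    """Merge per-reduce inverted indexes (term -> doc ids) into one index.

    The dictionary model is an assumption made for this sketch only.
    """
    merged = {}
    for shard in shards:
        for term, doc_ids in shard.items():
            merged.setdefault(term, set()).update(doc_ids)
    return {term: sorted(ids) for term, ids in merged.items()}


if __name__ == "__main__":
    shards = [{"hadoop": [1, 2]}, {"hadoop": [2, 3], "index": [4]}]
    print(generation_steps(FsType.HDFS))
    print(merge_steps(FsType.SHARED_DFS))
    print(merge_indexes(shards))
```

Keeping the step selection separate from the merge itself mirrors the split in claim 5 between the determination unit, which picks the generating mode, and the generation and merging units, which act on it.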
CN201310695615.6A 2013-12-17 2013-12-17 The generation method and device of distributed index Active CN104714983B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310695615.6A CN104714983B (en) 2013-12-17 2013-12-17 The generation method and device of distributed index
PCT/CN2014/078696 WO2014180411A1 (en) 2013-12-17 2014-05-28 Distributed index generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310695615.6A CN104714983B (en) 2013-12-17 2013-12-17 The generation method and device of distributed index

Publications (2)

Publication Number Publication Date
CN104714983A CN104714983A (en) 2015-06-17
CN104714983B true CN104714983B (en) 2019-02-19

Family

ID=51866791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310695615.6A Active CN104714983B (en) 2013-12-17 2013-12-17 The generation method and device of distributed index

Country Status (2)

Country Link
CN (1) CN104714983B (en)
WO (1) WO2014180411A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354251B (en) * 2015-10-19 2018-10-30 国家电网公司 Electric power cloud data management indexing means based on Hadoop in electric system
CN105430078B (en) * 2015-11-17 2019-03-15 浪潮(北京)电子信息产业有限公司 A kind of distributed storage method of mass data
CN105610899B (en) * 2015-12-10 2019-09-24 浪潮(北京)电子信息产业有限公司 A kind of parallel method for uploading of text file and device
US11216516B2 (en) 2018-06-08 2022-01-04 At&T Intellectual Property I, L.P. Method and system for scalable search using microservice and cloud based search with records indexes

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102426609A (en) * 2011-12-28 2012-04-25 厦门市美亚柏科信息股份有限公司 Index generation method and index generation device based on MapReduce programming architecture
CN102436491A (en) * 2011-11-08 2012-05-02 张三明 System and method used for searching huge amount of pictures and based on BigBase
CN102479217A (en) * 2010-11-23 2012-05-30 腾讯科技(深圳)有限公司 Method and device for realizing computation balance in distributed data warehouse

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100162230A1 (en) * 2008-12-24 2010-06-24 Yahoo! Inc. Distributed computing system for large-scale data handling
CN102467570B (en) * 2010-11-17 2014-03-12 日电(中国)有限公司 Connection query system and method for distributed data warehouse
US9361323B2 (en) * 2011-10-04 2016-06-07 International Business Machines Corporation Declarative specification of data integration workflows for execution on parallel processing platforms
US20130151535A1 (en) * 2011-12-09 2013-06-13 Canon Kabushiki Kaisha Distributed indexing of data
CN103246549B (en) * 2012-02-07 2016-12-14 阿里巴巴集团控股有限公司 A kind of method and system of data conversion storage
CN103440244A (en) * 2013-07-12 2013-12-11 广东电子工业研究院有限公司 Large-data storage and optimization method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102479217A (en) * 2010-11-23 2012-05-30 腾讯科技(深圳)有限公司 Method and device for realizing computation balance in distributed data warehouse
CN102436491A (en) * 2011-11-08 2012-05-02 张三明 System and method used for searching huge amount of pictures and based on BigBase
CN102426609A (en) * 2011-12-28 2012-04-25 厦门市美亚柏科信息股份有限公司 Index generation method and index generation device based on MapReduce programming architecture

Also Published As

Publication number Publication date
WO2014180411A1 (en) 2014-11-13
CN104714983A (en) 2015-06-17

Similar Documents

Publication Publication Date Title
US11216302B2 (en) Modifying task dependencies at worker nodes using precompiled libraries
CN102521416B (en) Data correlation query method and data correlation query device
Yan et al. Quegel: A general-purpose query-centric framework for querying big graphs
Perez et al. Ringo: Interactive graph analytics on big-memory machines
CN104881466B (en) The processing of data fragmentation and the delet method of garbage files and device
CN105900093B (en) A kind of update method of the tables of data of KeyValue databases and table data update apparatus
CN106030573A (en) Implementation of semi-structured data as a first-class database element
CN106970929B (en) Data import method and device
CN107748752B (en) Data processing method and device
CN102982075A (en) Heterogeneous data source access supporting system and method thereof
CN106611037A (en) Method and device for distributed diagram calculation
JP2017507378A (en) Incremental and concatenated redistribution to extend online shared nothing database
CN104714983B (en) The generation method and device of distributed index
CN103246549B (en) A kind of method and system of data conversion storage
CN104111936A (en) Method and system for querying data
JP2014078085A (en) Execution control program, execution control method and information processor
CN108037967A (en) A kind of menu loading method and electronic equipment based on more parent-child structures
US20180095719A1 (en) Sorted linked list with a midpoint binary tree
Tanase et al. A highly efficient runtime and graph library for large scale graph analytics
Hashem et al. An Integrative Modeling of BigData Processing.
Peng et al. An analysis platform of road traffic management system log data based on distributed storage and parallel computing techniques
Baig et al. Big Data Tools: Advantages and Disadvantages.
Haque et al. Distributed RDF triple store using hbase and hive
CN112817930A (en) Data migration method and device
Ma et al. Efficient attribute-based data access in astronomy analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant