CN102760165A - Full text retrieval method using bitmap index and device - Google Patents


Publication number
CN102760165A
CN102760165A CN2012101938744A CN201210193874A CN102760165A CN 102760165 A CN102760165 A CN 102760165A CN 2012101938744 A CN2012101938744 A CN 2012101938744A CN 201210193874 A CN201210193874 A CN 201210193874A CN 102760165 A CN102760165 A CN 102760165A
Authority
CN
China
Prior art keywords
bitmap
node
bitmap index
index
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101938744A
Other languages
Chinese (zh)
Other versions
CN102760165B (en
Inventor
赵伟
郑程光
孙伟丰
罗正海
李泉
李�浩
李书淦
程仁波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Founder Digital Publishing Technology (Shanghai) Co.,Ltd.
Founder Information Industry Holdings Co Ltd
Peking University Founder Group Co Ltd
Original Assignee
FOUNDER DIGITAL PUBLISHING TECHNOLOGY (SHANGHAI) CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FOUNDER DIGITAL PUBLISHING TECHNOLOGY (SHANGHAI) CO LTD
Priority to CN201210193874.4A
Publication of CN102760165A
Application granted
Publication of CN102760165B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a full-text retrieval method and device using a bitmap index. The technical scheme enables the XML (Extensible Markup Language) data access module of an existing XML database management system to test quickly whether a word exists. Because full-text retrieval aims to find the node data rows that contain the words specified by a user, this existence test is one of its core functions. Compared with a traditional B+ tree index, the method using a bitmap index achieves more efficient full-text indexing, and the bitmap index also markedly improves full-text retrieval performance.

Description

A full-text retrieval method and device using a bitmap index
Technical field
The present invention relates to the field of computer technology, and in particular to a full-text retrieval method and device using a bitmap index.
Background
XML (Extensible Markup Language) is a markup language designed specifically for the Internet. Because it can express many kinds of information and data effectively and lets different applications work together, it has long been the de facto standard for data publishing and data exchange, and it has been widely developed and used in recent years. The emphasis of XML is not the format of the data itself but the management of data information. XML therefore makes it possible to unify different database schemas and offers an approach to integrating heterogeneous databases.
An XML database management system (XML DBMS) is a new type of database management system (DBMS) that has developed rapidly in recent years. The data it stores and retrieves are XML documents, and it supports update operations on XML documents. As more and more industries adopt XML as their data exchange standard, demand for XML data management (including storage, retrieval, and update) keeps growing rapidly; in particular, an XML database management system is better suited than a relational database management system to handling text data and XML document data. The query engine is the core subsystem of a database management system. Because XML is typical semi-structured data, query requirements for XML data differ from those of a traditional database: a query must address not only the values in the database but also the structure of the XML documents and the relationships between the data.
In an XML database management system, the entity that stores XML documents is called a container. A container can store the data content of any number of XML documents and is backed by several data tables, which respectively store the data and structural information of these XML documents, including node data, relationships between nodes, path data, various indexes, and various statistics. The unit of storage in a data table is the data row: a data table contains a number of data rows, and a specific data row can be located quickly through an index. The position of each row in a data table is identified by a TID. A TID uniquely identifies a data row and contains the address of that row in the data table, so a data row can be found from its TID.
The content of an XML document is stored as node data in the node table. Of the seven node kinds in XDM, attribute nodes, namespace nodes, text nodes, comment nodes, and processing-instruction nodes are all stored inside their parent node (which must be an element node), so the node table stores only element nodes and document nodes. A document node stores the metadata of an XML document, while the content of the document is stored in all the element nodes of that document.
As research on XML-related technologies deepened, XML querying gained a solid technical foundation. On this basis, the W3C (World Wide Web Consortium) published a working draft of the XML query language specification, the XQuery language, in December 2001, and the XQuery language has continued to evolve since then. The retrieval and update languages for XML data are XQuery and XQuery Update, standards formulated by the W3C. The W3C has also formulated the XQuery Full Text standard as the standard full-text retrieval language for XML database management systems. To ensure that full-text retrieval in an XML database management system executes efficiently, an efficient full-text index is needed to support it.
Summary of the invention
To solve the above problem, the technical scheme of the present invention provides a full-text retrieval method using a bitmap index, applicable to an XML database management system, comprising the following steps:
Parsing an XML document to obtain all element nodes of said XML document;
Storing all element nodes of said XML document in a node table;
Extracting all text child nodes of said element nodes and splitting all the text child nodes into words, which form a set;
Filtering stop words out of said set to obtain an effective word set;
Building a bitmap index table;
Performing full-text retrieval using said bitmap index table.
Optionally, building said bitmap index table specifically comprises: building a bitmap for each word in said effective word set according to the mapping between that word and the data rows of said node table, thereby forming said bitmap index table.
Optionally, the bitmap of each word in said effective word set is built from the mapping between that word and the data rows of said node table using the following formulas:
block=i/M;
offset=i%M;
Wherein TID is the address of a data row in the node table, expressed as (block, offset); M is the maximum number of data rows that can be stored in one data page; and i is a bit position in the bitmap of each word.
Optionally, the method further comprises: while all element nodes of said XML document are being stored in the node table, building an index key for each data row for said bitmap index table. Specifically, said bitmap index table has a callback function, and said callback function is responsible for building the index key of each data row for said bitmap index table;
Said callback function is:
IndexKeyRange IndexBuilder(HeapTuple htup, Relation heap, Relation index);
Wherein said htup is a data row of said node table, heap and index are the handle objects of said node table and said bitmap index table respectively, and the return value is an object of type IndexKeyRange.
Optionally, the method further comprises: applying RLE compression to the bitmap built for each word in said effective word set.
Optionally, performing full-text retrieval using said bitmap index table specifically comprises:
When an XQuery Full Text query is executed, splitting the query condition string into several words;
Filtering said words with the default stop-word list to obtain an effective query word set;
Querying the bitmap index table for each word of the effective query word set in turn to obtain several bitmaps;
Applying to said bitmaps the logical operation that matches the query condition to obtain a final bitmap, that is, the set of TIDs of the node data rows that finally satisfy the full-text retrieval condition.
Optionally, parsing the XML document to obtain all element nodes of said XML document specifically comprises: feeding said XML document into a SAX-mode XML document parser to obtain all element nodes of said XML document.
The present invention also provides a full-text retrieval device using a bitmap index, comprising:
A document parsing unit, configured to parse an XML document and obtain all element nodes of said XML document;
A node storage unit, configured to store all element nodes of said XML document in a node table;
A text node splitting unit, configured to extract all text child nodes of said element nodes and split said text child nodes into words, which form a set;
A filtering unit, configured to filter stop words out of said set to obtain an effective word set;
An index table building unit, configured to build a bitmap index table;
A retrieval unit, configured to perform full-text retrieval using said bitmap index table.
Optionally, said index table building unit is specifically configured to build a bitmap for each word in said effective word set according to the mapping between that word and the data rows of said node table, thereby forming said bitmap index table.
Optionally, said index table building unit builds the bitmap of each word in said effective word set from the mapping between that word and the data rows of said node table using the following formulas:
block=i/M;
offset=i%M;
Wherein TID is the address of a data row in the node table, expressed as (block, offset); M is the maximum number of data rows that can be stored in one data page; and i is a bit position in the bitmap of each word.
Optionally, said bitmap index table has a callback function, and said callback function is responsible for building the index key of each data row for said bitmap index table;
Said callback function is:
IndexKeyRange IndexBuilder(HeapTuple htup, Relation heap, Relation index);
Wherein said htup is a data row of said node table, heap and index are the handle objects of said node table and said bitmap index table respectively, and the return value is an object of type IndexKeyRange.
Optionally, the device further comprises a compression unit, configured to apply RLE compression to the bitmap built for each word in said effective word set.
Optionally, said retrieval unit further comprises:
A query string splitting unit, configured to split the query condition string into several words when an XQuery Full Text query is executed;
A stop-word filtering unit, configured to filter said words with the default stop-word list to obtain an effective query word set;
A bitmap query unit, configured to query the bitmap index table for each word of the effective query word set in turn to obtain several bitmaps;
A target bitmap acquiring unit, configured to apply to said bitmaps the logical operation that matches the query condition to obtain a final bitmap, that is, the set of TIDs of the node data rows that finally satisfy the full-text retrieval condition.
Optionally, said document parsing unit is configured to feed said XML document into a SAX-mode XML document parser to obtain all element nodes of said XML document.
Compared with the prior art, the above technical scheme has the following advantages:
The technical scheme of the present invention enables the XML data access module of an existing XML database management system to test quickly whether a word exists. Because the goal of full-text retrieval is to find the node data rows that contain the words specified by the user, this existence test is one of the most central and important functions of full-text retrieval. Compared with a traditional B+ tree index, the full-text retrieval method using a bitmap index of the present invention achieves more efficient full-text indexing; and because a bitmap index is used, full-text retrieval performance is also markedly improved.
Description of drawings
Fig. 1 is a flowchart of the full-text retrieval method using a bitmap index according to an embodiment of the present invention;
Fig. 2 is a flowchart of step S6 of the full-text retrieval method using a bitmap index according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the full-text retrieval device using a bitmap index according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the retrieval unit of the full-text retrieval device using a bitmap index according to an embodiment of the present invention.
Embodiment
To make the above objects, features, and advantages of the present invention easier to understand, specific embodiments of the invention are described in detail below with reference to the accompanying drawings. Specific details are set forth in the following description to provide a thorough understanding of the present invention. The present invention can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar extensions without departing from the spirit of the invention. The present invention is therefore not limited by the specific embodiments disclosed below.
Those skilled in the art will appreciate that, in an XML database management system, the entity that stores XML documents is called a container. A container can store the data content of any number of XML documents and is backed by several data tables, which respectively store the data and structural information of these XML documents, including node data, relationships between nodes, path data, various indexes, and various statistics. The unit of storage in a data table is the data row: a data table contains a number of data rows, and a specific data row can be located quickly through an index. The position of each row in a data table is identified by a TID. A TID uniquely identifies a data row and contains the address of that row in the data table, so a data row can be found from its TID.
A bitmap is simply a sequence of bits, organized as integers. A 128-bit bitmap, for example, is usually held in an array of four 32-bit integers or in an array of two 64-bit integers. Each bit of a bitmap is either 0 or 1 and can therefore represent whether one fact is true or false. A bitmap index can contain any number of bitmaps, so a bitmap index is well suited to expressing an arbitrary number of existence facts.
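The storage layout just described can be illustrated with a minimal Python sketch. The class name `Bitmap` and its methods are assumptions made for this illustration and do not come from the patent; the sketch only shows how a 128-bit bitmap fits into four 32-bit words and how a single bit records one existence fact.

```python
class Bitmap:
    """Fixed-size bitmap held in a list of 32-bit words (illustrative only)."""

    WORD_BITS = 32

    def __init__(self, nbits):
        self.nbits = nbits
        # Round up so that, e.g., 128 bits occupy exactly four 32-bit words.
        self.words = [0] * ((nbits + self.WORD_BITS - 1) // self.WORD_BITS)

    def set(self, i):
        """Record that fact number i is true."""
        self.words[i // self.WORD_BITS] |= 1 << (i % self.WORD_BITS)

    def test(self, i):
        """Return True if fact number i is recorded as true."""
        return (self.words[i // self.WORD_BITS] >> (i % self.WORD_BITS)) & 1 == 1


bm = Bitmap(128)
bm.set(5)    # lands in word 0
bm.set(70)   # lands in word 2, bit 6
```

Testing a bit costs one index computation and one shift, which is why an existence test against a bitmap is cheap.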
To solve the problems of the prior art, the inventors propose, after research, a full-text retrieval method using a bitmap index that is applicable to an XML database management system. Referring to Fig. 1, which is a flowchart of the full-text retrieval method using a bitmap index according to an embodiment of the present invention, the method comprises:
First, step S1: parse an XML document and obtain all element nodes of said XML document;
Preferably, in this embodiment, parsing the XML document to obtain all element nodes of said XML document specifically comprises: feeding the XML document into a SAX-mode XML document parser, which lets the query engine of the database management system obtain all element node data of the XML document through event notifications.
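Event-driven parsing of this kind can be sketched with Python's standard `xml.sax` module. This is only an illustration under stated assumptions: the handler name `ElementCollector`, the helper `parse_elements`, and the choice to represent an element node as a (name, direct text) pair are hypothetical and are not the patent's data structures.

```python
import xml.sax


class ElementCollector(xml.sax.ContentHandler):
    """Collects (element name, direct text) pairs as SAX events arrive."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished element nodes, in end-tag order
        self._stack = []  # open elements: [name, list of text chunks]

    def startElement(self, name, attrs):
        self._stack.append([name, []])

    def characters(self, content):
        # Text belongs to the innermost open element (its parent node).
        if self._stack:
            self._stack[-1][1].append(content)

    def endElement(self, name):
        # The end-of-element event is the point at which a node row
        # could be handed to the storage layer.
        elem_name, chunks = self._stack.pop()
        self.rows.append((elem_name, "".join(chunks).strip()))


def parse_elements(xml_text):
    handler = ElementCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.rows


rows = parse_elements(
    "<course><title>English teacher</title><room>12</room></course>"
)
```

Because the handler reacts to events instead of building a whole tree, memory use stays proportional to the nesting depth of the document.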
Step S2: store all element nodes of said XML document in a node table;
This step is carried out in the same way as in prior-art XML database management systems. When parsing of the element node to which a text node belongs ends, the query engine of the XML database management system receives a SAX event notification from the XML parser. The query engine handles the SAX event by calling the data-row storage method of the storage engine, which stores said element node data in the node table and thereby completes this step.
In addition, in a preferred embodiment of the present invention, while all element nodes of said XML document are being stored in the node table, that is, while a data row is being stored in said node table, an index key is also built for each data row for said bitmap index table. Specifically, said bitmap index table has a callback function, and said callback function is responsible for building the index key of each data row for said bitmap index table;
Said callback function is:
IndexKeyRange IndexBuilder(HeapTuple htup, Relation heap, Relation index);
Wherein said htup is a data row of said node table, heap and index are the handle objects of said node table and said bitmap index table respectively, and the return value is an object of type IndexKeyRange, which contains the start offset and the length of the index key within the htup data row. With the IndexKeyRange, the index key portion of htup can be extracted and used as the index key. Thus, while a data row is being stored in said node table, the IndexBuilder function of every bitmap index table is called in turn, so that an index key is built for each bitmap index table in turn and the pair (index key, TID) is stored in the bitmap index table as an index data row.
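The role of the returned range can be illustrated with a small Python sketch. Everything concrete here is an assumption for illustration: the patent gives only the C-style signature, so the row layout (one length byte followed by the key bytes), the Python names, and the unused `heap` and `index` placeholders are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IndexKeyRange:
    offset: int  # start of the index key inside the data row
    length: int  # length of the index key in bytes


def index_builder(htup, heap=None, index=None):
    """Stand-in for the callback: report where the key lives inside htup.

    Hypothetical row layout: byte 0 holds the key length, the key follows.
    heap and index are unused placeholders for the table handles.
    """
    return IndexKeyRange(offset=1, length=htup[0])


def extract_key(htup, key_range):
    """Slice the index key out of the data row using the reported range."""
    return htup[key_range.offset:key_range.offset + key_range.length]


htup = bytes([7]) + b"teacher" + b"payload"
key_range = index_builder(htup)
key = extract_key(htup, key_range)
```

Returning a range instead of a copied key lets the caller decide whether to copy the bytes or index them in place.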
Step S3: extract all text child nodes of said element nodes and split all the text child nodes into words, which form a set;
Step S4: filter stop words out of said set to obtain an effective word set;
The words split out of the text child nodes in the previous step include stop words, which carry no meaning for querying or retrieval. To improve retrieval efficiency, this step filters out the stop words and thereby obtains the effective word set.
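Steps S3 and S4 can be sketched as follows. The stop-word list, the lower-casing, and the regular-expression tokenizer are assumptions for this illustration; the patent says only that a default stop-word list is used, and text in languages without word separators would need a proper word segmenter.

```python
import re

# Hypothetical stop-word list; the source does not enumerate its default list.
STOP_WORDS = {"a", "an", "the", "is", "of", "and"}


def split_words(text):
    """Split text into lower-cased words on non-word characters."""
    return [w.lower() for w in re.findall(r"\w+", text)]


def effective_words(text, stop_words=STOP_WORDS):
    """Return the set of words in text that are not stop words."""
    return {w for w in split_words(text) if w not in stop_words}


words = effective_words("The English teacher is a friend of the class")
```

Using a set removes duplicates, which matches the existence-only semantics of a bitmap: a word either occurs in a row or it does not.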
Step S5: build the bitmap index table;
Preferably, in this embodiment, a bitmap is built for each word in said effective word set according to the mapping between that word and the data rows of said node table, forming a bitmap index table;
The bitmap of each word in said effective word set is built from the mapping between that word and the data rows of said node table using the following formulas:
block=i/M;
offset=i%M;
Wherein TID is the address of a data row in the node table, expressed as (block, offset); M is the maximum number of data rows that can be stored in one data page; and i is a bit position in the bitmap of each word.
For each word in the effective word set, because the word is present in the corresponding data row of the node table, bit i of that word's bitmap can be set to 1, where i=TID.block*M+TID.offset. Computing the two values block and offset above therefore maps bit i of each word's bitmap to one data row of the node table.
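The mapping between bit positions and TIDs, and the construction of one bitmap per word, can be sketched in Python. The function names, the use of a Python integer as an unbounded bitmap, and the sample rows are assumptions for illustration; the division in the formulas is taken to be integer division.

```python
from collections import defaultdict


def bit_to_tid(i, M):
    """block = i / M (integer division), offset = i % M."""
    return (i // M, i % M)


def tid_to_bit(block, offset, M):
    """Inverse mapping: i = block * M + offset."""
    return block * M + offset


def build_bitmaps(rows, M):
    """rows: iterable of ((block, offset), set_of_effective_words).

    Returns {word: bitmap}, where each bitmap is a Python int whose
    bit i is 1 when the word occurs in the row with that TID.
    """
    bitmaps = defaultdict(int)
    for (block, offset), words in rows:
        i = tid_to_bit(block, offset, M)
        for w in words:
            bitmaps[w] |= 1 << i
    return dict(bitmaps)


M = 4  # assumed maximum number of rows per data page
rows = [
    ((0, 1), {"english", "teacher"}),
    ((0, 2), {"teacher"}),
    ((1, 0), {"english"}),
]
bitmaps = build_bitmaps(rows, M)
```

With M = 4, the row at TID (1, 0) maps to bit 4, so the two functions round-trip: `bit_to_tid(tid_to_bit(1, 0, M), M)` returns `(1, 0)`.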
It should be pointed out that when the content of a node is updated, the bitmap index entries that originally pointed to that node data row are not removed, for performance reasons. Consequently, after a node is obtained through the index, its content must still be tokenized and searched for the target word. Even so, because this avoids searching for the target word across the large number of nodes in a container's node table, all node data rows that contain the target word can be obtained at low cost, which speeds up full-text retrieval and reduces running cost.
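The recheck described here can be sketched as a filter over the candidate rows returned by the index. The names `contains_word` and `verify_candidates`, the dictionary standing in for the node table, and the tokenizer are hypothetical; the point of the sketch is only that stale bits produce candidates that the recheck discards.

```python
import re


def contains_word(text, word):
    """Tokenize text and test whether word occurs (case-insensitive)."""
    return word.lower() in (w.lower() for w in re.findall(r"\w+", text))


def verify_candidates(candidate_tids, fetch_row, word):
    """Keep only the candidate rows whose current content has the word."""
    return [tid for tid in candidate_tids if contains_word(fetch_row(tid), word)]


# Row (0, 2) was updated after indexing, so its old "teacher" bit is stale.
node_table = {(0, 1): "English teacher", (0, 2): "Maths tutor"}
hits = verify_candidates([(0, 1), (0, 2)], node_table.__getitem__, "teacher")
```

The index therefore acts as a filter that may return false positives but keeps the recheck limited to a small candidate set.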
In addition, in the bitmap index table every word of every document stored in the document container has a corresponding bitmap, and each bit of a word's bitmap, 1 or 0, corresponds to one node data row: 1 means the word is present in that node data row, and 0 means it is not. Bitmaps stored in this way may therefore contain long runs of consecutive 0s and 1s. To further improve full-text retrieval performance, RLE (Run Length Encoding) compression is applied to the bitmap of each word, so that the bitmaps stored in the bitmap index table are RLE-compressed bitmaps.
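Run-length encoding of such a bitmap can be sketched in a few lines of Python. The representation of a compressed bitmap as a list of (bit, run length) pairs is an assumption for illustration; the patent does not specify its on-disk RLE format.

```python
from itertools import groupby


def rle_encode(bits):
    """bits: sequence of 0/1 values. Returns a list of (bit, run_length)."""
    return [(b, len(list(group))) for b, group in groupby(bits)]


def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the original bit list."""
    out = []
    for b, n in runs:
        out.extend([b] * n)
    return out


bits = [0] * 6 + [1] * 3 + [0] * 7
runs = rle_encode(bits)
```

Here sixteen bits collapse to three runs; the saving grows with the length of the runs, which is why sparse word bitmaps compress well.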
Step S6: perform full-text retrieval using said bitmap index table.
Referring to Fig. 2, which shows the detailed flow of full-text retrieval using the bitmap index table, the flow is as follows:
When an XQuery Full Text query is executed, splitting the query condition string into several words;
Filtering said words with the default stop-word list to obtain an effective query word set;
Querying the bitmap index table for each word of the effective query word set in turn to obtain several bitmaps;
Applying to said bitmaps the logical operation that matches the query condition to obtain a final bitmap, that is, the set of TIDs of the node data rows that finally satisfy the full-text retrieval condition.
When an XQuery Full Text query is executed using the bitmap index table built above, the query condition string is tokenized to obtain each word in the query condition, and the words are then filtered with the default stop-word list to obtain the effective query word set. The bitmap index table is queried for each word of the effective query word set in turn, yielding several bitmaps. Depending on the query condition, the logical relationship between the words of the effective query word set determines the operation to be applied between the bitmaps of those words. Applying these logical operations to the bitmaps yields the final bitmap, that is, the set of TIDs of the node data rows that finally satisfy the full-text retrieval condition. Because the position of each row in a data table is identified by a TID, which uniquely identifies a data row and contains its address in the data table, the corresponding data rows can be found from the TIDs, which completes the full-text retrieval.
For example, suppose the effective query word set is {"English", "teacher"} and the logical relationship between the two words is AND, that is, the query asks for the XML element nodes that contain both "English" and "teacher". The query engine obtains the bitmaps of "English" and "teacher" from the bitmap index table and performs an AND operation on them to obtain the final bitmap. The data rows corresponding to the bits that are 1 in this bitmap are the node data rows that contain both "English" and "teacher".
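The AND query in this example can be sketched in Python, again using integers as bitmaps. The function names, the sample bitmaps, and the value of M are assumptions for illustration; only the AND case is shown, although other logical relationships would use the corresponding bitwise operator.

```python
def bits_set(bitmap):
    """Yield the positions of the 1 bits in an int bitmap, lowest first."""
    i = 0
    while bitmap:
        if bitmap & 1:
            yield i
        bitmap >>= 1
        i += 1


def search_all(words, bitmaps, M):
    """AND the bitmaps of all words and return the matching TIDs."""
    if not words:
        return []
    result = None
    for w in words:
        bm = bitmaps.get(w, 0)  # a word missing from the index matches no row
        result = bm if result is None else result & bm
    return [(i // M, i % M) for i in bits_set(result)]


# Assumed sample index: "english" in rows 1 and 4, "teacher" in rows 1 and 2.
bitmaps = {"english": 0b10010, "teacher": 0b00110}
tids = search_all(["english", "teacher"], bitmaps, M=4)
```

Only bit 1 survives the AND, so with M = 4 the query returns the single TID (0, 1); a query containing a word that is absent from the index returns no rows.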
Referring to Fig. 3, which shows the full-text retrieval device using a bitmap index according to an embodiment of the present invention, the device comprises:
A document parsing unit 110, configured to parse an XML document and obtain all element nodes of said XML document; said document parsing unit 110 feeds said XML document into a SAX-mode XML document parser to obtain all element nodes of said XML document.
A node storage unit 120, configured to store all element nodes of said XML document in a node table;
A text node splitting unit 130, configured to extract all text child nodes of said element nodes and split said text child nodes into words, which form a set;
A filtering unit 140, configured to filter stop words out of said set to obtain an effective word set;
An index table building unit 150, configured to build a bitmap index table; specifically, said index table building unit 150 builds the bitmap of each word in said effective word set from the mapping between that word and the data rows of said node table using the following formulas:
block=i/M;
offset=i%M;
Wherein TID is the address of a data row in the node table, expressed as (block, offset); M is the maximum number of data rows that can be stored in one data page; and i is a bit position in the bitmap of each word.
Said bitmap index table has a callback function, and said callback function is responsible for building the index key of each data row for said bitmap index table;
Said callback function is:
IndexKeyRange IndexBuilder(HeapTuple htup, Relation heap, Relation index);
Wherein said htup is a data row of said node table, heap and index are the handle objects of said node table and said bitmap index table respectively, and the return value is an object of type IndexKeyRange.
In addition, in the bitmap index table every word of every document stored in the document container has a corresponding bitmap, and each bit of a word's bitmap, 1 or 0, corresponds to one node data row: 1 means the word is present in that node data row, and 0 means it is not. Bitmaps stored in this way may therefore contain long runs of consecutive 0s and 1s. To further improve full-text retrieval performance, the index table building unit 150 further comprises a compression unit, configured to apply RLE (Run Length Encoding) compression to the bitmap of each word, so that the bitmaps stored in the bitmap index table are RLE-compressed bitmaps.
A retrieval unit 160, configured to perform full-text retrieval using said bitmap index table.
Referring to Fig. 4, the retrieval unit 160 of the present invention further comprises:
A query string splitting unit 160a, configured to split the query condition string into several words when an XQuery Full Text query is executed;
A stop-word filtering unit 160b, configured to filter said words with the default stop-word list to obtain an effective query word set;
A bitmap query unit 160c, configured to query the bitmap index table for each word of the effective query word set in turn to obtain several bitmaps;
A target bitmap acquiring unit 160d, configured to apply to said bitmaps the logical operation that matches the query condition to obtain a final bitmap, that is, the set of TIDs of the node data rows that finally satisfy the full-text retrieval condition.
In summary, the technical scheme of the present invention has the following advantages:
The technical scheme of the present invention enables the XML data access module of an existing XML database management system to test quickly whether a word exists. Because the goal of full-text retrieval is to find the node data rows that contain the words specified by the user, this existence test is one of the most central and important functions of full-text retrieval. Compared with a traditional B+ tree index, the full-text retrieval method using a bitmap index of the present invention achieves more efficient full-text indexing; and because a bitmap index is used, full-text retrieval performance is also markedly improved.
It should be understood that the method and system described here can be implemented in various forms of hardware, software, firmware, special-purpose processors, or combinations thereof. In particular, at least part of the present invention is preferably implemented as an application program comprising program instructions. These program instructions are tangibly embodied in one or more program storage devices (including but not limited to hard disks, floppy disks, RAM, ROM, CD-ROM, and the like) and can be executed by any device or machine of suitable architecture, such as a general-purpose digital computer with a processor, memory, and input/output interfaces. It should also be understood that, because some of the system components and processing steps depicted in the drawings are preferably implemented in software, the connections between system modules (or the logic flow of the method steps) may differ depending on how the present invention is programmed. Given the teachings herein, those of ordinary skill in the relevant art will be able to devise these and similar implementations of the present invention.
Several aspects and embodiments of the present invention are disclosed above, and those skilled in the art will appreciate other aspects and embodiments. The aspects and embodiments disclosed herein are for illustration only and are not intended to limit the present invention, whose true scope and spirit are defined by the claims.

Claims (14)

1. A full-text retrieval method using a bitmap index, applicable to an XML database management system, characterized by comprising the following steps:
Parsing an XML document to obtain all element nodes of said XML document;
Storing all element nodes of said XML document in a node table;
Extracting all text child nodes of said element nodes and splitting said text child nodes into words, which form a set;
Filtering stop words out of said set to obtain an effective word set;
Building a bitmap index table;
Performing full-text retrieval using said bitmap index table.
2. The full-text retrieval method using a bitmap index as claimed in claim 1, characterized in that building said bitmap index table specifically comprises: building a bitmap for each word in said effective word set according to the mapping between that word and the data rows of said node table, thereby forming said bitmap index table.
3. The full-text retrieval method using a bitmap index as claimed in claim 2, characterized in that the bitmap of each word in said effective word set is built, according to the mapping between each word in said effective word set and the data rows of said node table, using the following formulas:
block=i/M;
offset=i%M;
wherein TID is the address of a data row in the node table, expressed as (block, offset); M is the maximum number of data rows that can be stored in one data page; and i is a bit position in the bitmap of each word.
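The two formulas of claim 3 can be worked through with a short sketch. It assumes that "/" denotes integer division and that bit positions, blocks, and offsets are all counted from zero; the function names `bit_to_tid` and `tid_to_bit` are hypothetical.

```python
def bit_to_tid(i, m):
    """Convert bit position i into a TID (block, offset), with m rows per page."""
    block = i // m   # block = i / M, taken as integer division
    offset = i % m   # offset = i % M
    return (block, offset)


def tid_to_bit(block, offset, m):
    """Inverse mapping: recover the bit position from a TID."""
    return block * m + offset


# With M = 4 rows per data page, bit 10 refers to row 2 of page 2.
tid = bit_to_tid(10, 4)
```

Because the mapping is invertible, a bitmap needs no stored row addresses: the position of each set bit is enough to locate the data row.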
4. The full-text retrieval method using a bitmap index as claimed in claim 1, characterized in that said storing of all element nodes of said XML document in the node table further comprises: while data rows are being stored into said node table, building, for each data row, the index key values of said bitmap index table, specifically: said bitmap index table has a callback function, and said callback function is responsible for building the index key values of said bitmap index table for each data row;
said callback function being:
IndexKeyRange IndexBuilder(HeapTuple htup, Relation heap, Relation index);
wherein htup is a data row of said node table, heap and index are the handle objects of said node table and said bitmap index table respectively, and the return value is an object of type IndexKeyRange.
5. The full-text retrieval method using a bitmap index as claimed in claim 1, characterized by further comprising: performing RLE compression on the bitmap built for each word in said effective word set.
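Run-length encoding (RLE) of a bitmap, as named in claim 5, can be illustrated with the sketch below. The claim does not fix an encoding format, so the (bit, run length) pair representation and the names `rle_encode` and `rle_decode` are assumptions chosen for clarity rather than the format used by the invention.

```python
from itertools import groupby


def rle_encode(bits):
    """Encode a sequence of 0/1 bits as a list of (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the original bit sequence."""
    bits = []
    for bit, length in runs:
        bits.extend([bit] * length)
    return bits


# A sparse bitmap: the word occurs in only two adjacent rows out of six.
bitmap = [0, 0, 0, 1, 1, 0]
encoded = rle_encode(bitmap)
```

Word bitmaps in a full-text index tend to be sparse, with long runs of zeros, which is the case where run-length encoding saves the most space.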
6. The full-text retrieval method using a bitmap index as claimed in claim 1, characterized in that said performing of full-text retrieval using said bitmap index table specifically comprises:
when an XQuery full-text query is executed, splitting the query-condition string therein into a number of words;
filtering said words using a default stop-word list to obtain an effective query word set;
looking up each word of the effective query word set in turn in the bitmap index table to obtain a number of bitmaps;
applying to said bitmaps the target logical operation that matches the query condition to obtain a final bitmap, that is, the set of TIDs of the node data rows that satisfy the full-text retrieval condition.
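The query steps of claim 6 can be sketched as follows, under stated assumptions: the bitmap index is a dictionary from word to integer bitmap, the stop-word list is hypothetical, the query string is split on whitespace, and the "target logical operation" is limited to AND or OR selected by a `mode` argument. None of these details are specified by the claims.

```python
# Hypothetical default stop-word list.
STOP_WORDS = {"a", "an", "the", "of", "and", "is"}


def search(index, query, m, mode="and"):
    """Return the TIDs (block, offset) of rows matching the query words.

    index: dict mapping word -> integer bitmap (bit i = row i contains word)
    m:     maximum number of data rows per data page
    mode:  "and" requires every word, "or" requires at least one word
    """
    # Split the query string and drop stop words.
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    # Look up one bitmap per effective query word (0 if the word is unknown).
    bitmaps = [index.get(w, 0) for w in words]
    if not bitmaps:
        return []
    # Combine the bitmaps with the logical operation that fits the query.
    result = bitmaps[0]
    for bm in bitmaps[1:]:
        result = result & bm if mode == "and" else result | bm
    # Convert every set bit of the final bitmap into a TID.
    tids = []
    i = 0
    while result:
        if result & 1:
            tids.append(divmod(i, m))
        result >>= 1
        i += 1
    return tids


index = {"bitmap": 0b00101, "index": 0b10111}
and_hits = search(index, "the bitmap index", m=2)
or_hits = search(index, "the bitmap index", m=2, mode="or")
```

The whole match is computed with bitwise operations on the bitmaps; row addresses are derived only once, from the final bitmap.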
7. The full-text retrieval method using a bitmap index as claimed in claim 1, characterized in that said parsing of the XML document to obtain all element nodes of said XML document specifically comprises: feeding said XML document into a SAX-mode XML document parser for parsing, so as to obtain all element nodes of said XML document.
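SAX-mode parsing, as named in claim 7, delivers the document as a stream of start-element, character-data, and end-element events instead of building a tree. The sketch below uses Python's standard `xml.sax` module to collect one row per element node in document order; the `ElementCollector` class, the `parse_elements` helper, and the (name, text) row layout are assumptions made for illustration.

```python
import xml.sax


class ElementCollector(xml.sax.ContentHandler):
    """Collect one [name, direct text] row per element, in document order."""

    def __init__(self):
        super().__init__()
        self.rows = []    # one entry per element node
        self._open = []   # indices into self.rows of currently open elements

    def startElement(self, name, attrs):
        # Reserve the row now so that rows follow document order.
        self._open.append(len(self.rows))
        self.rows.append([name, ""])

    def characters(self, content):
        # Character data belongs to the innermost open element.
        if self._open:
            self.rows[self._open[-1]][1] += content

    def endElement(self, name):
        idx = self._open.pop()
        self.rows[idx][1] = self.rows[idx][1].strip()


def parse_elements(xml_text):
    """Parse an XML string and return its element nodes as (name, text) rows."""
    handler = ElementCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return [tuple(row) for row in handler.rows]


rows = parse_elements("<book><title>Bitmap index</title><p>Full text</p></book>")
```

Because the handler keeps only the stack of open elements and the output rows, memory use does not depend on holding the whole document tree, which is the usual reason for choosing SAX over DOM for large documents.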
8. A full-text retrieval device using a bitmap index, characterized by comprising:
a document parsing unit, configured to parse an XML document and obtain all element nodes of said XML document;
a node storage unit, configured to store all element nodes of said XML document in a node table;
a text node splitting unit, configured to extract all text child nodes of said element nodes and split said text child nodes into a number of words to form a set;
a filtering unit, configured to perform stop-word filtering on said set to obtain an effective word set;
an index table building unit, configured to build a bitmap index table;
a retrieval unit, configured to perform full-text retrieval using said bitmap index table.
9. The full-text retrieval device using a bitmap index as claimed in claim 8, characterized in that said index table building unit is specifically configured to: build a bitmap for each word in said effective word set according to the mapping between each word in said effective word set and the data rows of said node table, thereby forming said bitmap index table.
10. The full-text retrieval device using a bitmap index as claimed in claim 9, characterized in that said index table building unit builds the bitmap of each word in said effective word set, according to the mapping between each word in said effective word set and the data rows of said node table, using the following formulas:
block=i/M;
offset=i%M;
wherein TID is the address of a data row in the node table, expressed as (block, offset); M is the maximum number of data rows that can be stored in one data page; and i is a bit position in the bitmap of each word.
11. The full-text retrieval device using a bitmap index as claimed in claim 8, characterized in that said bitmap index table has a callback function, and said callback function is responsible for building the index key values of said bitmap index table for each data row;
said callback function being:
IndexKeyRange IndexBuilder(HeapTuple htup, Relation heap, Relation index);
wherein htup is a data row of said node table, heap and index are the handle objects of said node table and said bitmap index table respectively, and the return value is an object of type IndexKeyRange.
12. The full-text retrieval device using a bitmap index as claimed in claim 8, characterized by further comprising a compression unit, said compression unit being configured to perform RLE compression on the bitmap built for each word in said effective word set.
13. The full-text retrieval device using a bitmap index as claimed in claim 8, characterized in that said retrieval unit further comprises:
a query string splitting unit, configured to split the query-condition string into a number of words when an XQuery full-text query is executed;
a stop-word filtering unit, configured to filter said words using a default stop-word list to obtain an effective query word set;
a bitmap query unit, configured to look up each word of the effective query word set in turn in the bitmap index table to obtain a number of bitmaps;
a target bitmap acquiring unit, configured to apply to said bitmaps the target logical operation that matches the query condition to obtain a final bitmap, that is, the set of TIDs of the node data rows that satisfy the full-text retrieval condition.
14. The full-text retrieval device using a bitmap index as claimed in claim 8, characterized in that said document parsing unit is configured to feed said XML document into a SAX-mode XML document parser for parsing, so as to obtain all element nodes of said XML document.
CN201210193874.4A 2012-06-12 2012-06-12 Full text retrieval method using bitmap index and device Expired - Fee Related CN102760165B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210193874.4A CN102760165B (en) 2012-06-12 2012-06-12 Full text retrieval method using bitmap index and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210193874.4A CN102760165B (en) 2012-06-12 2012-06-12 Full text retrieval method using bitmap index and device

Publications (2)

Publication Number Publication Date
CN102760165A (en) 2012-10-31
CN102760165B (en) 2014-09-03

Family

ID=47054622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210193874.4A Expired - Fee Related CN102760165B (en) 2012-06-12 2012-06-12 Full text retrieval method using bitmap index and device

Country Status (1)

Country Link
CN (1) CN102760165B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281584A (en) * 2013-07-02 2015-01-14 北大方正集团有限公司 XML database performance testing system and method
CN104346331A (en) * 2013-07-23 2015-02-11 北大方正集团有限公司 Retrieval method and system for XML database
CN104346332A (en) * 2013-07-23 2015-02-11 北大方正集团有限公司 Full-text retrieval method and system for XML database
CN104572828A (en) * 2014-12-08 2015-04-29 中国科学院信息工程研究所 Auxiliary indexing method and auxiliary indexing system based on space bitmap model
WO2017096892A1 (en) * 2015-12-07 2017-06-15 百度在线网络技术(北京)有限公司 Index construction method, search method, and corresponding device, apparatus, and computer storage medium
CN108182209A (en) * 2017-12-18 2018-06-19 ***通信集团广东有限公司 A kind of data index method and equipment
CN108932738A (en) * 2018-07-03 2018-12-04 南开大学 A kind of bit slice index compression method based on dictionary
CN114547380A (en) * 2022-01-25 2022-05-27 北京元年科技股份有限公司 Data traversal query method and device, electronic equipment and readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1605081A (en) * 2001-12-17 2005-04-06 Zih公司 XML printer system
CN101464854A (en) * 2007-12-18 2009-06-24 金宝电子(上海)有限公司 Method for representing character by combination code through XML

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281584A (en) * 2013-07-02 2015-01-14 北大方正集团有限公司 XML database performance testing system and method
CN104346331A (en) * 2013-07-23 2015-02-11 北大方正集团有限公司 Retrieval method and system for XML database
CN104346332A (en) * 2013-07-23 2015-02-11 北大方正集团有限公司 Full-text retrieval method and system for XML database
CN104572828A (en) * 2014-12-08 2015-04-29 中国科学院信息工程研究所 Auxiliary indexing method and auxiliary indexing system based on space bitmap model
CN104572828B (en) * 2014-12-08 2018-01-19 中国科学院信息工程研究所 A kind of secondary index method and system based on space bit map model
WO2017096892A1 (en) * 2015-12-07 2017-06-15 百度在线网络技术(北京)有限公司 Index construction method, search method, and corresponding device, apparatus, and computer storage medium
CN108182209A (en) * 2017-12-18 2018-06-19 ***通信集团广东有限公司 A kind of data index method and equipment
CN108932738A (en) * 2018-07-03 2018-12-04 南开大学 A kind of bit slice index compression method based on dictionary
CN108932738B (en) * 2018-07-03 2022-08-16 南开大学 Bit slice index compression method based on dictionary
CN114547380A (en) * 2022-01-25 2022-05-27 北京元年科技股份有限公司 Data traversal query method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN102760165B (en) 2014-09-03

Similar Documents

Publication Publication Date Title
CN102760165B (en) Full text retrieval method using bitmap index and device
CN107402990B (en) Distributed New SQL database system and semi-structured data storage method
CN104915450A (en) HBase-based big data storage and retrieval method and system
CN101739436B (en) XML-based flexible data migration method
CN104750681A (en) Method and device for processing mass data
CN103955538B (en) HBase data persistence and query methods and HBase system
CN104021145A (en) Mixed service concurrent access method and device
WO2014066816A1 (en) Systems and methods for intelligent parallel searching
CN110109910A (en) Data processing method and system, electronic equipment and computer readable storage medium
WO2021179722A1 (en) Sql statement parsing method and system, and computer device and storage medium
CN104391908B (en) Multiple key indexing means based on local sensitivity Hash on a kind of figure
CN103646079A (en) Distributed index for graph database searching and parallel generation method of distributed index
CN102819585A (en) Method for controlling documents of an Extensible Markup Language (XML) database
CN102955843A (en) Method for realizing multi-key finding of key value database
CN112687364B (en) Medical data management method and system based on Hbase
CN103455335A (en) Multilevel classification Web implementation method
CN103198136A (en) Sequence-association-based query method for personal computer files
US10482087B2 (en) Storage system and method of operating the same
CN101963993B (en) Method for fast searching database sheet table record
CN103473444A (en) Electronic medical record system based on intelligent analyzing data structure and processing method of system
CN106933824A (en) The method and apparatus that the collection of document similar to destination document is determined in multiple documents
JP2011008451A (en) Database cache device using key-value store
CN102768672B (en) A kind of disk space management method and apparatus
CN101916260A (en) Method for establishing semantic mapping between disaster body and relational database
CN102760164A (en) Method for exchanging data between relational database management system and XML (Extensible Markup Language) database management system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: SHANGHAI FOUNDER DIGITAL PUBLISHING TECHNOLOGY (SH

Effective date: 20130108

Owner name: BEIDA FANGZHENG GROUP CO. LTD.

Free format text: FORMER OWNER: SHANGHAI FOUNDER DIGITAL PUBLISHING TECHNOLOGY (SHANGHAI) CO., LTD.

Effective date: 20130108

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 201203 PUDONG NEW AREA, SHANGHAI TO: 100871 HAIDIAN, BEIJING

TA01 Transfer of patent application right

Effective date of registration: 20130108

Address after: 5th Floor, Founder Building, No. 298 Chengfu Road, Haidian District, Beijing 100871

Applicant after: Peking University Founder Group Co., Ltd.

Applicant after: Founder Digital Publishing Technology (Shanghai) Co.,Ltd.

Address before: No. 608 Shengxia Road, Zhangjiang Hi-Tech Park, Pudong New Area, Shanghai 201203

Applicant before: Founder Digital Publishing Technology (Shanghai) Co.,Ltd.

ASS Succession or assignment of patent right

Owner name: FOUNDER INFORMATION INDUSTRY HOLDING CO., LTD. FOU

Free format text: FORMER OWNER: FOUNDER DIGITAL PUBLISHING TECHNOLOGY (SHANGHAI) CO., LTD.

Effective date: 20130913

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20130913

Address after: 5th Floor, Founder Building, No. 298 Chengfu Road, Haidian District, Beijing 100871

Applicant after: Peking University Founder Group Co., Ltd.

Applicant after: Founder Holdings Company Limited (Founder Holdings)

Applicant after: Founder Digital Publishing Technology (Shanghai) Co.,Ltd.

Address before: 5th Floor, Founder Building, No. 298 Chengfu Road, Haidian District, Beijing 100871

Applicant before: Peking University Founder Group Co., Ltd.

Applicant before: Founder Digital Publishing Technology (Shanghai) Co.,Ltd.

C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140903

Termination date: 20170612
