WO2014094331A1 - 可实现多属性复合条件查询的虚拟表索引机制及方法 - Google Patents

可实现多属性复合条件查询的虚拟表索引机制及方法 Download PDF

Info

Publication number
WO2014094331A1
WO2014094331A1 PCT/CN2012/087667 CN2012087667W WO2014094331A1 WO 2014094331 A1 WO2014094331 A1 WO 2014094331A1 CN 2012087667 W CN2012087667 W CN 2012087667W WO 2014094331 A1 WO2014094331 A1 WO 2014094331A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
index
value
virtual table
sub
Prior art date
Application number
PCT/CN2012/087667
Other languages
English (en)
French (fr)
Inventor
李晓林
谢毅
徐志伟
岳强
Original Assignee
广东电子工业研究院有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广东电子工业研究院有限公司 filed Critical 广东电子工业研究院有限公司
Priority to EP12890515.5A priority Critical patent/EP2849089A4/en
Priority to US14/381,214 priority patent/US20150095342A1/en
Publication of WO2014094331A1 publication Critical patent/WO2014094331A1/zh

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2264Multidimensional index structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3341Query execution using boolean model

Definitions

  • Virtual table indexing mechanism and method capable of implementing multiple complex conditional query
  • the invention relates to the field of computer application technology, in particular to a virtual table indexing mechanism and method capable of realizing multi-attribute compound conditional query
  • a single database table can not support all the business data, and the big data needs to be divided into thousands of physical sub-tables for block storage and management, and these physical sub-tables are integrated through the data; Form a virtual table with "infinite capacity".
  • a data query calculation for a virtual table may involve instant access to multiple distributed data sources (blocks), a large number of joints based on distributed data sources.
  • the first idea is: for the storage location of the sub-table, it is convenient to quickly locate the physical storage location of the data block sub-table.
  • the storage and access of large-scale data requires storage and management of the data fragmentation. At this time, each data block is created.
  • the indexing mechanism makes it easy to request fast location to the target data source.
  • the second idea is: the primary key value segment index, a sub-table will save a continuous data of a data table according to the primary key value, from the primary key to the end primary key, a 3 ⁇ 4 constitutional table is saved in multiple physical
  • This mechanism in subtables is very efficient for queries that support simple logical operations based on primary keys. Make sure that the query is only applied to the data subtable that meets the criteria for the corresponding key value. But can not support multi-attribute query predicate condition, relatively complex arithmetic logic query
  • Bigtable's Ordered Table storage model uses a hierarchical MetaData model to build an index of a table table. It supports both storage location indexing and key-value segmentation indexing, but can only support interval-based queries based on primary keys. Support for complex compound conditional queries, such as relational databases
  • One of the technical problems solved by the present invention is to construct a virtual table purple indexing mechanism that supports distributed compound conditional query, and solve the problem that the current storage location purple index and key purple purple index cannot support multiple attribute conditions and complex arithmetic logic query.
  • the virtual table concept and technology cited in the present invention is based on the inventor's other patent ZL200810119858.4 (name ; a network system and its management method; the virtual table of the present invention and its management and use methods all refer to the patent
  • the second technical problem solved by the present invention is to construct a virtual table indexing method that supports distributed compound conditional query, and solve the problem that the current storage location index and key index cannot support multiple attribute conditions and complex arithmetic logic queries.
  • the mechanism is composed of three key component index managers, a condition analyzer and a pre-execution engine;
  • the index manager which manages index key values of virtual table attributes, supports the conditional analyzer of multiple attributes of the virtual table, decomposes the query conditions imposed on the consideration table, and analyzes the predicate;
  • the attributes of the purple index are analyzed according to the SQL syntax. After replacing the predicate expression with other attributes with the true value tme, only the severe query predicate condition is left to determine whether the severe index satisfies this condition.
  • the conditional analyzer further calculates the true/false of the predicate about the attribute in the query condition replaced by the true value through the interval value, and Substituting the predicate with a boolean result (tme/&ise); if the step cannot be calculated, it means directly returning the inferred result that the query needs to be executed on the virtual table;
  • the pre-execution engine determines whether the part of the query condition is replaced by a true value expression (trae) and a boolean result by using an index key value of the corresponding attribute, and determines whether the corresponding attribute satisfies the query condition; If it is not satisfied, the painting directly returns the inference result of not performing the query on the virtual table; otherwise, it continues to judge other attributes, and some special cases, the default is to satisfy the execution condition; the index key is only the interval ⁇
  • the technical solution of the present invention solves the above technical problem.
  • the second technical solution is: constructing a key I value for a plurality of attributes on a virtual table; calculating a query condition applied to the virtual table when executing the query;
  • the index in advance, independently determines, for each attribute, whether the physical entity table of the virtual table mapping needs to be executed, thereby ensuring that the query is only applied to the sub-algorithm table satisfying the corresponding key value condition.
  • Determining which sub-tables contain data records satisfying the query condition is; at the pseudo-layer, pre-determining whether the data set of the sub-table satisfies the query condition by using the purple index of the sub-table on each sub-table , only locates the sub-table satisfying the condition, and for the no-query filter condition, the purple table is not established according to a certain attribute key value to establish a sub-table index structure similar to the B+ tree, and each leaf node of the B+ tree represents one Sub-table; establishes an index of one or more (other) attributes on the child table at the leaf node, and the purple index can be a single ⁇ and interval value
  • the replacement of the query condition twice is a syntax tree formed by analyzing the condition of the SQL query, and each analysis attribute is separately analyzed and judged.
  • the virtual table indexing mechanism of the present invention is different from the index of a database table.
  • the index of the database table is to quickly locate the data record, and the purple bow of the virtual table is to quickly determine whether the query with a certain condition has It is necessary to apply it to the virtual table. If it is scheduled for query, improve the efficiency of distributed query.
  • general data table has certain regularity, it often stores certain condition data in a sub-table, such as railway every year.
  • the shipping summary data is stored separately in 12 sub-tables by month to facilitate year-on-year analysis. Therefore, the index of the virtual table is determined from the overall perspective of the child table, rather than the specific one of the records, so the number of purple leads is relatively small.
  • the invention establishes an index of a virtual table multi-attribute in a virtual layer, and predicts whether the query condition is established by an index when the query is decomposed, thereby determining whether it is necessary to locate and query the physical sub-table relative to the "key-value" and the sequence table.
  • the database system of the model only supports interval query based on primary key.
  • the invention can establish index for multiple attributes, support the judgment of compound conditional query for multiple attributes, support multi-attribute query predicate condition, and query of relatively complex operation logic, such as Support for SQL standards for relational databases.
  • the physical sub-table concept described in the present invention is abstractly defined from the perspective of metadata, and does not involve a physical system for storing and managing sub-list data, and therefore can also be referred to as a virtual table without distinction.
  • Figure 1 is a structural view of the system of the present invention
  • FIG. 3 is a core algorithm diagram of the present invention
  • the present invention is on a virtual table, and it is possible to construct a purple key value for a plurality of attributes.
  • the query condition applied to the table is calculated; according to the key index, whether the execution needs to be performed independently (the physical table of the virtual table mapping) is determined in advance for each severity, thereby ensuring that the query is only applied Execute on a sub-virtual table that satisfies the corresponding key-value condition.
  • the present invention supports multi-attribute query predicate conditions, relatively complex arithmetic logic queries, such as SQL standards supporting relational databases.
  • the present invention provides a mechanism and method for supporting a virtual table of distributed queries. It consists of three key components: Index Manager, Condition Analyzer, Pre-Execution Wizard.
  • Index Manager Manages index key values for virtual table attributes, supports a single key for multiple attributes of a virtual table
  • Conditional Analyzer Decomposes the query conditions imposed on the table and analyzes the predicate. In turn, the indexed attribute is analyzed according to the SQL syntax, and the true value toe is used to replace the predicate expression with other severeness, leaving only the query predicate condition of the attribute, so as to judge the severe index. Whether this condition is met.
  • the conditional analyzer also needs to pass the interval value, and then calculate the true/false of the predicate about the attribute in the query condition replaced by the true value, and use the Boolean result (ru e /f3 ⁇ 4s e ) Replace the predicate, if this step cannot be calculated, it means directly returning the inferred result of executing the query on the virtual table.
  • the pre-execution engine judges the true/false of the query condition after the partial replacement of the real expression (irae) and the boolean result by the correspondingly strong index key value (the interval value only takes the start and end values). Determine whether the purple index of the corresponding attribute satisfies the query condition. If it is not satisfied, it directly returns the inference result of not performing the query on the virtual table; otherwise, it continues to judge other attributes. In some special cases, for the judgment of the cylinder, the pre-execution engine defaults to satisfy the execution conditions, such as a multi-predicate expression.
  • each leaf node of the tree represents -
  • a sub-table establishes an index of one or more (other) attributes on the child table at the leaf node.
  • the index value can be a single value and an interval.
  • the strong index value is a value interval defined from the overall perspective of the child table, between the index values. The smaller the range (less), the easier it is to judge.
  • the column suitable for indexing in the virtual layer is a column with high base and low selectivity, and the index of the database eyebrow is just the opposite for a specific record of the database (base definition; number of rows/unique value; definition of choice; :1 / The only number of values).
  • base definition number of rows/unique value; definition of choice; :1 / The only number of values.
  • the index value varies with data enhancement/modification/deletion, sub-table split/merge, etc.
  • FIG 3 shows the core algorithm of the present invention
  • the replacement of the query condition twice in the core algorithm (Step2 Step3) is the key step through the syntax tree formed by the analysis of the SQL query condition, and the algorithm is performed separately for each index attribute.
  • Analysis and Judgment Assume that in the railway freight schedule, the data sub-table is managed according to the month (Mcrath) and line (Line) number (the purple index is as described in the following table. When there are too many sub-tables, you can also follow Country 2 builds an index tree with Une values).
  • Step2 For T, the table, first determine the Mont index.
  • the result of executing Step2 is: month > -20121008 and month ⁇ 20121014 and ime.
  • the same condition for handling m mil ⁇ 20121014 ring is: true and true and true, Step4 calculates trap ⁇ resuli as true, thus
  • Step2 the condition after executing Step2 is: true and true and !i e-0005 , directly skip Step3 and execute Step4 to determine whether the condition is established by 0005 ⁇ .
  • the pre-execution engine is sent to the in-memory database engine.
  • - Query "sekct line from (select 0005 as line) IT where ime and true and line-0005 "Get a non-empty result set to determine the condition is met, thus ensuring complex query conditions for any SQL standard containing Lme attributes Judging the result is actually for multiple single-valued indexes, you can combine them together, and only perform Step4 once to determine whether there is a key value to satisfy the condition.
  • the final judgment result is that the table needs to be executed.
  • the difficulty here is in Stq>3, because it is impossible to judge the true and false of the predicate by means of the exhaustive value of the interval (the actual database is judged by the value of the specific record column), and can only be substituted into the interval.
  • the value converts the predicate expression into a comparative logic to calculate the Boolean result, which is equivalent to solving the unary equation.
  • only a relatively simple first-order predicate company is solved, and the others are irreversible.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本发明涉及计算机应用技术领域,尤其是一种可实现多属性复合条件査询的虚拟表索引机制及方法。本发明由三个关键部件索引管理器、条件分析器和预执行引擎构成;本发明在虚拟表上构建针对多个属性的索引键值;在执行査询时,计算施加到该虚拟表的查询条件;依据键值索引,预先针对各个属性独立判断是否需要执行该虚拟表映射的物理实体表,从而确保将查询只施加到满足相应键值条件的子虚拟表上执行。本发明有效解决了虚拟表多属性的复合条件査询;可以用于虚拟表索引中。

Description

可实现多厲性复合条件査询的虛拟表索引机制及方法
技术领域
本发明涉及计算机应用技术领域, 尤其是一种可实现多属性复合条件査询 的虛拟表索引机制及方法
背景技术
随着 inieraei上应用的数据规模迅速增长,单一数据库表往往无法支撑所有 业务数据, 需要将大数据分成若千物理子表分块存储和管理, 通过数据中;间件 将这些物理子表整合起来形成一张 "容量无限" 的虚拟表。 而随着网络应用处 理和计算变得越来越复杂, 一次针对虚拟表的数据査询计算可能涉及到对多个 分布数据源 (块) 的即时访问, 这种大量基于分布数据源上的联合査询, 由于 受数据规模、 査询复杂度、 传输带宽等因素的影响, 访问性能常常是这类应用 的瓶颈; 因此, 针对这类应用模式的査询如何在虛拟层构建索引机制, 快速定 位子表同时避免不必要的子表査询是解决査询性能问题的关键之一。
从技术方法层面, 主要有两种思路实现针对多个分布数据源 (块) 查询的 索引机制, 来提高访问的性能
思路一是: 针对子表的存储位置紫引, 方便快速定位数据块子表的物理存 储位置 大规模数据的存储与访问需要对数据分片分块的存储和管理, 此时建 立每个数据块的索引机制能方便请求快速定位到目标数据源上。
思路二是: 主键键值分段索引, 一张子表会保存一个数据表里面按照主键 键值的某段连续的数据, 从幵始主键到结束主键, 一 ¾宪整的表格是保存在多 个物理子表中 这种机制对支持基于主键的简单逻辑运算的査询非常有效, 能 确保将查询只施加到满足相应键值条件的数据子表上。 但是不能支持多属性査 询谓词条件、 相对复杂的运算逻辑的查询
如 Bigtable这种顾序表(Ordered Table)存储模型采用了层次化的 MetaData 模型建立 tablet表的索引, 既支持存储位置索引, 又支持键值分段索引, 但只能 支持基于主键的区间査询, 不能支持多厲性的复合条件査询, 如关系数据库的
发明内容
本发明解决的技术问题之一在于提出构建支持分布式复合条件査询的虚拟 表紫引机制, 解决当前的存储位置紫引和键値紫引不能支持多属性条件、 复杂 运算逻辑査询的问题。 本发明引用的虛拟表概念和技术是基于本发明的发明人 的另 项专利 ZL200810119858.4 (名称 ; 一种网络***及其管理方法 ; 本发明 所述虛拟表及其管理和使用方式均引用该专利
本发明解决的技术问题之二在于提出构建支持分布式复合条件查询的虛拟 表索引方法, 解决当前的存储位置索引和键值索引不能支持多属性条件、 复杂 运算逻辑査询的问题。
本发明解决上述技术问题之一的技术方案是; 所述的机制由三个关键部件 索引管理器、 条件分析器和预执行引擎构成;
所述的索引管理器, 管理虚拟表属性的索引键值, 支持虚拟表多个属性的 所述的条件分析器, 对施加在虑拟表上的査询条件分解并对谓词分析; 依 次对有紫引的属性按照 SQL语法分析整个查询条件, 用真值 tme替换屏蔽掉含 其他属性的谓词表达式后, 只留下该厲性的査询谓词条件, 以便判断该厲性索 引是否满足此条件; 此时, 如果某属性的索引键值是区间, 条件分析器进一歩 通过该区间值再计算被真值替换后的査询条件中关于该属性的谓词的真 /假, 并 用布尔结果(tme/&ise)替换该谓词; 如果该步骤无法计算, 则表示直接返回需 要对该虛拟表执行査询的推断结果;
所述的预执行引擎,通过相应属性的索引键值判断部分被真值表达式(trae) 和布尔结果替换后的查询条件的真 /假,确定相应属性的索弓「是否满足查询条件; 如果不满足, 画直接返回不对该虚拟表执行査询的推断结果; 否则, 继续判断 其他属性索弓 h —些特殊情况, 默认为满足执行条件; 索引键値为区间値时只
本发明解决上述技术问题之.二的技术方案是: 在虚拟表上构建针对多个属 性的索弓 I键值; 在执行査询时, 计算施加到该虛拟表的査询条件; 依据键値索 引, 预先针对各个属性独立判断是否需要执行该虚拟表映射的物理实体表, 从 而确保将査询只施加到满足相应键值条件的子虑拟表上执行。
所述的预先确定哪些子表含有满足该査询条件的数据记录是; 在處拟层, 通过在各个子表上基于子表的紫引, 预先判断该子表的数据集合是否满足査询 条件, 只定位到满足条件的子表迸行査询; 而对于无査询过滤条件、 未建立紫 按照某个属性键值建立类似 B+树的子表索引结构, B+树的每个叶子节点代 表一个子表; 在叶子节点对子表建立 1 个或多个 (其他) 属性的索引, 紫引值 可以是单値和区间值
两次对查询条件的替换是通过对 SQL查询条件分析形成的语法树, 针对每 个索引属性, 进行单独分析判断
对区间内的值通过代入区间值将谓词表达式转化成比较逻辑计算布尔结 果 o
本发明的虛拟表索引机制不同于数据库表的索引。 数据库表的索引是为了 快速定位数据记录, 而虛拟表的紫弓 ί是为快速确定带有某个条件的査询是否有 必要施加在该虛拟表上, 要是为査询调度, 提高分布式査询效率《 由于一般 数据分表都具有确定的规则性, 常常将满足一定条件数据存储到一个子表中, 比如铁路每年的货运摘要数据是按照月份分开存储到 12张子表中, 以方便同比 分析。 因此, 虚拟表的索引是从子表的整体角度来确定索引的, 而不是针对特 定的一条记录建索弓 i, 因此紫引数量相对很少。
本发明通过在虚拟层建立虛拟表多属性的索引, 在查询分解时, 通过索引 对查询条件预判断是否成立, 从而确定是否需要定位和査询物理子表 相对于 "键-值"和顺序表模型的数据库***只支持基于主键的区间查询, 本发明可以 针对多属性建立索引, 支持对多属性的复合条件査询的判断; 支持多属性査询 谓词条件、 相对复杂的运算逻辑的查询, 如支持关系数据库的 SQL标准。
本发明所述的物理子表概念是从元数据角度抽象定义的, 不涉及对子表数 据存储和管理的物理***, 因此也可以不加区别的称为一种虚拟表。
附图说明
下面结合附图对本发明迸一歩说明:
图 1是本发明***结构图;
图 3是本发明的核心算法图
具体实施方式
本发明是在虚拟表上, 可以构建针对多个属性的紫引键值。 在执行査询时, 计算施加到该處拟表的査询条件; 依据键值索引, 预先针对各个厲性独立判断 是否需要执行 (该虚拟表映射的物理实体表), 从而确保将查询只施加到满足相 应键值条件的子虛拟表上执行。 相对现有的技术, 本发明支持多属性查询谓词 条件、 相对复杂的运算逻辑的査询, 如支持关系数据库的 SQL标准
为实现上述目的, 本发明一种支持分布式査询的虚拟表索 的机制及方法, 由三个关键部件构成: 索引管理器、 条件分析器、 预执行弓 i擎。
索引管理器: 管理虚拟表属性的索引键值, 支持虚拟表多个属性的单个键
条件分析器: 对施加在處拟表上的査询条件分解并对谓词分析。 依次对有 索引的属性, 按照 SQL语法分析整个査询条件, 用真值 toe替换屏蔽掉含其他 厲性的谓词表达式后, 只留下该属性的査询谓词条件, 以便判断该厲性索引是 否满足此条件。 此时, 如果某属性的索引键値是区间, 条件分析器还需要通过 该区间值, 再计算被真值替换后的査询条件中关于该属性的谓词的真 /假, 并用 布尔结果( rue/f¾se) 替换该谓词, 如果该步骤无法计算, 则表示直接返回露要 对该虛拟表执行査询的推断结果。
预执行引擎; 通过相应厲性的索引键值 (区间值只取幵始和结柬值即可) 判断部分被真値表达式(irae )和布尔结果替换后的査询条件的真 /假, 确定相应 属性的紫引是否满足查询条件。 如果不满足, 则直接返回不对该虚拟表执行査 询的推断结果; 否则, 继续判断其他属性紫引。 一些特殊情况, 为筒化判断, 预执行引擎默认为满足执行条件, 比如含多元谓词表达式。
如图 1 所示, 由于数据规模大导致分的数据子表较多, 一次应用的査询请 求, 不能全部施加到各个子表执行, 需要预先确定哪些子表含有满足该查询条 件的数据记录 (区间结果或者单条记录)。 在虚拟层, 通过在各个子表上 ( Ί^. , ,)基于子表的索引, 预先判断该子表的数据集合是否满足査询条件, 只定位到满足条件的子表进行査询 (了,,... )。 而对于一些特例 (包括无査询过 如图 2所示, 为快速定位子表, ^以按照某个属性键値建立类似 Β+树的子 表索引结构, Β+树的每个叶子节点代表 -个子表 在叶子节点对子表建立 1个 或多个 (其他) 属性的索引, 索引值可以是单值和区间 1 厲性索引值是从子 表的整体角度定义的值区间, 索引值 间范围越小 (少), 越容易判断。 所¾在 虛拟层适合建立索引的列是基数高、 选择度低的列, 而在数据库眉的索引为便 于定俊某条具体记录正好相反 (基数定义; 行数 /唯一值数; 选择度定义; :1 /唯 一值数)。 考虑到不同应用场景中, 索引值会随着数据增力 /修改 /删除、 子表的 拆分 /合并等情况而变化, 这种变化的频度和复杂度在各种场景中不一致, 本发 明不强调紫引值的一致性同歩方法 可以通过人工建立索引、 也可以在数据操 由于在 GAV (Global As View ) /LAV (Local As View)这类传统的数据集成 应用中, 虛拟视图的 schema与物理数据源的 schema不尽相同, 是通过映射机 制来确定。 因此, 针对每个子表都需要对査询条件进行重新分析, 以将应翔的 查询映射到子表的查询。 而对于一般的"键-值"、顺序表数据模型, 由于各子表 的厲性是一致的, 不需要针对每个子表分别做条件分析 因此本发明的方法可 以同时应用到 "键 1"、 顺序表、 传统关系数据库集成领域。
图 3 表示了本发明的核心算法; 核心算法中两次对査询条件的替换 ( Step2 Step3 ) 是关键歩骤 通过对 SQL査询条件分析形成的语法树, 算法针 对每个索引属性, 进行单独分析判断 假设铁路货运明细表中, 数据分表是按 照月份(Mcrath)和线路(Line)编号来分若千子表管理的(其紫引如下表所述。 当子表太多时, 也可以按照国 2以 Une值建立索引树)。
Figure imgf000008_0002
Figure imgf000008_0001
隱 ith >20 I2 ! 008 and month < 2012.1014 and line = 0005,
对 T,表, 首先判断 Mont 索引。 执行 Step2结果是: month > -20121008 and month < 20121014 and ime。 执行 Step3时, 对于 month >- 20121008谓词, 首先 替换 index 中的通配符号 index 成 2100卜 2012腦, 通过转化谓词成计算 2012100! <: 20121008 and 2012 j 008 <= 2012】03 的成果为 true。同样处理 m mil < 20121014環词后条件为: true and true and true, Step4计算 trap^resuli结果为 true, 从而
Figure imgf000009_0001
继续判断 TV表的 Line索引, 此时执行 Step2后条件为: true and true and !i e-0005 , 直接跳过 Step3执行 Step4通过 0005値判断该条件是否成立 预执 行引擎是通过向内存数据库引擎发送 -个査询 "sekct line from (select 0005 as line) IT where ime and true and line-0005 "获得非空结果集从而判断条件成立, 这样确保含有 Lme属性的任意 SQL标准的复杂査询条件都能判断出结果 其实 对于多个单值索引, 可以将他们合并在一起, 只执行一次 Step4即可判断是否有 键值满足条件 最终判断结果是需要执行 表。
而对 表, 在 Month索引判断 Step3后条件成 fidse and false and true, 计算 tmp resu it结果 false,■ 从而 result e, 不 1 !f T; :表.
这里的难点在 Stq>3, 由于不可能对区间内的值迸行穷举的办法来判断谓词 的真假 (实际数据库中是通过代入具体记录列的值判断真假), 只能通过代入区 间值将谓词表达忒转化成比较逻辑计算布尔结果, 相当于解一元方程式, 考虑 到复杂性目前只实现了相对简单的一阶谓词公司的求解, 其他的均. 作无法替 换处理。

Claims

权 利 要 求 书
1、 种可实现多属性复合条件查询的虚拟表索引机制, 其特征在于: 所述 的机制由≡个关键部件索引管理器、 条件分析器和预执行弓 I擎构成;
所述的索引管理器, 管理虚拟表属性的索引键値, 支持虛拟表多个属性的 单个键值、 区间键值的管理;
所述的条件分析器, 对施加在虚拟表上的查询条件分解并对谓词分析; 依 次对有索引的厲性按照 SQL语法分析整个査询条件, 用真值 trae替换屏蔽掉含 其他属性的谓词表达式后, 只留下该属性的査询谓词条件, 以便判断该属性索 引是否满足此条件; 此时, 如果某属性的紫引键值是区间, 条件分析器进一歩 通过该区间值再计算被真值替换后的查询条件中关于该属性的谓词的真 /假, 并 用布尔结果(tme/feke) 替换该谓词; 如果该步骤无法计算, 则表示直接返回需 耍对该虚拟表执行査询的推断结果;
所述的预执行引擎,通过相应属性的索引键值判断部分被真値表达式 (true) 和布尔结果替换后的査询条件的真 /假,确定相应属性的索弓 I是否满足査询条件 ; 如果不满足, 则直接返回不对该虛拟表执行査询的推断结果; 否则, 继续判断 其他属性索引; 一些特殊情况, 默认为满足执行条件; 索引键值为区间值时只 取幵始和结束值。
2、 一种可实现多属性复合条件查询的虚拟表紫引方法, 其特征在于: 在虚 拟表上构建针对多个属性的索引键值; 在执行査询时, 计算施加到该虛拟表的 査询条件; 依据键值紫引, 预先针对各个厲性独立判断是否需要执行该虚拟表 映射的物理实体表, 从而确保将査询只施加到满足相应键值条件的子虚拟表上 执行
3 , 根据权利要求 2所述的虚拟表索引方法, 其特征在于; 所述的预先确定 哪些子表含有满足该査询条件的数据记录是; 在虚拟层, 通过在各个子表上基 于子表的索弓!, 预先判断该子表的数据集合是否满足査询条件, 只定位到满足 条件的子表进行査询; 而对于无査询过滤条件、 来建立索引或者基于索引难于 判断等特例, 直接对子表施加査询。
4、 根据权利要求 2所述的虚拟表索引方法, 其特征在于: 按照某个属性键 值建立类似 B+树的子表索弓!结构, B+树的每个叶子节点代表一个子表; 在叶子 节点对子表建立 1个或多个(其他) 属性的紫引, 紫弓 I值可以是单值和区间值。
5、 根据权利要求 3 所述的虛拟表索引方法; 其特征在于: 按照某个属性键 值建立类似 B+树的子表索弓 i结构, B+树的每个叶子节点代表一个子表; 在叶子 节点对子表建立 1个或多个 (其他) 属性的索引, 紫引值可以是单值和区间值。
6、 根据权利要求 2至 5任一项所述的虚拟表索引方法, 其特征在于: 两次 对査询条件的替换是通过对 SQL査询条件分析形成的语法树, 针对每个索引属 性, 进行单独分析判断。
7、 根据权利要求 2至 5任一项所述的虛拟表索引方法, 其特征在于: 对区 间内的值通过代入区间值将谓词表达式转化成比较逻辑计算布尔结果。
8、、 根据权利要求 6所述的虚拟表索引方法, 其特征在于: 对区间内的值通 过代入区间值将谓词表达式转化成比较逻辑计算布尔结果。
PCT/CN2012/087667 2012-12-19 2012-12-27 可实现多属性复合条件查询的虚拟表索引机制及方法 WO2014094331A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP12890515.5A EP2849089A4 (en) 2012-12-19 2012-12-27 VIRTUAL TABLE INDEXING MECHANISM AND METHOD CAPABLE OF REALIZING A MULTI-ATTRIBUTE CONDITIONAL REQUEST
US14/381,214 US20150095342A1 (en) 2012-12-19 2012-12-27 Virtual table index mechanism and method for multi-attribute compound condition query

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201210553946.1 2012-12-19
CN201210553946.1A CN103049521B (zh) 2012-12-19 2012-12-19 可实现多属性复合条件查询的虚拟表索引***及方法

Publications (1)

Publication Number Publication Date
WO2014094331A1 true WO2014094331A1 (zh) 2014-06-26

Family

ID=48062162

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/087667 WO2014094331A1 (zh) 2012-12-19 2012-12-27 可实现多属性复合条件查询的虚拟表索引机制及方法

Country Status (4)

Country Link
US (1) US20150095342A1 (zh)
EP (1) EP2849089A4 (zh)
CN (1) CN103049521B (zh)
WO (1) WO2014094331A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782195A (zh) * 2020-06-30 2020-10-16 广州云徙科技有限公司 一种基于在请求参数上添加注解拼接成sql的查询方法

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246749B (zh) * 2013-05-24 2018-08-03 北京立新盈企大数据技术股份有限公司 面向分布式计算的矩阵数据库***及其查询方法
CN104090962B (zh) * 2014-07-14 2017-03-29 西北工业大学 面向海量分布式数据库的嵌套查询方法
US11231840B1 (en) 2014-10-05 2022-01-25 Splunk Inc. Statistics chart row mode drill down
US10139997B2 (en) 2014-10-05 2018-11-27 Splunk Inc. Statistics time chart interface cell mode drill down
CN104331517A (zh) * 2014-11-26 2015-02-04 北京优特捷信息技术有限公司 一种检索方法及装置
CN104866603B (zh) * 2015-06-01 2018-09-25 明算科技(北京)股份有限公司 元数据的调用方法和元数据管理***
CN105045848B (zh) * 2015-06-30 2019-01-29 四川长虹电器股份有限公司 一种支持布尔表达式查询的数据库管理***
CN106980514B (zh) * 2016-01-19 2020-08-07 阿里巴巴集团控股有限公司 配置数据的更新方法和装置
CN108460048B (zh) * 2017-02-21 2022-05-10 阿里巴巴集团控股有限公司 一种查询唯一值的方法及设备
CN111259046B (zh) * 2020-01-17 2023-02-28 深圳市魔数智擎人工智能有限公司 一种并行化sql自动生成方法
CN113419801B (zh) * 2021-06-16 2022-12-27 中移(杭州)信息技术有限公司 表单渲染方法、终端、装置和计算机可读存储介质
CN115563116A (zh) * 2022-10-11 2023-01-03 北京奥星贝斯科技有限公司 一种数据库表扫描方法、装置以及设备

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1556482A (zh) * 2003-12-31 2004-12-22 中兴通讯股份有限公司 一种用于实现数据库多表查询的数据处理方法
CN101174267A (zh) * 2006-10-30 2008-05-07 国际商业机器公司 集成数据库的***、方法和程序产品
US20100153409A1 (en) * 2008-12-12 2010-06-17 Joshi Mahesh V Computer-Implemented Systems And Methods For Providing Paginated Search Results From A Database

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5664172A (en) * 1994-07-19 1997-09-02 Oracle Corporation Range-based query optimizer
US6560593B1 (en) * 1999-07-20 2003-05-06 Computer Associates Think, Inc. Method and apparatus for viewing the effect of changes to an index for a database table on an optimization plan for a database query
CN101187937A (zh) * 2007-10-30 2008-05-28 北京航空航天大学 网格环境下模式复用的异构数据库访问和集成方法
CN101360123B (zh) * 2008-09-12 2011-05-11 中国科学院计算技术研究所 一种网络***及其管理方法
TWI514147B (zh) * 2009-01-23 2015-12-21 Infortrend Technology Inc 執行儲存虛擬化之儲存子系統及儲存系統架構及其方法
US8515945B2 (en) * 2010-11-16 2013-08-20 Sybase, Inc. Parallel partitioning index scan

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1556482A (zh) * 2003-12-31 2004-12-22 中兴通讯股份有限公司 一种用于实现数据库多表查询的数据处理方法
CN101174267A (zh) * 2006-10-30 2008-05-07 国际商业机器公司 集成数据库的***、方法和程序产品
US20100153409A1 (en) * 2008-12-12 2010-06-17 Joshi Mahesh V Computer-Implemented Systems And Methods For Providing Paginated Search Results From A Database

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2849089A4

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782195A (zh) * 2020-06-30 2020-10-16 广州云徙科技有限公司 一种基于在请求参数上添加注解拼接成sql的查询方法
CN111782195B (zh) * 2020-06-30 2024-05-03 广州云徙科技有限公司 一种基于在请求参数上添加注解拼接成sql的查询方法

Also Published As

Publication number Publication date
CN103049521A (zh) 2013-04-17
CN103049521B (zh) 2015-11-11
EP2849089A4 (en) 2016-02-17
US20150095342A1 (en) 2015-04-02
EP2849089A1 (en) 2015-03-18

Similar Documents

Publication Publication Date Title
WO2014094331A1 (zh) 可实现多属性复合条件查询的虚拟表索引机制及方法
CN109299102B (zh) 一种基于Elastcisearch的HBase二级索引***及方法
US10606834B2 (en) Methods and apparatus of shared expression evaluation across RDBMS and storage layer
JP6964384B2 (ja) 異種データソース混在環境におけるフィールド間の関係性の自動的発見のための方法、プログラム、および、システム
US8935232B2 (en) Query execution systems and methods
Khurana et al. Storing and analyzing historical graph data at scale
US9870382B2 (en) Data encoding and corresponding data structure
EP2743845A1 (en) Graph traversal operator inside a column store
US9569492B2 (en) Coalescing operation for query processing
CN106777108A (zh) 一种基于混合存储架构的数据查询方法和装置
JP2014225260A (ja) 分散型データ記憶を管理するデータベース制御部、方法及びプログラム
US9524305B2 (en) Boolean term conversion for null-tolerant disjunctive predicates
Giannakouris et al. MuSQLE: Distributed SQL query execution over multiple engine environments
US9734176B2 (en) Index merge ordering
GB2534374A (en) Distributed System with accelerator-created containers
US9405801B2 (en) Processing a data stream
US9208234B2 (en) Database row access control
WO2018090557A1 (zh) 查询数据表的方法和装置
US11556571B2 (en) Phrase indexing
US20230401199A1 (en) Query execution using materialized tables
Feng et al. SPARQL query parallel processing: a survey
US10877967B1 (en) System and method for disjunctive joins
Nidzwetzki et al. BBoxDB streams: scalable processing of multi-dimensional data streams
US8818987B2 (en) Converting union commands to union all commands
Oh et al. Job-optimized map-side join processing using mapreduce and hbase with abstract RDF data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12890515

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 14381214

Country of ref document: US

REEP Request for entry into the european phase

Ref document number: 2012890515

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2012890515

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE