CN115292362A - Data query method based on two-dimensional space - Google Patents

Info

Publication number
CN115292362A
Authority
CN
China
Prior art keywords
data points
convex hull
determining
reduced
regret
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210794577.9A
Other languages
Chinese (zh)
Inventor
谢珉
王尧舒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Computing Sciences
Original Assignee
Shenzhen Institute of Computing Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Computing Sciences
Priority to CN202210794577.9A (CN115292362A)
Priority to PCT/CN2022/104838 (WO2024007350A1)
Publication of CN115292362A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data query method based on a two-dimensional space. The method comprises: acquiring a plurality of data points in a database and generating a convex hull from the data points according to a preset rule; determining a reduction ratio according to a regret index input by a user, and determining, according to the reduction ratio, the maximum visible area of each of the data points with respect to the convex hull; determining a result set according to the maximum visible areas of the data points, wherein the visible areas of the result set together surround the convex hull; and determining, from the result set, a simplest set comprising the fewest data points. Because the convex hull is constructed in advance, it does not need to be rebuilt each time a user issues a different k-regret query on the same data set, which greatly improves query efficiency. Because the poles are stored clockwise in a linked list, the visible area of each data point can be calculated quickly.

Description

Data query method based on two-dimensional space
Technical Field
The application relates to the field of data query, in particular to a data query method based on a two-dimensional space.
Background
In recent years, databases have become increasingly large, and a single database often stores thousands of products. However, a user who accesses the database is not interested in all of its products; the user only wants to obtain the small number of products that meet his or her needs. Because the database holds so many products, users cannot be expected to traverse the entire database to locate the products of interest. It is therefore desirable for a modern database to provide a convenient query method that spares the user the effort of traversing the data while still finding the products of interest as accurately as possible.
The difficulty of this problem is that users' needs are complex and diverse, and most users cannot describe their needs accurately. For example, assume user A accesses a database of used cars, in which each used car is described by a number of attributes, such as price, horsepower, age and mileage. User A wants to select an inexpensive but relatively new used car from the database. In other words, the user cares only about the price attribute and the age attribute of the car, and does not care about attributes such as horsepower. Because the user is interested in only two attributes, such a problem is described as a query problem in two-dimensional space. However, even when users are concerned with only the price and age of a used car, different users may weigh the two differently. Some users want the cheapest car possible, while others are willing to spend more money on a newer car. The trade-off a user makes between the two attributes often exists only abstractly in the user's mind. The database system cannot obtain an accurate description of the user's preferences, so it is difficult to accurately find the product of interest to the user in a two-dimensional database.
Researchers have proposed various database query methods for accurately finding a product of interest to a user in a two-dimensional database. Conventional queries include top-k queries and skyline queries. For top-k queries, users need to state their preferences explicitly. For example, in the used car database, price and age are the two attributes of interest, and the user must state explicitly that price accounts for 40% of the preference and age for 60%. Based on this preference, the database calculates a score for every used car, ranks the scores, and returns the k highest-scoring used cars to the user. The disadvantage is that users must state their preferences explicitly, i.e. how much weight each attribute carries. This requirement is very strict: in practice, very few users can state their preferences clearly, and some cannot accurately describe their own preferences at all.
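The weighted scoring behind a top-k query can be sketched as follows. This is a minimal illustration, not the application's method: the car names, the attribute values, and the assumption that both attributes have already been normalized so that a larger value is better are all hypothetical.

```python
def top_k(items, weights, k):
    """Return the k items with the highest weighted score.

    items   -- list of (name, attributes) pairs; attributes are assumed to be
               normalized so that larger is better (an assumption of this sketch)
    weights -- the user's explicit preference, one weight per attribute
    """
    def score(item):
        _, attributes = item
        return sum(w * x for w, x in zip(weights, attributes))

    return sorted(items, key=score, reverse=True)[:k]


# Hypothetical used cars: (name, (price_score, age_score)).
cars = [
    ("car_a", (0.9, 0.2)),
    ("car_b", (0.5, 0.6)),
    ("car_c", (0.2, 0.9)),
    ("car_d", (0.4, 0.4)),
]

# Price counts for 40% of the preference and age for 60%.
best = top_k(cars, weights=(0.4, 0.6), k=2)
print([name for name, _ in best])
```

The sketch makes the drawback concrete: the result depends entirely on the `weights` argument, which the user has to supply.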
In contrast, skyline queries do not require the user to indicate any preferences; they use a concept called "dominance" to retrieve products from the database. In particular, if one vehicle p is better than another vehicle q in all attributes, we say that p dominates q. For example, if car p is both cheaper and newer than car q, then p dominates q, and user A, who only cares about the price and age of used cars, obviously prefers car p to car q. In a skyline query, the products that are not dominated by any other product are returned as results. Its disadvantage is a large output size. Although a skyline query uses dominance to exclude products that are dominated by other products in the database, it has no mechanism for further screening the remaining products, which are returned to the user in their entirety. Thus, in the worst case, where dominance cannot exclude any product, the skyline query returns the entire database to the user, which defeats the goal of helping the user find the product of interest accurately.
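A skyline query can be sketched in a few lines. The sketch assumes attributes are normalized so that larger is better, and it uses the common definition of dominance (at least as good in every attribute and strictly better in at least one), which is slightly weaker than "better in all attributes"; the sample points are hypothetical.

```python
def dominates(p, q):
    """True if p is at least as good as q everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def skyline(points):
    """Return the points that no other point dominates (quadratic-time sketch)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (price_score, age_score) pairs.
points = [(0.9, 0.2), (0.5, 0.6), (0.2, 0.9), (0.4, 0.4)]
print(skyline(points))

# Worst case: no point dominates another, so nothing is filtered out.
anti_correlated = [(1, 3), (2, 2), (3, 1)]
print(skyline(anti_correlated))
```

The second call illustrates the output-size problem: when the attributes trade off against each other, the skyline is the whole input.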
Disclosure of Invention
In view of the problems, the present application is proposed to provide a two-dimensional space-based data query method that overcomes or at least partially solves the problems, comprising:
a two-dimensional space-based data query method for querying a simplest set from a number of result sets that satisfy a user-given regret index, the method comprising:
acquiring a plurality of data points in a database, and generating a convex hull according to a preset rule and the data points;
determining a reduction ratio according to a regret index input by a user, and determining the maximum visible area of a plurality of data points corresponding to the convex hull according to the reduction ratio;
determining a result set according to a maximum visual area of a plurality of the data points, wherein the visual area of the result set surrounds the convex hull;
determining a simplest set comprising the fewest data points from the result set.
Further, the step of determining the reduction ratio according to the regret index input by the user includes:
determining a reduction proportion according to the regret index;
narrowing the data points according to the narrowing proportion to generate narrowed data points;
and reducing the convex hull according to the reduction proportion to generate a reduced convex hull.
Further, the step of determining the maximum visible area of a plurality of the data points corresponding to the convex hull according to the reduction ratio comprises:
storing the poles of the reduced convex hull in the clockwise direction to generate a linked list;
determining, in the linked list, a first pole and a second pole that form the largest included angle with the reduced data point;
determining the maximum visible area of the reduced data point corresponding to the reduced convex hull according to the first pole and the second pole.
Further, before the step of storing the poles of the reduced convex hull clockwise and generating the linked list, the method further includes:
when the reduced data point is inside the reduced convex hull, then the maximum viewable area of the reduced data point corresponding to the reduced convex hull is zero;
or,
and when the reduced data points are outside the reduced convex hull, clockwise storing the poles of the reduced convex hull to generate a linked list.
Further, the step of determining a result set according to a maximum visual area of the data points, wherein the visual area of the result set encloses the convex hull, comprises:
determining key data points among the reduced data points, wherein the visible areas of the set of key data points together surround the reduced convex hull;
determining the result set from the set of key data points.
Further, the step of determining a simplest set comprising the fewest data points from the result set comprises:
determining the simplest set of the result sets that includes the fewest key data points.
Further, the method also includes:
and when the user updates the regret index, narrowing the convex hull and the data points according to the updated regret index.
A two-dimensional space-based data query apparatus for querying a simplest set from among a plurality of result sets satisfying a user-given regret index, the apparatus comprising:
the preprocessing module is used for acquiring a plurality of data points in a database and generating a convex hull according to a preset rule and the data points;
the reduction module is used for determining a reduction proportion according to a regret index input by a user and determining the maximum visible area of a plurality of data points corresponding to the convex hull according to the reduction proportion;
a calculation module configured to determine a result set according to a maximum visible area of the plurality of data points, wherein the visible area of the result set surrounds the convex hull;
and the output module is used for determining a simplest set containing the fewest data points according to the result set.
A computer device comprising a processor, a memory and a computer program stored on the memory and capable of running on the processor, the computer program when executed by the processor implementing the steps of a two-dimensional space-based data query method as described above.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of a two-dimensional space-based data query method as described above.
The application has the following advantages:
in the embodiment of the application, a convex hull is generated by acquiring a plurality of data points in a database and according to a preset rule and the data points; determining a reduction ratio according to a regret index input by a user, and determining the maximum visible area of a plurality of data points corresponding to the convex hull according to the reduction ratio; determining a result set according to a maximum visual area of a plurality of the data points, wherein the visual area of the result set surrounds the convex hull; and determining a simplest set comprising the fewest data points according to the result set. By pre-constructing the convex hull, the convex hull does not need to be reconstructed when a user calls different k-regret queries on the same data set every time, and the query efficiency is greatly improved. By storing the poles clockwise in a linked list, the visible area of each data point can be quickly calculated.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings required to be used in the description of the present application will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings may be obtained according to these drawings without inventive labor.
Fig. 1 is a flowchart illustrating steps of a data query method based on two-dimensional space according to an embodiment of the present application;
Fig. 2 is a flowchart illustrating further steps of a data query method based on two-dimensional space according to an embodiment of the present application;
Fig. 3 is a diagram illustrating the maximum visible area of a data point corresponding to a convex hull according to an embodiment of the present application;
Fig. 4 is a block diagram illustrating a two-dimensional space-based data query apparatus according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
It should be noted that, for any embodiment of the present invention: to address the disadvantages of top-k queries and skyline queries, scholars have proposed a new way of querying a product database, namely the k-regret query. It retains the advantages of the conventional methods while overcoming their drawbacks. Unlike a top-k query, a k-regret query does not require the user to provide preference information; unlike a skyline query, a k-regret query keeps the output size under control and does not return the entire data set to the user. Specifically, in a k-regret query, the user's satisfaction and the quality of the recommended products are quantified by a "regret index". Assuming that the database recommends a subset of products to the user, if the user's score for the most satisfactory product in the subset is x% of his score for the most satisfactory product in the entire database, we say that the user's regret index for this subset is (100-x)%. The lower the regret index, the more satisfied the user is with the recommended products. In a k-regret query, the user gives the highest regret index α that he is willing to accept, and the database then finds the smallest set of products for which the user's regret index is not higher than α. The invention aims to provide an efficient query technique for k-regret queries in a two-dimensional space, so as to improve their practical effectiveness. In particular, when different k-regret queries are executed on the same database, the results of earlier computation can be fully reused, a large amount of repeated computation is avoided, and the efficiency of data query is improved.
The method and the device are based on the k-regret query. First, for the two-dimensional setting, the poles are stored clockwise in a linked list, which greatly accelerates the calculation of the visible area. Second, by constructing the convex hull in advance, the pre-computed result is fully reused, a large amount of repeated computation is avoided, and the efficiency of running different k-regret queries on the same data set is greatly improved. Finally, the method and the device ensure that the user's regret index does not exceed α and that the number of output products is no more than a logarithmic multiple of the optimal solution.
Referring to fig. 1-2, a data query method based on two-dimensional space provided by an embodiment of the present application is shown;
the method comprises the following steps:
s110, acquiring a plurality of data points in a database, and generating a convex hull according to a preset rule and the data points;
s120, determining a reduction ratio according to a regret index input by a user, and determining the maximum visible area of a plurality of data points corresponding to the convex hull according to the reduction ratio;
s130, determining a result set according to the maximum visual area of the data points, wherein the visual area of the result set surrounds the convex hull;
and S140, determining a simplest set comprising the fewest data points according to the result set.
In the embodiment of the application, a convex hull is generated by acquiring a plurality of data points in a database and according to a preset rule and the data points; determining a reduction ratio according to a regret index input by a user, and determining the maximum visible area of a plurality of data points corresponding to the convex hull according to the reduction ratio; determining a result set according to a maximum visual area of a plurality of the data points, wherein the visual area of the result set surrounds the convex hull; determining a simplest set comprising the fewest data points from the result set. By pre-constructing the convex hull, the convex hull does not need to be reconstructed when a user calls different k-regret queries on the same data set every time, and the query efficiency is greatly improved. By storing the poles clockwise in a linked list, the visible area of each data point can be quickly calculated.
Next, a two-dimensional space-based data query method in the present exemplary embodiment will be further described.
In step S110, a plurality of data points in the database are obtained, and a convex hull is generated according to a preset rule and the plurality of data points.
In an embodiment of the invention, the specific process of "acquiring a plurality of data points in the database and generating a convex hull according to a preset rule and a plurality of data points" in step S110 can be further described with reference to the following description.
Note that the convex hull (Convex Hull) is defined as follows. A subset S of the plane is said to be "convex" if and only if, for any two points p, s ∈ S, the line segment ps lies entirely within S. The convex hull of a set of points is the smallest convex set that contains them (planar convex hull definition). A visual metaphor for the convex hull: stretch a rubber band so that it surrounds the whole figure and then let it contract; once it has tightened, the closed curve formed by the rubber band is the convex hull.
As an example, given a product database, the data is first preprocessed so that heavy computation is not needed when a user specifies the regret index, which would otherwise hurt query efficiency. The data preprocessing consists of two steps: data point extraction and convex hull construction.
In a two-dimensional query problem, each product in the database is described by two attributes, and therefore, each product can be considered as a data point in two-dimensional Euclidean space, described by XY coordinates. The X coordinate axis corresponds to a numerical value under a first attribute, and the Y coordinate axis corresponds to a numerical value under a second attribute.
Given the data points corresponding to all products, we can construct a convex hull representation of the database. In two-dimensional Euclidean space, a convex hull can be thought of as a rubber band that just encloses all the points. Loosely speaking, given a set of points on a two-dimensional plane, the convex hull is the convex polygon formed by connecting the outermost points, and it encloses every point in the set. Computing the convex hull is a classic problem in computational geometry (graphics). Methods for the two-dimensional convex hull include the Jarvis march algorithm (Jarvis March), the incremental algorithm (Incremental Method), the quickhull algorithm (Quick Hull), the divide and conquer algorithm (Divide and Conquer), the Graham scan algorithm, the monotone chain algorithm (Monotone Chain), the Kirkpatrick-Seidel algorithm, Chan's algorithm and the like, which are not described in detail here.
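As an illustration of the preprocessing step, the monotone chain algorithm named above can be sketched as follows. The application does not say which algorithm it uses, so this choice is an assumption; the function name `convex_hull_clockwise` and the sample points are hypothetical. The vertices are returned in clockwise order to match the clockwise storage of the poles described elsewhere in the description.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_clockwise(points):
    """Andrew's monotone chain; returns the hull vertices in clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    lower = []
    for p in pts:
        # Drop the last vertex while it does not make a strict left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    counter_clockwise = lower[:-1] + upper[:-1]
    return counter_clockwise[::-1]


# Hypothetical product points; (2, 1) and (1, 2) lie inside the rectangle.
products = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
hull = convex_hull_clockwise(products)
print(hull)
```

The sort dominates the running time, so the sketch runs in O(n log n) for n points.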
Determining a reduction ratio according to a regret index input by a user, and determining a maximum visible area of a plurality of the data points corresponding to the convex hull according to the reduction ratio, as stated in step S120.
In an embodiment of the present invention, the specific process of "determining a reduction ratio according to the regret index input by the user" may be further described with reference to the following description.
Determining a reduction ratio according to the regret index as described in the following steps;
narrowing the data points according to the narrowing ratio to generate narrowed data points as described in the following steps;
and reducing the convex hull according to the reduction proportion to generate a reduced convex hull as described in the following steps.
It should be noted that the regret index is a value describing the satisfaction of the user and the quality of the recommended products. Assuming that the database recommends a subset of products to the user, if the user's score for the most satisfactory product in the subset is x% of his score for the most satisfactory product in the entire database, we say that the user's regret index for this subset is (100-x)%. The lower the regret index, the more satisfied the user is with the recommended products.
As an example, once the user has given the regret index α, the reduction ratio is determined to be 1-α, and the pre-constructed convex hull and the data points are scaled down by this ratio.
In one implementation, each data point p used to construct the convex hull is scaled down to (1-α)p. After the data points are scaled, the pre-constructed convex hull is scaled by the same ratio. The scaled convex hull can then be used to execute the user's k-regret query. Although the convex hull must be scaled each time a regret index α is given, it does not have to be reconstructed from the data points, because most of the convex hull computation is done during data preprocessing. When the user makes a query, only the preprocessed result needs to be scaled, which is very fast. Because the convex hull is constructed in advance, it does not need to be rebuilt each time a user issues a different k-regret query on the same data set. Instead, the pre-constructed result is reused and only a quick adjustment of the data is made at query time, which greatly improves query efficiency.
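The definition of the regret index can be made concrete with a short sketch. It assumes that the user's score is a weighted sum of the attributes with non-negative weights and that the best score in the database is positive; the description does not fix the form of the score, so both are assumptions of this example, as are the function name and the sample data.

```python
def regret_index(subset, database, weights):
    """Regret index of `subset` for a user with a linear score (assumed form).

    Returns a fraction in [0, 1]; multiply by 100 for the (100 - x)% form.
    """
    def score(p):
        return sum(w * x for w, x in zip(weights, p))

    best_in_database = max(score(p) for p in database)
    best_in_subset = max(score(p) for p in subset)
    return 1.0 - best_in_subset / best_in_database


# Hypothetical two-attribute products.
database = [(10, 0), (0, 10), (6, 6)]
subset = [(6, 6)]

# A user who only cares about the first attribute scores 6 instead of 10.
print(regret_index(subset, database, weights=(1, 0)))

# A user who weighs both attributes equally loses nothing.
print(regret_index(subset, database, weights=(1, 1)))
```

A k-regret query has to bound this value for every admissible weight vector at once, which is why the description works with the geometry of the convex hull rather than with individual users.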
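The scaling step can be sketched as below. The function name `shrink` and the sample hull are hypothetical; the point of the sketch is only that a query with a new α touches the stored hull vertices and never re-runs hull construction.

```python
def shrink(points, alpha):
    """Scale every point p to (1 - alpha) * p, as described for the reduction step."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1)")
    factor = 1.0 - alpha
    return [(factor * x, factor * y) for x, y in points]


# Hull vertices computed once during preprocessing (hypothetical values).
precomputed_hull = [(0, 3), (4, 3), (4, 0), (0, 0)]

# Two queries with different regret indices reuse the same precomputed hull.
reduced_for_first_query = shrink(precomputed_hull, alpha=0.5)
reduced_for_second_query = shrink(precomputed_hull, alpha=0.25)
print(reduced_for_first_query)
print(reduced_for_second_query)
```

Uniform scaling about the origin maps a convex polygon to a convex polygon and preserves vertex order, so the clockwise order of the stored poles carries over to the reduced hull unchanged.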
In an embodiment of the present invention, a specific process of "determining a plurality of data points corresponding to the maximum visible area of the convex hull according to the reduction scale" may be further described in conjunction with the following description.
When the shrunken data point is inside the shrunken convex hull, then the maximum viewable area of the shrunken data point corresponding to the shrunken convex hull is zero;
or,
and when the reduced data points are outside the reduced convex hull, clockwise storing the poles of the reduced convex hull to generate a linked list.
In an embodiment of the present invention, a specific process of "when the reduced data point is outside the reduced convex hull, the pole of the reduced convex hull is stored clockwise, and the linked list is generated" may be further described with reference to the following description.
Storing the poles of the reduced convex hulls in a clockwise direction to generate a linked list;
determining, in the linked list, a first pole and a second pole that form the largest included angle with the reduced data point;
determining the reduced data point corresponding to the largest visible area of the reduced convex hull according to the first pole and the second pole, as described in the following steps.
It should be noted that the poles (extreme points) correspond to the vertices of the convex hull. Strictly, a pole is a point of the convex hull that cannot be represented as a convex combination (Convex Combination) of other points in the convex hull. For example, a line segment is the set of convex combinations of its two endpoints, a triangle is the set of convex combinations of its three vertices, and a tetrahedron is the set of convex combinations of its four vertices.
As an example, given the reduced convex hull and the reduced data points from the preprocessing, we compute, for each reduced data point, its maximum visible area with respect to the reduced convex hull. The first step is to determine whether the reduced data point is inside or outside the reduced convex hull. If it is inside, its visible area is zero. If it is outside, its maximum visible area is obtained by further calculation. Intuitively, the visible area is the largest portion of the convex hull that can be seen from the location of the data point. As shown in fig. 3, the key to calculating the maximum visible area is to locate two poles s and t on the reduced convex hull such that the tangents ds and dt from the reduced data point d through these poles do not cross the reduced convex hull; the angle formed by ds and dt is the maximum visible area of d with respect to the reduced convex hull. Because the reduced data points lie in a two-dimensional space, all poles of the reduced convex hull can be arranged clockwise in a linked list, and within this list the poles s and t can be located quickly by binary search, from which the visible area of the reduced data point is calculated.
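For reference, the standard definition of a convex combination can be written as follows. The symbols p_i, λ_i and m are notation introduced here for illustration and do not appear in the application.

```latex
\[
  p \;=\; \sum_{i=1}^{m} \lambda_i \, p_i ,
  \qquad \lambda_i \ge 0 \ \text{for all } i ,
  \qquad \sum_{i=1}^{m} \lambda_i = 1 .
\]
```

Under this notation, a pole is a point of the convex hull that admits no such representation in terms of other points of the hull.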
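The inside/outside test and the search for the two tangent poles can be sketched as follows. For brevity the sketch scans all hull edges linearly instead of using the binary search over the linked list mentioned in the description, and it stores the clockwise poles in a plain Python list; the function name `visible_area` and the sample coordinates are hypothetical.

```python
import math


def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive if b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def visible_area(hull_cw, d):
    """Return (s, t, angle) for a point d outside a clockwise convex hull.

    Returns None when d is inside or on the hull (visible area zero).
    Assumes hull_cw has at least three vertices in clockwise order.
    """
    n = len(hull_cw)
    # For a clockwise polygon the exterior lies to the left of each edge,
    # so an edge is visible from d exactly when d is strictly on its left.
    visible = [cross(hull_cw[i], hull_cw[(i + 1) % n], d) > 0 for i in range(n)]
    if not any(visible):
        return None

    # Visible edges form one contiguous run around the hull; find its start.
    start = next(i for i in range(n) if visible[i] and not visible[i - 1])
    end = start
    while visible[(end + 1) % n]:
        end = (end + 1) % n

    s = hull_cw[start]            # first pole of the visible run
    t = hull_cw[(end + 1) % n]    # last pole of the visible run
    ds = (s[0] - d[0], s[1] - d[1])
    dt = (t[0] - d[0], t[1] - d[1])
    angle = math.atan2(abs(ds[0] * dt[1] - ds[1] * dt[0]),
                       ds[0] * dt[0] + ds[1] * dt[1])
    return s, t, angle


reduced_hull = [(0, 3), (4, 3), (4, 0), (0, 0)]   # clockwise, hypothetical
print(visible_area(reduced_hull, (6.0, 1.5)))     # point to the right of the hull
print(visible_area(reduced_hull, (2.0, 1.0)))     # interior point
```

The linear scan costs O(n) per data point; replacing it with a binary search over the clockwise pole sequence, as the description suggests, would reduce that to O(log n).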
As stated in step S130, a result set is determined according to a maximum visual area of a number of the data points, wherein the visual area of the result set encloses the convex hull.
In an embodiment of the present invention, the specific process of "determining the result set according to the maximum visible area of the data points, wherein the visible area of the result set surrounds the convex hull" in step S130 can be further explained with reference to the following description.
Determining key data points among the plurality of reduced data points, wherein the visible areas of the set of key data points together surround the reduced convex hull;
the result set is determined from the set of key data points, as described in the following steps.
In one implementation, assuming there are n products in the database, each product has a corresponding data point and visible area after processing. The angle of the visible area measures the importance of the data point: the larger the angle, the more critical the data point. Next, a greedy algorithm selects key data points, preferring the most critical ones, until the visible areas of the selected key data points together completely surround the reduced convex hull, which yields a result set. In other words, every position on the reduced convex hull is visible from at least one selected data point.
It should be noted that at least one result set is generated.
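The greedy selection can be illustrated with a generic greedy set-cover sketch. It models each data point's visible area as the set of hull edge indices it can see and repeatedly picks the point that covers the most edges not yet covered. This modelling, the function name `greedy_cover`, and the sample data are assumptions for illustration; the application does not spell out its exact greedy rule.

```python
def greedy_cover(visible_edges, num_edges):
    """Pick data points until every hull edge is visible from a chosen point.

    visible_edges -- dict mapping a data point name to the set of hull edge
                     indices visible from it (assumed representation)
    num_edges     -- number of edges of the reduced convex hull
    """
    uncovered = set(range(num_edges))
    chosen = []
    while uncovered:
        # The most critical candidate is the one seeing the most uncovered edges.
        best = max(visible_edges, key=lambda name: len(visible_edges[name] & uncovered))
        gain = visible_edges[best] & uncovered
        if not gain:
            raise ValueError("some hull edges are not visible from any data point")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical visibility information for a hull with four edges.
visible_edges = {
    "p1": {0, 1},
    "p2": {1, 2},
    "p3": {2, 3},
    "p4": {0, 1, 2},
}
result_set = greedy_cover(visible_edges, num_edges=4)
print(result_set)
```

Greedy set cover is known to return a cover whose size is within a logarithmic factor of the smallest one, which is in line with the logarithmic bound stated in the description.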
As stated in step S140, a simplest set including the fewest data points is determined according to the result set.
In an embodiment of the present invention, the specific process of "determining the simplest set including the fewest data points according to the result set" in step S140 can be further explained with reference to the following description.
Determining the simplest set of the result sets that contains the fewest key data points, as described in the following steps.
As an example, in step S130 we select a plurality of visible areas that completely surround the reduced convex hull, each visible area corresponding to a data point extracted from a product. The products corresponding to the selected visible areas are collected to form a final result set. Since more than one result set may have visible areas that surround the convex hull, we determine the result set that contains the fewest key data points and return it to the user. The simplest set ensures that the regret index of a user with any preference does not exceed α, and that the number of output products is no more than a logarithmic multiple of the optimal solution.
In an embodiment of the present invention, the method further includes:
and when the user updates the regret index, narrowing the convex hull and the data points according to the updated regret index.
As an example, when the user updates the regret index α, the pre-constructed convex hull is reduced again according to the updated regret index and the simplest set is finally output, without reconstructing the convex hull from the data points.
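The reuse of the pre-constructed convex hull can be illustrated as follows. This is a sketch under stated assumptions, not the patented procedure: the hull is built with Andrew's monotone chain algorithm (the text only says "a preset rule"), the reduction ratio is taken to be 1 - α, and the scaling is performed about the origin. The names `build_hull` and `shrink` are hypothetical.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def build_hull(points):
    """Build the convex hull once; vertices are returned in clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    counter_clockwise = lower[:-1] + upper[:-1]
    return counter_clockwise[::-1]


def shrink(hull, alpha):
    """Scale the cached hull by the assumed reduction ratio 1 - alpha."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ratio = 1.0 - alpha
    return [(x * ratio, y * ratio) for x, y in hull]


points = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
hull = build_hull(points)        # constructed once for the data set
first_query = shrink(hull, 0.10)   # regret index of 10%
second_query = shrink(hull, 0.05)  # updated regret index, hull is not rebuilt
```

Only `shrink` runs again when α changes, which is the saving the paragraph describes.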
Example 1
Suppose user A wants to use a k-regret query to select an inexpensive but relatively new used car from the database. He only needs to set a regret index α, say 10%. This means that, if the most ideal used car in the database scores 100 points, he wants to find a car that scores more than 90 points. After setting the α value, he can start the k-regret query. The database system returns a small number of used cars and guarantees that user A can find among them a car scoring more than 90 points, that is, the regret index does not exceed α. If user A is not satisfied with the returned cars at this point, he may adjust the α value, for example to 5%, and run the k-regret query again. The database system will then return more used cars of better quality, so that user A can find a car scoring more than 95 points among them. It is worth noting that when the k-regret query is run again, we do not need to compute from scratch: the convex hull obtained in the previous query can be reused for the second computation, which greatly reduces the user's actual query time and improves query efficiency.
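The scoring in Example 1 can be made concrete with a small calculation. The sketch below assumes a linear utility function over two attributes, which is a common choice for k-regret queries but is not spelled out in this passage; the car data, the weights, and the names `utility` and `regret_ratio` are hypothetical.

```python
def utility(point, weights):
    """Score of one product under a user's (assumed linear) preference."""
    return sum(w * x for w, x in zip(weights, point))


def regret_ratio(database, subset, weights):
    """Relative loss from choosing within subset instead of the whole database.

    Assumes the best score in the database is positive.
    """
    best_overall = max(utility(p, weights) for p in database)
    best_returned = max(utility(p, weights) for p in subset)
    return 1.0 - best_returned / best_overall


# Hypothetical cars described as (cheapness score, newness score).
cars = [(100.0, 20.0), (90.0, 30.0), (50.0, 50.0)]
returned = [(90.0, 30.0)]
# A user who only cares about the first attribute.
ratio = regret_ratio(cars, returned, (1.0, 0.0))
```

Here the best car in the database scores 100 and the best returned car scores 90, so the regret ratio is about 0.1, which meets a regret index of 10%.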
Since the apparatus embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.
Referring to FIG. 4, a two-dimensional space-based data query apparatus according to an embodiment of the present application is illustrated, the apparatus being used for querying a simplest set from a plurality of result sets satisfying a regret index given by a user;
the apparatus specifically includes:
the preprocessing module 410 is configured to obtain a plurality of data points in a database, and generate a convex hull according to a preset rule and the plurality of data points;
a reduction module 420, configured to determine a reduction ratio according to a regret index input by a user, and determine, according to the reduction ratio, the maximum visible region of each of the plurality of data points with respect to the convex hull;
a calculating module 430, configured to determine a result set according to a maximum visible area of the data points, where the visible area of the result set surrounds the convex hull;
an output module 440, configured to determine a simplest set including the fewest data points according to the result set.
In an embodiment of the present invention, the reducing module 420 includes:
a reduction proportion determining submodule for determining a reduction proportion according to the regret index;
a data point reduction submodule for reducing the data point according to the reduction proportion to generate a reduced data point;
and the convex hull reducing submodule is used for reducing the convex hull according to the reducing proportion to generate a reduced convex hull.
In an embodiment of the present invention, the reducing module 420 further includes:
a linked list generation submodule for storing the poles of the reduced convex hulls in the clockwise direction and generating a linked list;
a pole determining submodule for determining, in the linked list, a first pole and a second pole that form the largest included angle with the reduced data point;
a maximum visible region determining submodule, configured to determine, according to the first pole and the second pole, a maximum visible region where the reduced data point corresponds to the reduced convex hull.
In an embodiment of the present invention, the reducing module 420 further includes:
a data point position determining submodule, configured to determine that a maximum visible area of the reduced data point corresponding to the reduced convex hull is zero when the reduced data point is inside the reduced convex hull;
or,
and when the reduced data points are outside the reduced convex hull, clockwise storing the poles of the reduced convex hull to generate a linked list.
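The submodules above can be illustrated with a short sketch that finds the two poles bounding the visible region of a data point. The assumptions are labeled here and in the comments: the poles are held in a plain Python list in clockwise order rather than a linked list, the two poles are found by a linear scan over all poles, and the names `is_inside` and `visible_region` are hypothetical.

```python
import math


def is_inside(point, hull):
    """True if point lies inside or on the boundary of the convex hull."""
    n = len(hull)
    signs = []
    for i in range(n):
        ax, ay = hull[i]
        bx, by = hull[(i + 1) % n]
        signs.append((bx - ax) * (point[1] - ay) - (by - ay) * (point[0] - ax))
    # The point is inside when it is on the same side of every edge.
    return all(s <= 0 for s in signs) or all(s >= 0 for s in signs)


def visible_region(point, hull):
    """Return (first_pole, second_pole, angle) for a point and a convex hull.

    hull: poles in clockwise order (a list stands in for the linked list).
    A point inside the hull gets a visible region of zero.
    """
    if is_inside(point, hull):
        return None, None, 0.0
    px, py = point
    # Direction from the point to the average of the poles, which is interior.
    dx = sum(x for x, _ in hull) / len(hull) - px
    dy = sum(y for _, y in hull) / len(hull) - py

    def rel_angle(pole):
        vx, vy = pole[0] - px, pole[1] - py
        return math.atan2(dx * vy - dy * vx, dx * vx + dy * vy)

    # The poles with the extreme relative angles span the largest included angle.
    first = min(hull, key=rel_angle)
    second = max(hull, key=rel_angle)
    return first, second, rel_angle(second) - rel_angle(first)


square = [(-1, 1), (1, 1), (1, -1), (-1, -1)]  # clockwise poles
first_pole, second_pole, angle = visible_region((3, 0), square)
```

For the point (3, 0), the two poles found are (1, 1) and (1, -1). A traversal that exploits the clockwise ordering could locate them faster than this linear scan, which is kept here for clarity.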
In an embodiment of the present invention, the calculating module 430 includes:
a key data point determination submodule for determining a key data point of the plurality of reduced data points, wherein a viewable area of the set of key data points encompasses the reduced convex hull;
a result set determination submodule for determining the result set in dependence on the set of key data points.
In an embodiment of the present invention, the output module 440 includes:
a simplest set determination submodule for determining the simplest set of the result sets that contains the fewest key data points.
In an embodiment of the present invention, the apparatus further includes:
an updating module, configured to reduce the convex hull and the data points according to the updated regret index when the user updates the regret index.
Referring to fig. 5, a computer device for a two-dimensional space-based data query method according to the present invention is shown, which may specifically include the following:
the computer device 12 described above is in the form of a general purpose computing device, and the components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (commonly referred to as "hard drives"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. The memory may include at least one program product having a set (e.g., at least one) of program modules 42, with the program modules 42 configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules 42, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the embodiments of the invention as described.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, camera, etc.), with one or more devices that enable an operator to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, the network adapter 20 communicates with the other modules of the computer device 12 via the bus 18. It should be appreciated that although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units 16, external disk drive arrays, RAID systems, tape drives, and data backup storage systems 34, etc.
The processing unit 16 executes various functional applications and data processing by executing programs stored in the system memory 28, for example, implementing a two-dimensional space-based data query method provided by the embodiment of the present invention.
That is, when executing the program, the processing unit 16 implements: acquiring a plurality of data points in a database, and generating a convex hull according to a preset rule and the data points; determining a reduction ratio according to a regret index input by a user, and determining the maximum visible area of a plurality of data points corresponding to the convex hull according to the reduction ratio; determining a result set according to a maximum visual area of a plurality of the data points, wherein the visual area of the result set surrounds the convex hull; and determining a simplest set comprising the fewest data points according to the result set. Because the convex hull is constructed in advance, it does not need to be reconstructed each time a user issues a different k-regret query on the same data set, which greatly improves query efficiency. By storing the poles clockwise in a linked list, the visible area of each data point can be calculated quickly.
In an embodiment of the present invention, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements a two-dimensional space-based data query method as provided in all embodiments of the present application:
that is, the program when executed by the processor implements: acquiring a plurality of data points in a database, and generating a convex hull according to a preset rule and the data points; determining a reduction ratio according to a regret index input by a user, and determining the maximum visible area of a plurality of data points corresponding to the convex hull according to the reduction ratio; determining a result set according to a maximum visual area of a plurality of the data points, wherein the visual area of the result set surrounds the convex hull; and determining a simplest set comprising the fewest data points from the result set. Because the convex hull is constructed in advance, it does not need to be reconstructed each time a user issues a different k-regret query on the same data set, which greatly improves query efficiency. By storing the poles clockwise in a linked list, the visible area of each data point can be calculated quickly.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the operator's computer, partly on the operator's computer, as a stand-alone software package, partly on the operator's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the operator's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The embodiments in the present specification are all described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar, the embodiments may be referred to one another.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or terminal apparatus that comprises the element.
The data query method based on two-dimensional space provided by the present application has been described in detail above. Specific examples are used herein to explain the principle and implementation of the application, and the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, a person skilled in the art may, following the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A two-dimensional space-based data query method, for querying a simplest set from a plurality of result sets that satisfy a user-given regret index, the method comprising:
acquiring a plurality of data points in a database, and generating a convex hull according to a preset rule and the data points;
determining a reduction ratio according to a regret index input by a user, and determining the maximum visible area of a plurality of data points corresponding to the convex hull according to the reduction ratio;
determining a result set according to a maximum visual area of a plurality of the data points, wherein the visual area of the result set surrounds the convex hull;
determining a simplest set comprising the fewest data points from the result set.
2. The method according to claim 1, characterized in that said step of determining the reduction ratio as a function of the regret index input by the user comprises:
determining a reduction proportion according to the regret index;
narrowing the data points according to the narrowing proportion to generate narrowed data points;
and reducing the convex hull according to the reduction proportion to generate a reduced convex hull.
3. The method of claim 2, wherein said step of determining a number of said data points corresponding to a maximum viewable area of said convex hull based on said reduced scale comprises:
storing the poles of the reduced convex hulls in the clockwise direction to generate a linked list;
determining a first pole and a second pole which form the largest included angle between the linked list and the reduced data points;
determining the reduced data point to correspond to the maximum viewable area of the reduced convex hull according to the first pole point and the second pole point.
4. The method of claim 3, wherein prior to the step of storing the poles of the reduced convex hull clockwise to generate the linked list, further comprising:
when the reduced data point is inside the reduced convex hull, then the maximum viewable area of the reduced data point corresponding to the reduced convex hull is zero;
or,
and when the reduced data points are outside the reduced convex hull, clockwise storing the poles of the reduced convex hull to generate a linked list.
5. The method of claim 2, wherein determining a result set based on a maximum viewable area of the plurality of data points, wherein the step of bounding the convex hull by the viewable area of the result set comprises:
determining a key data point of the number of reduced data points, wherein a viewable area of the set of key data points encompasses the reduced convex hull;
determining the result set from the set of key data points.
6. The method of claim 5, wherein said step of determining a simplest set comprising a fewest number of said data points from said result set comprises:
determining the simplest set of the result sets that includes the fewest key data points.
7. The method of claim 1, further comprising:
and when the user updates the regret index, narrowing the convex hull and the data points according to the updated regret index.
8. A data query apparatus based on two-dimensional space, said apparatus being used for querying a simplest set from a plurality of result sets satisfying a regret index given by a user, said apparatus comprising:
the preprocessing module is used for acquiring a plurality of data points in a database and generating a convex hull according to a preset rule and the data points;
the reduction module is used for determining a reduction ratio according to a regret index input by a user and determining the maximum visible area of the data points corresponding to the convex hull according to the reduction ratio;
a calculation module for determining a result set according to a maximum visible area of the plurality of data points, wherein the visible area of the result set encloses the convex hull;
and the output module is used for determining a simplest set containing the fewest data points according to the result set.
9. A computer device comprising a processor, a memory, and a computer program stored on the memory and capable of running on the processor, the computer program, when executed by the processor, implementing the method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202210794577.9A 2022-07-07 2022-07-07 Data query method based on two-dimensional space Pending CN115292362A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210794577.9A CN115292362A (en) 2022-07-07 2022-07-07 Data query method based on two-dimensional space
PCT/CN2022/104838 WO2024007350A1 (en) 2022-07-07 2022-07-11 Data query method based on two-dimensional space

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210794577.9A CN115292362A (en) 2022-07-07 2022-07-07 Data query method based on two-dimensional space

Publications (1)

Publication Number Publication Date
CN115292362A true CN115292362A (en) 2022-11-04

Family

ID=83821520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210794577.9A Pending CN115292362A (en) 2022-07-07 2022-07-07 Data query method based on two-dimensional space

Country Status (2)

Country Link
CN (1) CN115292362A (en)
WO (1) WO2024007350A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102270233B (en) * 2011-07-29 2013-03-27 中国航天科技集团公司第五研究院第五一三研究所 Searching method for convex hull
US9779137B2 (en) * 2013-07-09 2017-10-03 Logicblox Inc. Salient sampling for query size estimation
US10838951B2 (en) * 2018-04-02 2020-11-17 International Business Machines Corporation Query interpretation disambiguation
CN110887501B (en) * 2019-11-15 2020-10-02 桂林电子科技大学 Traffic navigation method and device for variable destination
CN112162986B (en) * 2020-10-09 2021-08-17 湖南大学 Parallel top-k range skyline query method and system

Also Published As

Publication number Publication date
WO2024007350A1 (en) 2024-01-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination