WO2021081913A1 - Vector query apparatus and method, electronic device, and storage medium - Google Patents

Vector query apparatus and method, electronic device, and storage medium

Info

Publication number
WO2021081913A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
sample
query
residual
vectors
Prior art date
Application number
PCT/CN2019/114795
Other languages
English (en)
Chinese (zh)
Inventor
张家兴
Original Assignee
北京欧珀通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京欧珀通信有限公司
Priority to CN201980099370.6A (CN114245896A)
Priority to PCT/CN2019/114795 (WO2021081913A1)
Publication of WO2021081913A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures

Definitions

  • This application relates to the field of data processing technology, and more specifically, to a vector query method, device, electronic equipment, and storage medium.
  • The query content and the sample content are represented as vectors, the query is performed on those vectors, and the sample content that matches the query content is finally obtained.
  • As the content of user queries becomes more and more complex, queries take more time, so query efficiency needs to be improved.
  • This application proposes a vector query method, device, electronic equipment, and storage medium.
  • In a first aspect, an embodiment of the present application provides a vector query method, including: obtaining a query vector; obtaining, according to a pre-established first index, a first cluster center vector whose distance to the query vector satisfies a first set distance condition, as a target vector,
  • where the first index includes a plurality of first clusters obtained by clustering sample vectors and a first cluster center vector corresponding to each first cluster, and each first cluster includes a plurality of sample vectors; obtaining the residual vector between the query vector and the target vector as the query residual vector; obtaining, according to a pre-established second index, the code corresponding to each sample vector in the plurality of sample vectors,
  • where the second index includes the code corresponding to each sample residual vector, obtained by applying product quantization to the sample residual vector corresponding to each sample vector, and the sample residual vector is the residual vector between the sample vector and the target vector; and obtaining, according to the query residual vector and the code corresponding to each sample residual vector, the sample vector among the plurality of sample vectors whose distance to the query vector satisfies a second set distance condition, as the query result.
  • An embodiment of the present application provides a vector query device.
  • The device includes a vector acquisition module, a first determining module, a residual obtaining module, a second determining module, and a vector determining module. The vector acquisition module is used to obtain a query vector;
  • the first determining module is used to obtain, according to a pre-established first index, a first cluster center vector whose distance to the query vector satisfies a first set distance condition, as a target vector,
  • where the first index includes a plurality of first clusters obtained by clustering sample vectors and a first cluster center vector corresponding to each first cluster, and each first cluster includes a plurality of sample vectors;
  • the residual obtaining module is used to obtain the residual vector between the query vector and the target vector as a query residual vector;
  • the second determining module is used to obtain, according to a pre-established second index, the code corresponding to each sample vector in the plurality of sample vectors, where the second index includes the code corresponding to each sample residual vector obtained by product quantization.
  • An embodiment of the present application provides an electronic device, including: one or more processors; a memory; and one or more application programs, where the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to execute the vector query method provided in the above-mentioned first aspect.
  • An embodiment of the present application provides a computer-readable storage medium.
  • The computer-readable storage medium stores program code, and the program code can be invoked by a processor to execute the vector query method provided in the first aspect.
  • The solution provided by this application obtains the query vector, then obtains, according to the pre-established first index, the first cluster center vector whose distance to the query vector satisfies the first set distance condition as the target vector, and then obtains the residual vector between the query vector and the target vector as the query residual vector.
  • According to a pre-established second index, the code corresponding to each sample vector in the multiple sample vectors is obtained. The second index includes the code corresponding to each sample residual vector, obtained by applying product quantization to the sample residual vector corresponding to each sample vector.
  • Then, according to the query residual vector and the code corresponding to each sample residual vector, the sample vectors whose distance to the query vector satisfies the second set distance condition are obtained from the multiple sample vectors as the query result. Vector retrieval is thus realized through coarse clustering followed by product quantization, which reduces the complexity of vector retrieval step by step, so that both the speed and the accuracy of vector retrieval can be guaranteed.
  • Fig. 1 shows a flowchart of an index construction method according to an embodiment of the present application.
  • Fig. 2 shows a flowchart of step S110 in a method for constructing an index according to an embodiment of the present application.
  • FIG. 3 shows a schematic diagram of the principle of establishing a second index provided by an embodiment of the present application.
  • Fig. 4 shows a flowchart of a vector query method according to an embodiment of the present application.
  • Fig. 5 shows a flowchart of step S220 in the vector query method provided in an embodiment of the present application.
  • FIG. 6 shows a flowchart of step S250 in the vector query method provided in an embodiment of the present application.
  • Fig. 7 shows a flowchart of a vector query method according to another embodiment of the present application.
  • Fig. 8 shows a block diagram of a vector query device according to an embodiment of the present application.
  • Fig. 9 is a block diagram of an electronic device for executing the vector query method according to an embodiment of the present application.
  • FIG. 10 shows a storage unit for storing or carrying program code that implements the vector query method according to an embodiment of the present application.
  • Current index structures mainly include tree indexes, hash indexes, graph indexes, and vector quantization.
  • For tree indexes, generally speaking, a tree index works well when the spatial dimension is relatively low, but when the vector dimension is relatively high, performance and accuracy are not ideal.
  • For hash indexes, although indexes can be built quickly, retrieval accuracy is poor. For more than tens of millions of high-dimensional vectors, the accuracy is usually less than 50%, which makes the method hard to apply in most scenarios.
  • For graph indexes, good results can be achieved in vector similarity calculations at the scale of tens of millions, but if the data scale reaches hundreds of millions, both index construction and retrieval take a very long time, which cannot meet the needs of online computing. In addition, indexing subsequently added samples causes wide-ranging changes across the index structure, so performance is difficult to guarantee.
  • For vector quantization, such as clustering and product quantization, relying solely on clustering or solely on product quantization for vector similarity calculations over hundreds of millions of vectors leads to long indexing times if retrieval accuracy is to be ensured.
  • The inventor proposed the vector query method, device, electronic device, and storage medium provided by the embodiments of the application.
  • Using the coarse clustering result of the sample data, the cluster center closest to the query vector is obtained.
  • The sample vector closest to the query vector is then obtained within the cluster where that cluster center is located, so the distance between the query vector and every sample vector does not need to be computed by brute force.
  • The query time is reduced, and product quantization is adopted, which can effectively improve the accuracy of vector query.
  • The specific vector query method is described in detail in the subsequent embodiments.
  • FIG. 1 shows a schematic flowchart of an index construction method provided by an embodiment of the present application.
  • The index construction method clusters the sample data and establishes a first index from the clustering result, then applies product quantization to the clustering result to obtain a codebook and establish a second index. The first index and the second index are used in the vector query process.
  • The index construction method can be applied to electronic devices. The following takes an electronic device as an example to illustrate the specific process of this embodiment. The electronic device in this embodiment may be a server or another device, which is not limited here.
  • The flow shown in Fig. 1 is described in detail below. The index construction method may specifically include the following steps:
  • Step S110: Perform clustering on all sample vectors to obtain multiple first clusters.
  • Sample content can be images, videos, audio, documents, web pages, news posts, and other types of content.
  • The specific type of sample content is not limited.
  • For example, when the electronic device is used to process and query images, the sample content may be image content.
  • The sample content can be processed to obtain a sample vector corresponding to the sample content; the sample vector characterizes the features of the sample content.
  • Features can be extracted according to the type of sample content and assembled into a sample vector.
  • For image content, image features such as brightness value, gray value, number of pixels, gray average value, and gray median value can be extracted as the elements of the sample vector.
  • For audio content, features such as timbre, pitch, volume, text content, and keywords in the audio content can be extracted as the elements of the sample vector.
  • For text content, features such as word segmentation results, keywords, and word frequency can be extracted as the elements of the sample vector.
  • The specific method of obtaining the sample vector is not limited.
  • A larger number of features can be extracted, thereby forming a high-dimensional sample vector.
  • The specific dimension of the sample vector is not limited; for example, it may be several hundred or several thousand dimensions.
  • In this way, a sample vector corresponding to each sample content can be obtained.
  • The business scenario is generally to find a few vectors similar to the query vector among one billion high-dimensional sample vectors. If this problem is solved through clustering alone, more than 10,000 clusters may be needed, and clustering will converge very slowly. Therefore, once the sample vectors are obtained, coarse clustering can first be performed on all of them.
  • After clustering, multiple clusters, that is, multiple categories, are generated. The generated clusters are regarded as the first clusters, and each first cluster corresponds to a cluster center.
  • The cluster center can be understood as the centroid of the cluster, located at the center of the distribution of the sample vectors under the cluster.
  • Each cluster center can be represented by a vector with the same dimension as the sample vector, called the center vector of the cluster center. The center vector of a first cluster is used as the first cluster center vector.
  • When all the sample vectors are clustered, the K-means clustering algorithm can be used.
  • Referring to Fig. 2, clustering all sample vectors to obtain multiple first clusters includes:
  • Step S111: Determine the number of clusters according to the number of all the sample vectors, or according to a set algorithm.
  • The number of clusters can be determined according to the number of sample vectors and the required precision, so that the number of first clusters obtained after clustering equals the predetermined number of clusters.
  • The number of clusters can also be determined by a set algorithm, such as the elbow method or the silhouette coefficient.
  • The number of clusters can satisfy a certain relationship with the number of sample vectors, so as to achieve the effect of coarse clustering.
  • Step S112: According to the number of clusters, use the K-means clustering algorithm to cluster all the sample vectors and obtain a plurality of first clusters, where the number of first clusters equals the number of clusters.
  • The electronic device can set and adjust the clustering parameters of the K-means clustering algorithm according to the number of clusters and then cluster all the sample vectors, so that the number of first clusters obtained is the number of clusters determined in step S111, which implements coarse clustering of the sample vectors.
  • The way of clustering all the sample vectors is not limited; clustering can also be based on other methods, such as a hierarchical clustering algorithm or a density-based clustering algorithm.
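  • The coarse clustering of steps S111 and S112 can be sketched as follows. This is a minimal illustration of plain K-means (Lloyd's algorithm) in pure Python, not the applicant's implementation; the function names, the random initialization, the fixed iteration count, and the toy data are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's K-means; returns (center_vectors, assignments)."""
    rng = random.Random(seed)
    # Assumption: initialize the centers with k distinct sample vectors.
    centers = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest center.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignments


# Toy data: two well-separated groups of 2-dimensional sample vectors.
samples = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
           [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, assignments = kmeans(samples, k=2)
```

  • Here `k` plays the role of the number of clusters fixed in step S111. At the scale the text mentions, a library implementation would normally replace this loop.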
  • Step S120: Obtain the first cluster center vector corresponding to the cluster center in each first cluster.
  • After the clustering in step S110, multiple first clusters are obtained, and the center vector of the cluster center of each first cluster is its first cluster center vector.
  • Step S130: Obtain the first cluster center vector that is closest to each sample vector.
  • For each sample vector, the distance between the sample vector and each first cluster center vector can be calculated.
  • The first cluster center vector closest to each sample vector can then be determined from the calculated distances. The first cluster center vector closest to a sample vector is the one corresponding to the first cluster to which the sample vector belongs.
  • The distance between the sample vector and the first cluster center vector characterizes how far apart the two are. It may be Euclidean distance, Mahalanobis distance, cosine distance, and so on, which is not limited here.
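  • As an illustration of step S130, the sketch below implements two of the distances named above (Euclidean and cosine) and picks the nearest center under a chosen metric. The function names are hypothetical, and Mahalanobis distance is omitted because it additionally needs a covariance matrix.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_center(vector, center_vectors, distance=euclidean_distance):
    """Index of the center vector closest to `vector` under `distance`."""
    return min(range(len(center_vectors)),
               key=lambda i: distance(vector, center_vectors[i]))


closest = nearest_center([0.9, 0.1], [[0.0, 0.0], [1.0, 0.0]])
```

  • Passing `distance=cosine_distance` switches the metric without changing the selection logic.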
  • Step S140: Establish the index relationship between each first cluster and the corresponding first cluster center vector, and the index relationship between each sample vector and the corresponding first cluster, to obtain the first index.
  • Sample vectors whose closest first cluster center vectors are the same are assigned to the first cluster corresponding to that first cluster center vector, which determines the sample vectors under each first cluster. Based on this, the index relationship between each sample vector and its first cluster can be established, the inverted index relationship between each first cluster and its multiple sample vectors can be established, and the index relationship between each first cluster and its first cluster center vector can also be established, thereby obtaining the first index. With the first index, one can query the first cluster center vector of each first cluster, the first cluster to which each sample vector belongs, and the sample vectors under each first cluster.
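  • One possible in-memory layout for the first index is sketched below. The three dictionaries mirror the three lookups described above; the dictionary keys, function name, and use of integer IDs are assumptions for illustration only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_first_index(sample_vectors, center_vectors):
    """Assign every sample to its nearest center and record the mappings."""
    cluster_of_sample = {}  # sample id -> cluster id
    samples_in_cluster = {cid: [] for cid in range(len(center_vectors))}  # inverted index
    for sid, vector in enumerate(sample_vectors):
        cid = min(range(len(center_vectors)),
                  key=lambda c: squared_distance(vector, center_vectors[c]))
        cluster_of_sample[sid] = cid
        samples_in_cluster[cid].append(sid)
    return {
        "center_of_cluster": dict(enumerate(center_vectors)),  # cluster id -> center vector
        "cluster_of_sample": cluster_of_sample,
        "samples_in_cluster": samples_in_cluster,
    }


samples = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.1]]
centers = [[0.1, 0.05], [4.9, 5.05]]
first_index = build_first_index(samples, centers)
```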
  • Step S150: Obtain a sample residual vector corresponding to each sample vector in all the sample vectors, where the sample residual vector is the residual vector between each sample vector and its corresponding first cluster center vector.
  • The first cluster center vector corresponding to each sample vector in the first index can be used to obtain the residual vector between each sample vector and its corresponding first cluster center vector.
  • The sample vector and its corresponding first cluster center vector can be subtracted to obtain the residual vector between them.
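  • The subtraction in step S150 is element-wise. The short sketch below assumes the residual is the sample vector minus the center vector; the text does not fix the sign, so that direction is an assumption, as is the function name.

```python
def residual_vector(vector, center):
    """Element-wise residual; assumes residual = vector - center."""
    if len(vector) != len(center):
        raise ValueError("vector and center must have the same dimension")
    return [v - c for v, c in zip(vector, center)]


sample = [5.0, 7.0, 1.5]
center = [4.0, 9.0, 1.0]
sample_residual = residual_vector(sample, center)
```

  • Whichever sign is chosen, the same convention has to be used for the sample residual vectors and for the query residual vector.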
  • Step S160: Reduce each sample residual vector to multiple sample sub-vectors in multiple subspaces, where the sample sub-vectors correspond one-to-one with the subspaces.
  • After the coarse clustering, product quantization may then be performed for fine-grained calculation and indexing.
  • The process of performing product quantization corresponds to steps S160 to S190 in the embodiment of this application.
  • The first cluster center vector has the same dimension as the sample vector, so the sample residual vector of each sample vector also has the same dimension as the sample vector; that is, the sample residual vector is also high-dimensional.
  • Processing high-dimensional vectors is more complex, so the dimensionality of the sample residual vector can be reduced.
  • The sample residual vector can be reduced to multiple sample sub-vectors in multiple subspaces, with the sample sub-vectors corresponding one-to-one with the subspaces; that is, each sample residual vector has one sample sub-vector in each subspace.
  • The space can be divided equally into subspaces, so that the sample sub-vectors in different subspaces have the same number of dimensions. It can also be divided unequally, in which case the sample sub-vectors in different subspaces may have different numbers of dimensions.
  • For example, the sample residual vector is a 30-dimensional vector, expressed as (i_1, i_2, i_3, ..., i_28, i_29, i_30), and it can be divided into 6 subspaces.
  • The sub-vector in the first subspace can be (i_1, i_2, i_3, i_4, i_5), in the second subspace (i_6, i_7, i_8, i_9, i_10), in the third subspace (i_11, i_12, i_13, i_14, i_15), in the fourth subspace (i_16, i_17, i_18, i_19, i_20), in the fifth subspace (i_21, i_22, i_23, i_24, i_25), and in the sixth subspace (i_26, i_27, i_28, i_29, i_30).
  • The above space division and sub-vector division are only examples.
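  • The equal division in the 30-dimensional example can be reproduced with a few lines of slicing. The function name is hypothetical, and the integers 1 to 30 simply stand in for the components i_1 to i_30.

```python
def split_into_subvectors(vector, num_subspaces):
    """Split a vector into equally sized, contiguous sub-vectors."""
    if len(vector) % num_subspaces != 0:
        raise ValueError("dimension must be divisible by num_subspaces")
    width = len(vector) // num_subspaces
    return [vector[m * width:(m + 1) * width] for m in range(num_subspaces)]


# Components 1..30 stand in for i_1..i_30 of the sample residual vector.
residual = list(range(1, 31))
sub_vectors = split_into_subvectors(residual, 6)
```

  • An unequal division, which the text also allows, would need an explicit list of slice boundaries instead of a single width.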
  • Step S170: Cluster the sample sub-vectors in the same subspace to obtain multiple second clusters in each subspace and a second cluster center vector corresponding to each second cluster.
  • The sample sub-vectors can be clustered subspace by subspace.
  • The K-means clustering algorithm can be used to cluster the sample sub-vectors in the same subspace to obtain a clustering result.
  • The clustering result includes multiple second clusters, each of which also corresponds to a cluster center, called the second cluster center.
  • Each second cluster center corresponds to a second cluster center vector, which has the same dimension as the sample sub-vector.
  • For the specific clustering method, refer to the clustering method in step S110, which is not repeated here.
  • Each sample residual vector is divided into the same number of sample sub-vectors. After clustering the sample sub-vectors in each subspace, multiple second clusters and their corresponding second cluster center vectors are obtained. The same clustering algorithm can be used in every subspace to determine the same number of second clusters; alternatively, different clustering algorithms can be used in different subspaces to determine different numbers of second clusters.
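  • The per-subspace clustering of step S170 can be sketched as below: each subspace gets its own small K-means run, and the resulting center vectors form that subspace's codebook. The K-means here uses a naive first-k initialization to stay short; that choice, the function names, and the toy residuals are assumptions, not part of the described method.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_centers(points, k, iterations=10):
    """Small K-means returning only the center vectors."""
    centers = [list(p) for p in points[:k]]  # assumption: naive first-k init
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            groups[nearest].append(p)
        # Mean of each group; an empty group keeps its previous center.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers


def train_codebooks(residuals, num_subspaces, k):
    """One list of k second cluster center vectors per subspace."""
    width = len(residuals[0]) // num_subspaces
    codebooks = []
    for m in range(num_subspaces):
        sub_vectors = [r[m * width:(m + 1) * width] for r in residuals]
        codebooks.append(kmeans_centers(sub_vectors, k))
    return codebooks


residuals = [[0.0, 0.0, 1.0, 1.0], [0.1, 0.0, 1.0, 0.9],
             [1.0, 1.0, 0.0, 0.0], [0.9, 1.0, 0.1, 0.0]]
codebooks = train_codebooks(residuals, num_subspaces=2, k=2)
```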
  • Before clustering the sample sub-vectors in the same subspace, the sample sub-vectors may also be processed to improve accuracy.
  • For example, a reference orthogonal matrix may be used to transform each sample sub-vector before clustering.
  • The reference orthogonal matrix can be determined based on the optimal product quantization method. For example, the minimum of the quantization error function can be solved by iterative optimization, which finally yields the reference orthogonal matrix.
  • Because the reference orthogonal matrix is used to transform the sample sub-vectors, the clustering error can be made smaller and the clustering accuracy can be improved.
  • In some embodiments, for each first cluster, after the sample residual vectors of the sample vectors under that first cluster are reduced into multiple sample sub-vectors, the sample sub-vectors in the same subspace are clustered. In this way, when the second index established subsequently is used for vector query, after the first cluster corresponding to the query vector is found, the sample vector serving as the query result can be found faster according to the second index.
  • Step S180: Encode each second cluster to obtain a sub-code corresponding to each second cluster, where the multiple sub-codes corresponding to each sample residual vector constitute the code corresponding to that sample residual vector.
  • The second clusters in each subspace can be coded to obtain the sub-code corresponding to each second cluster, so that an index can be established according to these sub-codes.
  • For example, each subspace can correspond to L second clusters, and the L second clusters in the first subspace can be coded sequentially to obtain the sub-codes 1, 2, 3, ..., L.
  • Step S190: Establish an index relationship between the sub-codes corresponding to each sample residual vector and the second cluster center vectors to obtain a second index.
  • For each sample residual vector, and for each of its sample sub-vectors, the second cluster center vector closest to that sample sub-vector in the corresponding subspace can be determined, together with the second cluster corresponding to that second cluster center vector. The sub-code of that second cluster is then assigned to the sample sub-vector. This determines the sub-code of each sample sub-vector of the sample residual vector, and the index relationship between the sub-codes corresponding to each sample residual vector and the second cluster center vectors is established to obtain the second index.
  • The sub-codes corresponding to the sample sub-vectors of a sample residual vector constitute the code of the sample residual vector.
  • For example, the sample vector has 256 dimensions and is divided into 4 subspaces, so the sample sub-vectors in each subspace have 64 dimensions.
  • The second clusters in each subspace can be encoded as 1-byte integers: in each subspace, clustering the sample sub-vectors generates 256 second clusters, and the sub-codes corresponding to the 256 second clusters can be used as the codebook of that subspace.
  • Because each second cluster is encoded as a 1-byte integer, when the sample residual vector is approximated by its corresponding second cluster center vector in each subspace, it can be quantized into a 4-byte code, namely the sub-codes of the 4 second clusters.
  • The corresponding second cluster center vector in each subspace is the second cluster center vector closest to the sample sub-vector of the sample residual vector in that subspace.
  • For example, for sample residual vector A, the sub-code of the second cluster containing the nearest second cluster center vector is 23 in the first subspace, 148 in the second subspace, 235 in the third subspace, and 230 in the fourth subspace. Approximating the sample residual vector by the corresponding second cluster center vector in each subspace, it can be quantized into the code (23, 148, 235, 230), which is the code corresponding to the sample residual vector. According to the second cluster center vectors corresponding to the code, the index relationship between the multiple sub-codes of the sample residual vector and the second cluster center vectors is established, obtaining the second index.
  • With the second index, the second cluster center vector corresponding to each sub-code can be queried, and the combination of the four queried second cluster center vectors serves as an approximation of the sample residual vector.
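  • The encoding and lookup just described can be sketched on a toy scale. The example uses a 4-dimensional residual, 2 subspaces, and 2 second cluster center vectors per subspace instead of 256 dimensions, 4 subspaces, and 256 clusters; the hand-written codebooks, the function names, and the 0-based sub-codes are assumptions for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_residual(residual, codebooks):
    """Sub-code of the nearest second cluster center in every subspace."""
    width = len(residual) // len(codebooks)
    code = []
    for m, codebook in enumerate(codebooks):
        sub_vector = residual[m * width:(m + 1) * width]
        code.append(min(range(len(codebook)),
                        key=lambda j: squared_distance(sub_vector, codebook[j])))
    return tuple(code)


def decode_code(code, codebooks):
    """Concatenate the looked-up center vectors to approximate the residual."""
    approximation = []
    for m, sub_code in enumerate(code):
        approximation.extend(codebooks[m][sub_code])
    return approximation


# Hypothetical codebooks: codebooks[m][j] is center j of subspace m.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
residual = [0.9, 1.2, 0.8, 0.1]
code = encode_residual(residual, codebooks)
approximation = decode_code(code, codebooks)

# The code from the text, with each sub-code stored in one byte.
packed = bytes((23, 148, 235, 230))
```

  • With 256 clusters per subspace every sub-code fits in one byte, which is why the code (23, 148, 235, 230) from the text occupies 4 bytes in total.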
  • The index construction method provided by the embodiment of the present application performs coarse clustering of the sample vectors to obtain multiple first clusters and their corresponding first cluster center vectors, and establishes, from the coarse clustering result, the index relationship between each first cluster and its first cluster center vector and the index relationship between each sample vector and its first cluster, obtaining the first index.
  • Then the sample residual vector between each sample vector and its corresponding first cluster center vector is obtained, reduced in dimension, and clustered; each second cluster is coded; and finally the index relationship between the sub-codes corresponding to each sample residual vector and the second cluster center vectors is established to obtain the second index. In the index establishment process, coarse clustering first produces a rough partition, and product quantization then builds an approximate fine-grained index, which greatly reduces the index establishment time.
  • FIG. 4 shows a schematic flowchart of a vector query method provided by an embodiment of the present application.
  • The vector query method obtains the cluster center closest to the query vector from the coarse clustering result of the sample data, and then, within the cluster where that cluster center is located, obtains the sample vector nearest to the query vector based on the index established by product quantization, thereby improving the efficiency of vector query.
  • The vector query method can be applied to the above-mentioned electronic device. The process shown in FIG. 4 is described below. The vector query method may specifically include the following steps:
  • Step S210: Obtain a query vector.
  • The query vector is generated according to the content the user wants to query.
  • For example, the query vector may be generated from text content or from image content, which is not limited here. For the manner of generating the query vector, refer to the way the sample vector is generated in the foregoing embodiment, which is not repeated here.
  • Step S220: According to the pre-established first index, obtain a first cluster center vector whose distance from the query vector satisfies a first set distance condition as a target vector. The first index includes a plurality of first clusters obtained by clustering sample vectors and the first cluster center vector corresponding to each first cluster, and each first cluster includes a plurality of sample vectors.
  • The pre-established first index holds the index relationship between each first cluster and the corresponding first cluster center vector, and the index relationship between each sample vector and the corresponding first cluster. Therefore, the first index may include a plurality of first clusters obtained by clustering the sample vectors and a first cluster center vector corresponding to each first cluster, and each first cluster includes a plurality of sample vectors.
  • For the method of establishing the first index, refer to the foregoing embodiment, which is not repeated here.
  • The first index may be stored in the electronic device in advance.
  • The first cluster center vector whose distance from the query vector satisfies the first set distance condition can be obtained according to the first index and used as the target vector.
  • step S220 may include:
  • Step S221 Obtain a first cluster center vector corresponding to each first cluster according to the first index.
  • Step S222 Calculate the distance between the query vector and each first cluster center vector respectively.
  • Step S223 According to the distance between the query vector and each first cluster center vector, obtain a first cluster center vector whose distance to the query vector satisfies a first set distance condition.
  • the first cluster center vector corresponding to each first cluster can be queried, and then the distance between the query vector and each first cluster center vector is calculated separately to obtain the query vector The first cluster center vector whose distance satisfies the first set distance condition.
  • the obtained first cluster center vector is used as the target vector, and the first cluster corresponding to the target vector is the cluster that best matches the query vector.
  • The first set distance condition may include: the first cluster center vector has the smallest distance from the query vector; or the first cluster center vector has a distance from the query vector that is less than a first distance threshold.
  • When the first set distance condition is that the first cluster center vector has the smallest distance from the query vector, the distance between the query vector and each first cluster center vector can be calculated separately. The calculated distances are sorted from smallest to largest, and the first cluster center vector corresponding to the minimum distance in the sorted result is determined as the target vector.
  • When the first set distance condition is that the first cluster center vector has a distance from the query vector that is less than the first distance threshold, the distance between the query vector and each first cluster center vector can be calculated separately, and each first cluster center vector whose distance is less than the first distance threshold is then selected as a target vector.
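The two forms of the first set distance condition can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the use of Euclidean distance, and the toy data are all assumptions, and taking the minimum directly stands in for the sort described above.

```python
import math
from typing import List, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_target_centers(
    query: Sequence[float],
    centers: Sequence[Sequence[float]],
    threshold: Optional[float] = None,
) -> List[int]:
    """Return indices of first cluster centers that satisfy the distance condition.

    threshold=None  -> smallest-distance condition (exactly one index)
    threshold=value -> every center whose distance is below `value`
    """
    distances = [euclidean(query, center) for center in centers]
    if threshold is None:
        # Equivalent to sorting ascending and taking the first entry.
        return [min(range(len(distances)), key=distances.__getitem__)]
    return [i for i, d in enumerate(distances) if d < threshold]


# Hypothetical first cluster center vectors and query vector.
centers = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
query = [4.0, 4.5]
nearest = select_target_centers(query, centers)
within = select_target_centers(query, centers, threshold=7.0)
print(nearest, within)
```

With a threshold, more than one first cluster may qualify, so the later steps would have to be repeated for each selected target vector.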
  • Step S230 Obtain a residual vector between the query vector and the target vector as a query residual vector.
  • the query vector and the target vector may be subtracted to obtain a residual vector between the query vector and the target vector.
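As a small sketch of step S230, the residual is an element-wise subtraction. The function name `residual` and the sample values are illustrative assumptions.

```python
from typing import List, Sequence


def residual(query: Sequence[float], target: Sequence[float]) -> List[float]:
    """Element-wise difference query - target (the query residual vector)."""
    if len(query) != len(target):
        raise ValueError("query and target must have the same dimension")
    return [q - t for q, t in zip(query, target)]


# Hypothetical query vector and target (first cluster center) vector.
query_residual = residual([4.0, 4.5], [5.0, 5.0])
print(query_residual)
```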
  • Step S240 Obtain the code corresponding to each sample vector in the plurality of sample vectors according to a pre-established second index. The second index includes the code corresponding to each sample residual vector, obtained by performing product quantization on the sample residual vector corresponding to each sample vector, where the sample residual vector is the residual vector between the sample vector and the target vector.
  • the second index may be the index relationship between the sub-code corresponding to each sample residual vector and the second cluster center vector.
  • The second index may include the code corresponding to each sample residual vector, obtained by performing product quantization on the sample residual vector corresponding to each sample vector, that is, the codes obtained when the second index was established in the foregoing embodiment. For the method of establishing the second index, refer to the foregoing embodiment; it is not repeated here.
  • Step S250 According to the query residual vector and the code corresponding to each sample residual vector, obtain, from the multiple sample vectors, a sample vector whose distance from the query vector satisfies a second set distance condition as the query result.
  • step S250 may include:
  • Step S251 Obtain the distance between the query vector and each sample vector according to the query residual vector and the code corresponding to each sample residual vector.
  • The code includes a plurality of sub-codes, and the second index also includes a second cluster center vector corresponding to each sub-code. The sub-codes are obtained as follows: after each sample residual vector is reduced in dimension into multiple sub-sample vectors in multiple subspaces, the sub-sample vectors in the same subspace are clustered to obtain multiple second clusters in each subspace, and the second clusters are encoded. The multiple sub-sample vectors correspond one-to-one to the multiple subspaces.
  • Step S251 may include: reducing the dimension of the query residual vector into multiple sub-vectors in multiple subspaces as multiple sub-query vectors, where the multiple sub-query vectors correspond one-to-one to the multiple subspaces; obtaining, according to the second index, the second cluster center vector corresponding to each sub-code in the multiple sub-codes of each sample residual vector; and obtaining the distance between the query vector and each sample vector according to the multiple sub-query vectors and the multiple second cluster center vectors corresponding to each sample residual vector.
  • The query residual vector can be reduced in dimension in the same way as when the second index was established in the foregoing embodiment, that is, it is likewise split into multiple sub-vectors in multiple subspaces.
  • The multiple sub-vectors serve as multiple sub-query vectors, and the multiple sub-query vectors correspond one-to-one to the multiple subspaces.
  • According to the second index, the code corresponding to each sample residual vector and the second cluster center vector corresponding to each sub-code in that code can be determined. The distance between the query vector and each sample vector can then be obtained according to the multiple sub-query vectors and the multiple second cluster center vectors corresponding to each sample residual vector.
  • Obtaining the distance between the query vector and each sample vector includes: for any sample residual vector in the plurality of sample residual vectors, calculating, in each subspace, the distance between the sub-query vector and the second cluster center vector that corresponds to that sample residual vector in the same subspace; and, for each sample residual vector, summing the distances calculated in the subspaces and, according to the correspondence between sample residual vectors and sample vectors, taking the sum as the distance between the query vector and the corresponding sample vector.
  • Each sample residual vector corresponds to a sample vector. The sample residual vector is the residual vector between the sample vector and the first cluster center vector, and the query residual vector is the residual vector between the query vector and the same first cluster center vector. Therefore, calculating the distance between the query residual vector and the sample residual vector is equivalent to calculating the distance between the query vector and the sample vector.
  • With A as the query vector, C as the sample vector, and B as the first cluster center vector, (A-B)-(C-B) equals A-C.
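The per-subspace distance summation of step S251 can be sketched as below. This is a hedged illustration: `split`, `pq_distances`, the squared Euclidean metric, the contiguous equal-width split into subspaces, and the toy codebooks are all assumptions that the source does not fix. With squared Euclidean distance, the sum over subspaces equals the squared distance between the query residual and the quantized sample residual.

```python
from typing import List, Sequence, Tuple


def split(vector: Sequence[float], m: int) -> List[List[float]]:
    """Split a vector into m contiguous sub-vectors of equal width."""
    if len(vector) % m:
        raise ValueError("dimension must be divisible by the number of subspaces")
    step = len(vector) // m
    return [list(vector[j * step:(j + 1) * step]) for j in range(m)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_distances(
    query_residual: Sequence[float],
    codes: Sequence[Tuple[int, ...]],
    codebooks: Sequence[Sequence[Sequence[float]]],
) -> List[float]:
    """Approximate distance from the query to every encoded sample.

    codebooks[j][c] is the second cluster center vector for sub-code c in
    subspace j; codes[i][j] is the sub-code of sample i in subspace j.
    """
    m = len(codebooks)
    sub_queries = split(query_residual, m)
    # One lookup table per subspace: distance from the sub-query vector to
    # every second cluster center vector in that subspace.
    table = [
        [squared_distance(sub_queries[j], center) for center in codebooks[j]]
        for j in range(m)
    ]
    # Sum the per-subspace distances selected by each sample's sub-codes.
    return [sum(table[j][code[j]] for j in range(m)) for code in codes]


# Toy second index: 2 subspaces, 2 second cluster centers per subspace.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
codes = [(0, 0), (1, 1), (1, 0)]
distances = pq_distances([1.0, 1.0, 0.0, 0.0], codes, codebooks)
print(distances)
```

Building the lookup table once per query means each sample costs only one table lookup per subspace, which is where the speed-up over comparing full vectors comes from.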
  • Step S252 According to the distance between the query vector and each sample vector, obtain a sample vector whose distance to the query vector satisfies a second set distance condition from the multiple sample vectors as a query result.
  • the second set distance condition includes: a sample vector with the smallest distance from the query vector; or a sample vector with a distance less than a second distance threshold from the query vector.
  • In the vector query method provided by this embodiment of the present application, a query vector is obtained; a first cluster center vector whose distance from the query vector satisfies the first set distance condition is obtained as the target vector according to the pre-established first index; and the residual vector between the query vector and the target vector is obtained as the query residual vector. The code corresponding to each sample vector in the multiple sample vectors is then obtained according to the pre-established second index, where the second index includes the code corresponding to each sample residual vector, obtained by performing product quantization on the sample residual vector corresponding to each sample vector. Finally, according to the query residual vector and the code corresponding to each sample residual vector, a sample vector whose distance from the query vector satisfies the second set distance condition is obtained from the multiple sample vectors as the query result. Vector retrieval is thus realized through coarse clustering followed by product quantization, which reduces the complexity of retrieval step by step, so that both the speed and the accuracy of vector retrieval can be guaranteed.
  • FIG. 7 shows a schematic flowchart of a vector query method provided by another embodiment of the present application.
  • the vector query method can be applied to the above-mentioned electronic devices.
  • the flow shown in FIG. 7 will be described in detail below, and the vector query method may specifically include the following steps:
  • Step S310 Obtain a query vector.
  • Step S310 may include: obtaining a service query request; judging whether the service query request carries a vector; if it carries a vector, using that vector as the query vector; and if it does not carry a vector, generating the query vector corresponding to the service query request.
  • the electronic device can parse the parameters in the service query request and determine whether the parameters carry a vector. If it carries a vector, it can directly use the carried vector as the query vector. If the vector is not carried, the query vector can be generated in the manner in the foregoing embodiment.
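The branching in step S310 can be sketched as follows. The request layout (a mapping with optional `"vector"` and `"content"` keys) and the `toy_embed` function are hypothetical stand-ins; the source only says that a query vector is generated when none is carried.

```python
from typing import Callable, List, Mapping, Sequence


def get_query_vector(
    request: Mapping[str, object],
    embed: Callable[[str], Sequence[float]],
) -> List[float]:
    """Use the vector carried by the request, or generate one from its content."""
    vector = request.get("vector")
    if vector is not None:
        return [float(x) for x in vector]
    return [float(x) for x in embed(str(request["content"]))]


def toy_embed(text: str) -> List[float]:
    """Placeholder generator: a real system would use a trained model."""
    return [float(len(text)), float(text.count(" "))]


carried = get_query_vector({"vector": [0.1, 0.2]}, toy_embed)
generated = get_query_vector({"content": "red shoes"}, toy_embed)
print(carried, generated)
```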
  • Step S320 Determine whether there is a historical query result corresponding to the query vector.
  • Since the electronic device serves the queries of different users, and the same user may query the same content multiple times, the electronic device may store historical query results corresponding to past query vectors. After obtaining the query vector, the electronic device can judge whether a historical query result corresponding to the query vector is stored locally, and decide according to the judgment result whether to execute the query process.
  • Step S330 If there is a historical query result corresponding to the query vector, use the sample vector in the historical query result as the query result.
  • If the historical query result corresponding to the query vector is stored locally, the historical query result can be used directly as the query result corresponding to this query vector, thereby saving the time spent in the query process.
  • Step S340 If there is no historical query result corresponding to the query vector, obtain, as the target vector, a first cluster center vector whose distance from the query vector satisfies the first set distance condition according to the pre-established first index. The first index includes a plurality of first clusters obtained by clustering the sample vectors and a first cluster center vector corresponding to each first cluster, and each first cluster includes a plurality of sample vectors.
  • If there is no historical query result corresponding to the query vector, steps S340 to S370 are performed.
  • Step S350 Obtain a residual vector between the query vector and the target vector as a query residual vector.
  • Step S360 Obtain the code corresponding to each sample vector in the plurality of sample vectors according to a pre-established second index. The second index includes the code corresponding to each sample residual vector, obtained by performing product quantization on the sample residual vector corresponding to each sample vector, where the sample residual vector is the residual vector between the sample vector and the target vector.
  • Step S370 According to the query residual vector and the code corresponding to each sample residual vector, obtain, from the multiple sample vectors, a sample vector whose distance from the query vector satisfies a second set distance condition as the query result.
  • For steps S340 to S370, refer to the content of the foregoing embodiment, which is not repeated here.
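The history check of steps S320 and S330, wrapped around the full query of steps S340 to S370, can be sketched as a small in-memory cache. The class name, the exact-match key (the vector as a tuple), and the `fake_search` stand-in are assumptions; the source does not say how historical results are stored or matched.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class CachedVectorQuery:
    """Return a stored historical result when one exists, else run the search."""

    def __init__(self, search: Callable[[Sequence[float]], List[int]]) -> None:
        self._search = search  # stands in for steps S340 to S370
        self._history: Dict[Tuple[float, ...], List[int]] = {}

    def query(self, vector: Sequence[float]) -> List[int]:
        key = tuple(vector)
        if key in self._history:          # S320: historical result exists
            return self._history[key]     # S330: reuse it as the query result
        result = self._search(vector)     # S340 to S370: full index lookup
        self._history[key] = result
        return result


calls: List[List[float]] = []


def fake_search(vector: Sequence[float]) -> List[int]:
    """Hypothetical search that records each invocation."""
    calls.append(list(vector))
    return [42]


engine = CachedVectorQuery(fake_search)
first = engine.query([1.0, 2.0])
second = engine.query([1.0, 2.0])
print(first, second, len(calls))
```

An exact-match key only helps for repeated identical vectors; a production cache would also need an eviction and invalidation policy once the indexes are updated.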
  • the above-mentioned multiple sample vectors are stored in the first database, that is, the query results obtained in the above steps are the query results obtained based on the first database.
  • the sample vectors, the first index and the second index in the second database and the first database may be different.
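  • The vector query method may also include: taking the query result as a first query result; obtaining a second query result according to a first index and a second index established from the sample vectors in the second database; and merging the first query result and the second query result to obtain a third query result as the query result corresponding to the query vector.
  • A first index and a second index can be established based on the sample vectors in the second database, and a vector query can then be performed in the manner of steps S340 to S370 above to obtain the second query result. After that, the first query result and the second query result are merged to obtain the third query result, and the third query result is used as the query result of this vector query, so that the vector query is more accurate.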
  • merging may refer to taking both the first query result and the second query result as the final query result.
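Merging the first and second query results can be sketched as below. Representing each result as a `(sample_id, distance)` pair and ordering the merged list by distance are assumptions added for illustration; the source only states that both results are taken as the final query result.

```python
from typing import List, Sequence, Tuple

Result = Tuple[str, float]  # (sample id, distance to the query vector)


def merge_results(first: Sequence[Result], second: Sequence[Result]) -> List[Result]:
    """Keep every entry from both query results, ordered by distance."""
    return sorted(list(first) + list(second), key=lambda item: item[1])


# Hypothetical results from the first and second databases.
first_result = [("a", 0.2), ("b", 0.9)]
second_result = [("c", 0.5)]
third_result = merge_results(first_result, second_result)
print(third_result)
```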
  • The databases may include a large database and a small database. The large database is a database with a large amount of sample data, mainly used to store historical sample vectors; the small database is a database with relatively little sample data.
  • In the vector query method provided by this embodiment of the present application, a query vector is obtained, and it is then determined whether there is a historical query result corresponding to the query vector. If there is, the historical query result can be used directly as the query result, thereby reducing the amount of processing. If there is not, a first cluster center vector whose distance from the query vector satisfies the first set distance condition is obtained as the target vector according to the pre-established first index, and the residual vector between the query vector and the target vector is obtained as the query residual vector. The code corresponding to each sample vector in the multiple sample vectors is then obtained according to the pre-established second index, where the second index includes the code corresponding to each sample residual vector, obtained by performing product quantization on the sample residual vector corresponding to each sample vector. Finally, according to the query residual vector and the code corresponding to each sample residual vector, a sample vector whose distance from the query vector satisfies the second set distance condition is obtained from the multiple sample vectors as the query result. Vector retrieval is thus realized through coarse clustering followed by product quantization, which reduces the complexity of retrieval step by step, so that both the speed and the accuracy of vector retrieval can be guaranteed.
  • FIG. 8 shows a structural block diagram of a vector query device 400 provided by an embodiment of the present application.
  • the vector query device 400 can be applied to the above-mentioned electronic equipment.
  • The vector query device 400 includes: a vector obtaining module 410, a first determining module 420, a residual obtaining module 430, a second determining module 440, and a vector determining module 450.
  • The vector obtaining module 410 is configured to obtain a query vector.
  • The first determining module 420 is configured to obtain, as a target vector, a first cluster center vector whose distance from the query vector satisfies a first set distance condition according to a pre-established first index. The first index includes a plurality of first clusters obtained by clustering the sample vectors and a first cluster center vector corresponding to each first cluster, and each first cluster includes a plurality of sample vectors.
  • The residual obtaining module 430 is configured to obtain the residual vector between the query vector and the target vector as a query residual vector.
  • The second determining module 440 is configured to obtain the code corresponding to each sample vector in the plurality of sample vectors according to a pre-established second index. The second index includes the code corresponding to each sample residual vector, obtained by performing product quantization on the sample residual vector corresponding to each sample vector, where the sample residual vector is the residual vector between the sample vector and the target vector.
  • The vector determining module 450 is configured to obtain, from the multiple sample vectors, a sample vector whose distance from the query vector satisfies a second set distance condition as the query result, according to the query residual vector and the code corresponding to each sample residual vector.
  • the vector determination module 450 may include a distance calculation unit and a vector filtering unit.
  • The distance calculation unit is used to obtain the distance between the query vector and each sample vector according to the query residual vector and the code corresponding to each sample residual vector.
  • The vector filtering unit is used to obtain, from the plurality of sample vectors and according to the distance between the query vector and each sample vector, the sample vector whose distance from the query vector satisfies the second set distance condition as the query result.
  • The code includes a plurality of sub-codes, and the second index also includes a second cluster center vector corresponding to each sub-code. The sub-codes are obtained as follows: after each sample residual vector is reduced in dimension into multiple sub-sample vectors in multiple subspaces, the sub-sample vectors in the same subspace are clustered to obtain multiple second clusters in each subspace, and the second clusters are encoded. The multiple sub-sample vectors correspond one-to-one to the multiple subspaces.
  • The distance calculation unit may be specifically configured to: reduce the dimension of the query residual vector into multiple sub-vectors in multiple subspaces as multiple sub-query vectors, where the multiple sub-query vectors correspond one-to-one to the multiple subspaces; obtain, according to the second index, the second cluster center vector corresponding to each sub-code in the multiple sub-codes of each sample residual vector; and obtain the distance between the query vector and each sample vector according to the multiple sub-query vectors and the multiple second cluster center vectors corresponding to each sample residual vector.
  • The distance calculation unit obtaining the distance between the query vector and each sample vector according to the plurality of sub-query vectors and the plurality of second cluster center vectors corresponding to each sample residual vector includes: for any sample residual vector in the plurality of sample residual vectors, calculating, in each subspace, the distance between the sub-query vector and the second cluster center vector that corresponds to that sample residual vector in the same subspace; and, for each sample residual vector, summing the distances calculated in the subspaces and, according to the correspondence between sample residual vectors and sample vectors, taking the sum as the distance between the query vector and the corresponding sample vector.
  • the vector query device 400 may further include a first index building module.
  • The first index building module can be used to: cluster all sample vectors to obtain multiple first clusters; obtain the first cluster center vector corresponding to the cluster center of each first cluster; obtain, for each sample vector, the first cluster center vector closest to that sample vector; and establish the index relationship between each first cluster and its corresponding first cluster center vector, and the index relationship between each sample vector and its corresponding first cluster, to obtain the first index.
  • The first index building module clustering all the sample vectors to obtain multiple first clusters includes: determining the number of clusters according to the number of all the sample vectors or according to a set algorithm; and, according to the number of clusters, clustering all the sample vectors with a K-means clustering algorithm to obtain a plurality of first clusters, where the number of first clusters is the same as the number of clusters.
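Building the first index can be sketched with a small K-means loop. This is an illustrative sketch under assumptions: the dictionary layout of the index, the fixed iteration count, and the deterministic initialization from the first `k` samples are choices made here for reproducibility, not details from the source.

```python
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda c: squared_distance(point, centers[c]))


def build_first_index(
    samples: Sequence[Sequence[float]], k: int, iterations: int = 10
) -> Dict[str, object]:
    """Cluster the samples with K-means and record both index relationships."""
    centers = [list(s) for s in samples[:k]]  # assumed deterministic initialization
    for _ in range(iterations):
        groups: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for s in samples:
            groups[nearest_center(s, centers)].append(s)
        # Recompute each center as the mean of its members (keep empty ones).
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[c]
            for c, g in enumerate(groups)
        ]
    # Index relationship: sample vector -> first cluster.
    sample_to_cluster = [nearest_center(s, centers) for s in samples]
    # Index relationship: first cluster -> member sample ids.
    clusters: Dict[int, List[int]] = {c: [] for c in range(k)}
    for sample_id, c in enumerate(sample_to_cluster):
        clusters[c].append(sample_id)
    return {"centers": centers, "clusters": clusters, "sample_to_cluster": sample_to_cluster}


samples = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
first_index = build_first_index(samples, k=2)
print(first_index["centers"], first_index["clusters"])
```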
  • the vector query device 400 may also include a second index building module.
  • The second index building module may be used to: obtain a sample residual vector corresponding to each sample vector in all the sample vectors, where the sample residual vector is the residual vector between each sample vector and its corresponding first cluster center vector; reduce the dimension of each sample residual vector into multiple sub-sample vectors in multiple subspaces, where the multiple sub-sample vectors correspond one-to-one to the multiple subspaces; cluster the sub-sample vectors in the same subspace to obtain multiple second clusters in each subspace and the second cluster center vector corresponding to each second cluster; encode each second cluster to obtain the sub-code corresponding to each second cluster, where the sub-codes of each sample residual vector constitute the code corresponding to that sample residual vector; and establish the index relationship between the sub-codes corresponding to each sample residual vector and the second cluster center vectors, to obtain the second index.
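The construction of the second index can be sketched as follows: each sample residual vector is split into sub-sample vectors, each subspace is clustered separately, and the index of the nearest second cluster center becomes the sub-code. The helper names, the contiguous split, the deterministic K-means initialization, and the toy residuals are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def split(vector: Sequence[float], m: int) -> List[List[float]]:
    """Split a vector into m contiguous sub-vectors of equal width."""
    step = len(vector) // m
    return [list(vector[j * step:(j + 1) * step]) for j in range(m)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    return min(range(len(centers)), key=lambda c: squared_distance(point, centers[c]))


def kmeans(points: Sequence[Sequence[float]], k: int, iterations: int = 10) -> List[List[float]]:
    """Plain K-means; centers start at the first k points (assumed for determinism)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers


def build_second_index(
    residuals: Sequence[Sequence[float]], m: int, k: int
) -> Dict[str, object]:
    """Product-quantize sample residual vectors into tuples of sub-codes."""
    sub_vectors = [split(r, m) for r in residuals]
    # One codebook of second cluster center vectors per subspace.
    codebooks = [kmeans([sv[j] for sv in sub_vectors], k) for j in range(m)]
    # The sub-code is the index of the nearest second cluster center.
    codes: List[Tuple[int, ...]] = [
        tuple(nearest(sv[j], codebooks[j]) for j in range(m)) for sv in sub_vectors
    ]
    return {"codebooks": codebooks, "codes": codes}


residuals = [
    [0.0, 0.0, 4.0, 4.0],
    [2.0, 2.0, 8.0, 8.0],
    [0.0, 1.0, 8.0, 9.0],
    [2.0, 3.0, 4.0, 5.0],
]
second_index = build_second_index(residuals, m=2, k=2)
print(second_index["codes"])
```

Each sample is then stored as a short tuple of sub-codes instead of a full vector, and the codebooks map every sub-code back to its second cluster center vector at query time.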
  • The first set distance condition includes: the first cluster center vector has the smallest distance from the query vector; or the first cluster center vector has a distance from the query vector that is less than a first distance threshold.
  • the second set distance condition includes: a sample vector with the smallest distance from the query vector; or a sample vector with a distance less than a second distance threshold from the query vector.
  • The vector query device 400 may further include a cache query module. The cache query module is configured to determine whether there is a historical query result corresponding to the query vector before the first cluster center vector whose distance from the query vector satisfies the first set distance condition is obtained as the target vector according to the pre-established first index. If there is no historical query result corresponding to the query vector, the first determining module 420 obtains, as the target vector, the first cluster center vector whose distance from the query vector satisfies the first set distance condition according to the pre-established first index.
  • the vector query device 400 may further include a result determination module.
  • the result determination module is configured to, if there is a historical query result corresponding to the query vector, use the sample vector in the historical query result as the query result.
  • the plurality of sample vectors are stored in a first database.
  • the vector query device 400 may further include: a result identification module, a result query module, and a result merging module.
  • The result identification module is used to take the query result as the first query result; the result query module is used to obtain the second query result according to a first index and a second index established from the sample vectors in the second database; and the result merging module is used to merge the first query result and the second query result to obtain a third query result as the query result corresponding to the query vector.
  • The vector obtaining module 410 may include: a request obtaining unit, configured to obtain a service query request; a vector determining unit, configured to determine whether the service query request carries a vector; a first execution unit, configured to use the vector as the query vector if a vector is carried; and a second execution unit, configured to generate a query vector corresponding to the service query request if no vector is carried.
  • the residual obtaining module 430 may be specifically configured to: subtract the query vector and the target vector to obtain a residual vector between the query vector and the target vector.
  • the vector query device 400 may further include an index update module.
  • the index update module is configured to update the first index and the second index according to the newly acquired sample vector every preset time interval.
  • the coupling between the modules may be electrical, mechanical or other forms of coupling.
  • each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software function modules.
  • the electronic device 100 may be an electronic device capable of running an application program, such as a server.
  • the electronic device 100 in this application may include one or more of the following components: a processor 110, a memory 120, and one or more application programs.
  • The one or more application programs may be stored in the memory 120 and configured to be executed by the one or more processors 110, and the one or more application programs are configured to execute the method described in the foregoing method embodiment.
  • the processor 110 may include one or more processing cores.
  • The processor 110 uses various interfaces and lines to connect the parts of the entire electronic device 100, and executes the various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and by calling data stored in the memory 120.
  • The processor 110 may be implemented using at least one of digital signal processing (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA).
  • the processor 110 may be integrated with one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like.
  • the CPU mainly processes the operating system, user interface, and application programs; the GPU is used for rendering and drawing of display content; the modem is used for processing wireless communication. It can be understood that the above-mentioned modem may not be integrated into the processor 110, but may be implemented by a communication chip alone.
  • the memory 120 may include random access memory (RAM) or read-only memory (Read-Only Memory).
  • the memory 120 may be used to store instructions, programs, codes, code sets or instruction sets.
  • The memory 120 may include a program storage area and a data storage area. The program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the method embodiments, and so on.
  • The data storage area can store data (such as a phone book, audio and video data, and chat record data) created by the electronic device 100 during use.
  • FIG. 10 shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application.
  • The computer-readable storage medium 800 stores program code, and the program code can be invoked by a processor to execute the method described in the foregoing method embodiment.
  • the computer-readable storage medium 800 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the computer-readable storage medium 800 includes a non-transitory computer-readable storage medium.
  • the computer-readable storage medium 800 has storage space for the program code 810 for executing any method steps in the above-mentioned methods. These program codes can be read from or written into one or more computer program products.
  • the program code 810 may be compressed in a suitable form, for example.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a vector query method and apparatus, an electronic device, and a storage medium. The vector query method comprises: acquiring a query vector; acquiring, according to a pre-established first index, a first cluster center vector whose distance from the query vector satisfies a first set distance condition, as a target vector; acquiring a residual vector between the query vector and the target vector, as a query residual vector; acquiring, according to a pre-established second index, the code corresponding to each of multiple sample vectors, the second index comprising the code corresponding to each sample residual vector obtained by performing product quantization on the sample residual vector corresponding to each sample vector using a product quantization method; and acquiring, according to the query residual vector and the code corresponding to each sample residual vector, from the multiple sample vectors, a sample vector whose distance from the query vector satisfies a second set distance condition, as a query result. The method can increase the speed of vector queries.
PCT/CN2019/114795 2019-10-31 2019-10-31 Appareil et procédé d'interrogation par vecteur, dispositif électronique et support de stockage WO2021081913A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980099370.6A CN114245896A (zh) 2019-10-31 2019-10-31 向量查询方法、装置、电子设备及存储介质
PCT/CN2019/114795 WO2021081913A1 (fr) 2019-10-31 2019-10-31 Appareil et procédé d'interrogation par vecteur, dispositif électronique et support de stockage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/114795 WO2021081913A1 (fr) 2019-10-31 2019-10-31 Appareil et procédé d'interrogation par vecteur, dispositif électronique et support de stockage

Publications (1)

Publication Number Publication Date
WO2021081913A1 true WO2021081913A1 (fr) 2021-05-06

Family

ID=75714814

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114795 WO2021081913A1 (fr) 2019-10-31 2019-10-31 Appareil et procédé d'interrogation par vecteur, dispositif électronique et support de stockage

Country Status (2)

Country Link
CN (1) CN114245896A (fr)
WO (1) WO2021081913A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113626471A (zh) * 2021-08-05 2021-11-09 北京达佳互联信息技术有限公司 数据检索方法、装置、电子设备及存储介质
CN115169489A (zh) * 2022-07-25 2022-10-11 北京百度网讯科技有限公司 数据检索方法、装置、设备以及存储介质
CN116010669A (zh) * 2023-01-18 2023-04-25 深存科技(无锡)有限公司 向量库重训练的触发方法、装置、检索服务器及存储介质
WO2023108995A1 (fr) * 2021-12-15 2023-06-22 平安科技(深圳)有限公司 Procédé et appareil de calcul de similarité de vecteur, dispositif, et support de stockage
CN116541420A (zh) * 2023-07-07 2023-08-04 上海爱可生信息技术股份有限公司 向量数据的查询方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117194737B (zh) * 2023-09-14 2024-06-07 上海交通大学 基于距离阈值的近似近邻搜索方法、***、介质及设备

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190205331A1 (en) * 2017-01-20 2019-07-04 Rakuten, Inc. Image search system, image search method, and program
CN110134804A (zh) * 2019-05-20 2019-08-16 北京达佳互联信息技术有限公司 图像检索方法、装置及存储介质
CN110168525A (zh) * 2016-10-11 2019-08-23 谷歌有限责任公司 快速数据库搜索***和方法

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113626471A (zh) * 2021-08-05 2021-11-09 北京达佳互联信息技术有限公司 Data retrieval method and apparatus, electronic device, and storage medium
CN113626471B (zh) * 2021-08-05 2024-02-23 北京达佳互联信息技术有限公司 Data retrieval method and apparatus, electronic device, and storage medium
WO2023108995A1 (fr) * 2021-12-15 2023-06-22 平安科技(深圳)有限公司 Vector similarity calculation method and apparatus, device, and storage medium
CN115169489A (zh) * 2022-07-25 2022-10-11 北京百度网讯科技有限公司 Data retrieval method, apparatus, device, and storage medium
CN116010669A (zh) * 2023-01-18 2023-04-25 深存科技(无锡)有限公司 Method and apparatus for triggering vector library retraining, retrieval server, and storage medium
CN116010669B (zh) * 2023-01-18 2023-12-08 深存科技(无锡)有限公司 Method and apparatus for triggering vector library retraining, retrieval server, and storage medium
CN116541420A (zh) * 2023-07-07 2023-08-04 上海爱可生信息技术股份有限公司 Query method for vector data
CN116541420B (zh) * 2023-07-07 2023-09-15 上海爱可生信息技术股份有限公司 Query method for vector data

Also Published As

Publication number Publication date
CN114245896A (zh) 2022-03-25

Similar Documents

Publication Publication Date Title
WO2021081913A1 (fr) Vector query apparatus and method, electronic device, and storage medium
Wu et al. Multiscale quantization for fast similarity search
KR101565265B1 (ko) Coding of feature location information
WO2023051783A1 (fr) Encoding method, decoding method, apparatus, device, and readable storage medium
US20020039446A1 (en) Pattern recognition based on piecewise linear probability density function
WO2023019933A1 (fr) Method and apparatus for constructing a retrieval database, device, and storage medium
WO2019226429A1 (fr) Data compression by local entropy coding
US11120214B2 (en) Corpus generating method and apparatus, and human-machine interaction processing method and apparatus
CN112347246B (zh) Adaptive document clustering method and system based on spectral decomposition
US8768075B2 (en) Method for coding signals with universal quantized embeddings
US11531695B2 (en) Multiscale quantization for fast similarity search
WO2019085765A1 (fr) Image retrieval
Yang et al. Mean-removed product quantization for large-scale image retrieval
Yu et al. Bilinear optimized product quantization for scalable visual content analysis
CN111767421A (zh) Method and apparatus for retrieving images, electronic device, and computer-readable medium
Wang Neural Network‐Based Dynamic Segmentation and Weighted Integrated Matching of Cross‐Media Piano Performance Audio Recognition and Retrieval Algorithm
US20230086264A1 (en) Decoding method, encoding method, decoder, and encoder based on point cloud attribute prediction
CN115129949A (zh) Method, apparatus, device, medium, and program product for vector range retrieval
Amara et al. Nearest neighbor search with compact codes: A decoder perspective
Kan et al. A supervised learning to index model for approximate nearest neighbor image retrieval
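CN111858899B (zh) Sentence processing method, apparatus, system, and medium
CN113901278A (zh) Data search method and apparatus based on global multi-probing and adaptive termination
CN114266249A (zh) Massive text clustering method based on BIRCH clustering
CN113220840A (zh) Text processing method, apparatus, device, and storage medium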
Chen et al. Neighborhood-exact nearest neighbor search for face retrieval

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19950674

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 19950674

Country of ref document: EP

Kind code of ref document: A1