CN115509734A - Data processing method, system and related equipment - Google Patents

Data processing method, system and related equipment

Info

Publication number
CN115509734A
Authority
CN
China
Prior art keywords
target, memory, vectors, vector, data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111112579.7A
Other languages
Chinese (zh)
Inventor
胡天驰
沈胜宇
黄江乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority: PCT/CN2022/100226 (WO2022268089A1)
Publication of CN115509734A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F 9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resources being hardware resources other than CPUs, servers and terminals, the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22 Indexing; Data structures therefor; Storage structures
    • G06F 16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/06 Buying, selling or leasing transactions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/06 Buying, selling or leasing transactions
    • G06Q 30/0601 Electronic shopping [e-shopping]
    • G06Q 30/0631 Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Mathematical Physics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Development Economics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data processing method, system, and related equipment are applied to a recommendation system comprising a memory node and a computing node. When the computing node needs to read data from the memory node for model training or inference, it sends different acquisition requests to the memory node according to service requirements; an acquisition request includes a command identifier, index information of a plurality of target vectors, and the like. The memory node acquires the target vectors from the embedded table in its storage according to the index information, processes them through a near memory accelerator according to the operation flow corresponding to the command identifier to obtain target data, and returns the target data to the computing node. By processing the data inside the memory node, the amount of data transmitted from the memory node to the computing node is reduced, the latency for the computing node to acquire data is reduced, the amount of data the computing node must process is reduced, the computing node's resource occupation is lowered, and its data processing efficiency is improved.

Description

Data processing method, system and related equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, system, and related device.
Background
Artificial intelligence (AI) is the theory, methods, techniques, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision and reasoning, human-computer interaction, recommendation and search, foundational AI theory, and the like.
A system applying recommendation and search methods is called a recommendation system. Using deep learning techniques, a recommendation system is an important way of solving the information-overload problem for users; for example, an e-commerce platform can recommend commodities a user may be interested in according to information such as the user's historical purchase records and historical browsing records. Recommendation mainly comprises two stages: recall and ranking. In the recall stage, a subset of items potentially interesting to the user is quickly found, according to the user's characteristics, from the massive items (commodities, news, videos) in the material library and used as a candidate item set; the candidate item set is then input into the ranking stage, which integrates more features and performs accurate recommendation using a complex model.
In both the recall stage and the ranking stage, a large amount of data needs to be transmitted from the memory node to the computing node; that is, the volume of communication between the memory node and the computing node is huge, which becomes a major factor affecting the performance of the recommendation system.
Disclosure of Invention
The present application discloses a data processing method, system, and related equipment, which can reduce the amount of data transmitted between a memory node and a computing node of a recommendation system and improve the data processing efficiency of the recommendation system.
In a first aspect, the present application provides a data processing method applied to a recommendation system including one or more computing nodes and one or more memory nodes, where a memory node includes a near memory accelerator and a storage, and the storage stores an embedded table. The method includes: the near memory accelerator receives an acquisition request sent by a computing node, where the acquisition request includes a command identifier and index information of a plurality of target vectors in the embedded table, and the command identifier indicates the processing flow with which the memory node is to process the plurality of target vectors; the near memory accelerator acquires the plurality of target vectors from the embedded table in the storage according to the index information, processes the target vectors according to the processing flow indicated by the command identifier to obtain target data, and sends the target data to the computing node.
A computing node of the recommendation system needs to acquire a large amount of material data from a memory node during recall, and a large amount of user history data from a user database during ranking, and then process the acquired data itself; since the bandwidth between the computing node and the memory node is limited, the latency of transmitting the data from the memory node to the computing node is long. In the present application, the computing node instructs, through a command, the near memory accelerator in the memory node to process the required data first, thereby filtering the data. For example, when the recall process is implemented by the near memory accelerator, the vectors of a large number of commodities acquired from the material library are filtered and only the filtered data is returned to the computing node. This reduces the amount of data sent from the memory node to the computing node, reduces the latency of transmitting data from the memory node to the computing node, reduces the amount of data the computing node has to process, lowers the computing node's resource occupation, and improves its data processing efficiency.
It should be understood that the index information may be table information, including the table identifiers of one or more embedded tables, indicating the embedded table(s) in which the target vectors are located; or it may be table information together with feature information, where the feature information includes the sequence numbers of the target vectors to be processed. When the index information is table information only, the target vectors are the vectors in the embedded table(s) indicated by the table identifier(s); when the index information is table information and feature information, the target vectors are the vectors indicated by the feature information within the embedded table indicated by the table identifier.
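The two forms of index information can be pictured as a small lookup step. The following Python sketch is illustrative only: the in-memory table layout and all names (`embedded_tables`, `resolve_index_info`) are assumptions, not the patent's concrete format.

```python
# Hypothetical in-memory layout: each embedded table is a list of vectors,
# keyed by a table identifier. All names here are illustrative assumptions.
embedded_tables = {
    "items": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "users": [[1.0, 0.0], [0.0, 1.0]],
}

def resolve_index_info(table_ids, feature_info=None):
    """Return the target vectors selected by the index information.

    If feature_info is None (index info is table information only), every
    vector in each listed table is a target vector; otherwise feature_info
    maps a table identifier to the sequence numbers of the target vectors
    to fetch from that table.
    """
    targets = []
    for tid in table_ids:
        table = embedded_tables[tid]
        if feature_info is None:
            targets.extend(table)  # whole table is the target set
        else:
            targets.extend(table[i] for i in feature_info[tid])
    return targets

# Table information only: all three vectors of the "items" table.
print(len(resolve_index_info(["items"])))                      # 3
# Table + feature information: only sequence numbers 0 and 2.
print(resolve_index_info(["items"], {"items": [0, 2]}))        # [[0.1, 0.2], [0.5, 0.6]]
```

Either way, the near memory accelerator resolves the request to a concrete set of vectors before any filtering or aggregation happens.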
In a possible implementation, the acquisition request further includes a target feature vector and a similarity operator, and the command identifier is of a first type. The near memory accelerator processing the plurality of target vectors according to the processing flow indicated by the command identifier to obtain target data includes: the near memory accelerator determines, based on the similarity operator, the similarity between each of the target vectors and the target feature vector; the near memory accelerator determines a preset number of target vectors according to those similarities, and determines the preset number of target vectors, or the index information of the preset number of target vectors, as the target data.
Through the target feature vector and the similarity operator carried in the acquisition request, the target vectors are screened according to the similarity between each target vector and the target feature vector, and the target vectors with low similarity to the target feature vector are filtered out, which reduces the amount of data transmitted from the memory node to the computing node and the latency of the computing node acquiring data from the memory node. For example, in the recall stage of an e-commerce platform's recommendation system, the computing node needs to obtain from the memory node the vectors or identifiers of commodities the user may be interested in; the memory node can filter the commodities according to this method, reducing the amount of data transmitted to the computing node.
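The first-type command amounts to a top-k filter. A minimal sketch, assuming a dot product as the similarity operator (the patent leaves the operator abstract; the function names here are illustrative):

```python
def dot(a, b):
    """Assumed similarity operator: plain dot product."""
    return sum(x * y for x, y in zip(a, b))

def filter_top_k(target_vectors, target_feature, k, similarity=dot):
    """First-type command sketch: score each target vector against the
    target feature vector and keep the top k ("preset number")."""
    order = sorted(
        range(len(target_vectors)),
        key=lambda i: similarity(target_vectors[i], target_feature),
        reverse=True,
    )
    top = order[:k]
    # Either the vectors themselves or their index information can be
    # returned as the target data; both cut traffic to the compute node.
    return top, [target_vectors[i] for i in top]

vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
query = [1.0, 0.0]
indices, data = filter_top_k(vectors, query, k=2)
print(indices)  # [0, 1]: similarities are 1.0, 0.8, 0.0
```

Only `k` vectors (or `k` indices) cross the network instead of the full candidate set, which is the bandwidth saving the text describes.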
In a possible implementation, the acquisition request further includes a target feature vector and a similarity operator, the vectors in the memory node belong to multiple categories, each category of vectors corresponds to a category vector representing that category, and the command identifier is of a second type;
the near memory accelerator processes the target vectors according to the processing flow indicated by the command identifier to obtain target data, and the method comprises the following steps:
the near memory accelerator determines, according to the category vectors of the categories corresponding to the target vectors, the similarity between each category vector and the target feature vector; the near memory accelerator determines the weight corresponding to each category according to the similarity between that category's vector and the target feature vector;
the near memory accelerator acquires a corresponding number of vectors from each category according to the category's weight, obtaining a first vector set; the near memory accelerator then determines a preset number of target vectors according to the similarity between each target vector in the first vector set and the target feature vector, and determines the preset number of target vectors, or their index information, as the target data.
When the vectors in the memory node are classified according to the type of data they represent and each category has a category vector representing it, the degree of correlation between each category's vectors and the target feature vector can be determined from the similarity between the category vector and the target feature vector; the higher the similarity, the higher the correlation between the data of that category and the target feature vector, so more vectors can be acquired from that category.
When the vectors in the memory node are not classified and do not correspond to category vectors, the command identifier in the acquisition request sent by the computing node is of the first type. When the vectors in the memory node are classified and each category has a corresponding category vector, the command identifier in the acquisition request sent by the computing node may be of the second type; in that case, the memory node needs to calculate the similarity between each category vector and the target feature vector, determine the weight corresponding to each category, and then determine the preset number of target vectors from the vectors of each category. It should be understood that when the vectors in the memory node are classified and each category has a corresponding category vector, the command identifier in the acquisition request sent by the computing node may also be of the first type.
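The second-type command can be sketched as a two-step process: weight categories by their category vector's similarity, then draw a quota from each category. The similarity operator (dot product), the quota rule (proportional rounding), and all names are assumptions for illustration, not the patent's prescribed scheme:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def category_weighted_candidates(categories, target_feature, total):
    """Second-type command sketch. `categories` maps a category name to
    (category_vector, vectors). Each category's weight comes from the
    similarity of its category vector to the target feature vector, and
    that weight sets how many vectors the category contributes to the
    first vector set."""
    sims = {name: max(dot(cv, target_feature), 0.0)
            for name, (cv, _) in categories.items()}
    norm = sum(sims.values()) or 1.0
    first_set = []
    for name, (cv, vecs) in categories.items():
        quota = round(total * sims[name] / norm)  # weight -> quota
        # Within the category, prefer vectors most similar to the query.
        ranked = sorted(vecs, key=lambda v: dot(v, target_feature),
                        reverse=True)
        first_set.extend(ranked[:quota])
    return first_set

categories = {
    "electronics": ([1.0, 0.0], [[0.9, 0.1], [0.7, 0.2]]),
    "books":       ([0.0, 1.0], [[0.1, 0.9]]),
}
out = category_weighted_candidates(categories, [1.0, 0.0], total=2)
print(len(out))  # 2: "electronics" takes both slots (weight 1.0 vs 0.0)
```

A final top-k pass over `first_set`, as in the first-type command, would then yield the preset number of target vectors.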
In a possible implementation, the acquisition request further includes a target feature vector, a similarity operator, and an aggregation operator, and the command identifier is of a third type;
the near memory accelerator processing the plurality of target vectors according to the processing flow indicated by the command identifier to obtain target data includes: the near memory accelerator determines, according to the similarity operator, the similarity between each of the target vectors and the target feature vector; the near memory accelerator determines a preset number of target vectors according to those similarities; and the near memory accelerator executes the aggregation operation on the preset number of target vectors according to the aggregation operator to obtain the target data.
When the computing node would otherwise have to perform an aggregation operation on the target vectors it obtains, the target vectors can instead be aggregated inside the memory node, reducing the amount of data transmitted from the memory node to the computing node. For example, after the recommendation system returns a plurality of candidate recommended commodities in the recall stage, the candidates need to be further screened in the inference stage; when measuring a user's degree of attention to a commodity, the target feature vector is the vector corresponding to that commodity. The computing node needs to obtain from the memory node multiple pieces of user history data related to the commodity, such as the time the user browsed the commodity and the quantity purchased. The memory node can screen out the pieces of history data with high similarity according to the target feature vector, and then aggregate those pieces of history data to obtain the target data.
In a possible implementation, the acquisition request further includes a weight array; the near memory accelerator performing the aggregation operation on the preset number of target vectors according to the aggregation operator to obtain the target data includes: the near memory accelerator executes the aggregation operation on the preset number of target vectors according to the aggregation operator and the weight array to obtain the target data.
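The third-type command with a weight array is, in effect, "filter then reduce". A sketch under stated assumptions: the similarity operator is a dot product and the aggregation operator is a weighted element-wise sum, both chosen only to make the flow concrete:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def filter_then_aggregate(target_vectors, target_feature, k, weights=None):
    """Third-type command sketch: keep the k vectors most similar to the
    target feature vector, then aggregate them into a single result.
    With no weight array, the aggregation degrades to a plain sum."""
    top = sorted(target_vectors,
                 key=lambda v: dot(v, target_feature), reverse=True)[:k]
    if weights is None:
        weights = [1.0] * len(top)
    dim = len(top[0])
    # Weighted element-wise sum stands in for the request's aggregation
    # operator; only this single dim-length vector is sent back.
    return [sum(w * v[i] for w, v in zip(weights, top)) for i in range(dim)]

history = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
agg = filter_then_aggregate(history, [1.0, 0.0], k=2, weights=[0.5, 0.5])
print(agg)  # [0.75, 0.25]: the two most similar rows, averaged
```

Instead of `k` full vectors, a single aggregated vector crosses the network, which is the point of pushing the aggregation into the memory node.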
In a possible implementation, the acquisition request further includes an aggregation operator, and the command identifier is of a fourth type; the near memory accelerator processing the plurality of target vectors according to the processing flow indicated by the command identifier to obtain target data includes: the near memory accelerator executes the aggregation operation on the plurality of target vectors according to the aggregation operator to obtain the target data.
Similarly, when the computing node would need to perform an aggregation operation on the plurality of target vectors obtained from the memory node, the aggregation can be performed inside the memory node, reducing the amount of data transmitted to the computing node. For example, after the recommendation system returns a plurality of candidate recommended commodities in the recall stage, the candidates need to be further screened in the inference stage; when measuring a user's degree of attention to a commodity, the target feature vector is the vector corresponding to that commodity. The computing node needs to obtain from the memory node multiple pieces of user history data related to the commodity, such as the time the user browsed the commodity and the quantity purchased, and the aggregation operation is then performed on those pieces of user history data to obtain the target data.
In a possible implementation manner, the obtaining request further includes a weight array; the method for processing the multiple target vectors by the near memory accelerator according to the processing flow indicated by the command identifier to obtain target data comprises the following steps: and the near memory accelerator executes aggregation operation on the multiple target vectors based on the aggregation operator and the weight array to obtain target data.
The third type of command identifier is the command identifier in the acquisition request sent by the computing node when it needs the memory node to filter the plurality of target vectors by similarity before executing the aggregation operation. The fourth type of command identifier is the command identifier sent when the computing node does not need the memory node to filter the target vectors by similarity, and the aggregation operation is performed on them directly.
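The fourth-type command skips the similarity step entirely. A minimal sketch, again assuming a weighted element-wise sum as the aggregation operator (the patent leaves the operator abstract):

```python
def aggregate(target_vectors, weights=None):
    """Fourth-type command sketch: aggregate all target vectors directly,
    without a similarity-based filtering step. A weighted element-wise
    sum stands in for whatever aggregation operator the request carries;
    with no weight array the result is a plain sum."""
    if weights is None:
        weights = [1.0] * len(target_vectors)
    dim = len(target_vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, target_vectors))
            for i in range(dim)]

history = [[1.0, 2.0], [3.0, 4.0]]
print(aggregate(history))                      # [4.0, 6.0]
print(aggregate(history, weights=[2.0, 1.0]))  # [5.0, 8.0]
```

Whether the third or fourth type is used is thus a per-request choice by the computing node: filter then aggregate, or aggregate everything the index information selected.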
In a second aspect, the present application provides a data processing method applied to a recommendation system including one or more computing nodes and one or more memory nodes, where a memory node includes a near memory accelerator and a storage, and the storage stores an embedded table. The method includes: a computing node generates an acquisition request according to service requirements, where the acquisition request includes a command identifier and index information of a plurality of target vectors in the embedded table of a memory node, the command identifier indicates the processing flow with which the near memory accelerator of the memory node is to process the target vectors, and the type of the command identifier is associated with the computing node's service requirements; the computing node sends the acquisition request to the memory node, so that the memory node acquires the target vectors from the storage according to the index information and processes them according to the processing flow indicated by the command identifier to obtain target data.
The computing nodes of the recommendation system need to obtain different types of data from different databases at different stages: for example, the recall process needs to obtain a large amount of material data from a memory node, the ranking stage needs to obtain a large amount of user history data from a user database, and the acquired data is then processed in the computing node. At the different data processing stages, the computing node uses different commands, according to the service requirements, to instruct the near memory accelerator in the memory node to process the required data first, thereby filtering it. For example, when the recall process is implemented by the near memory accelerator, the vectors of a large number of commodities acquired from the material library are filtered and only the filtered data is returned to the computing node, which reduces the amount of data sent from the memory node to the computing node, reduces the latency of transmitting data from the memory node to the computing node, reduces the amount of data the computing node has to process, lowers the computing node's resource occupation, and improves its data processing efficiency.
In a possible implementation, when the computing node acquires target data corresponding to a preset number of commodities from the material library, the command identifier is of the first type, and the acquisition request further includes a target feature vector and a similarity operator. The command identifier instructs the memory node to determine, according to the similarity operator, the similarity between each of the target vectors and the target feature vector; to determine a preset number of target vectors according to those similarities; and to determine the preset number of target vectors, or the index information of the preset number of target vectors, as the target data.
In a possible implementation, when the computing node acquires target data corresponding to a preset number of commodities from the material library, the vectors in the memory node belong to multiple categories, each category of vectors corresponds to a category vector representing that category, and the command identifier is of the second type; the acquisition request further includes a target feature vector and a similarity operator. The command identifier instructs the memory node to determine, according to the category vectors corresponding to the target vectors, the similarity between each category vector and the target feature vector; to determine the weight corresponding to each category according to that similarity; to obtain a corresponding number of vectors from each category according to its weight, obtaining a first vector set; and to determine a preset number of target vectors according to the similarity between each target vector in the first vector set and the target feature vector, and determine the preset number of target vectors, or their index information, as the target data.
In a possible implementation, when the computing node acquires context features corresponding to a user from a user database, the command identifier is of the third type, and the acquisition request further includes a target feature vector, a similarity operator, and an aggregation operator. The command identifier instructs the memory node to determine, according to the similarity operator, the similarity between each of the target vectors and the target feature vector; to determine a preset number of target vectors according to those similarities; and to execute the aggregation operation on the preset number of target vectors according to the aggregation operator to obtain the target data.
In a possible implementation, the acquisition request further includes a weight array; the command identifier instructs the memory node to determine a preset number of target vectors according to the similarity between each target vector and the target feature vector, and then to execute the aggregation operation on the preset number of target vectors according to the aggregation operator and the weight array to obtain the target data.
In a possible implementation, when the computing node acquires context features corresponding to a user from a user database, the command identifier is of the fourth type, and the acquisition request further includes an aggregation operator; the command identifier instructs the memory node to execute the aggregation operation on the plurality of target vectors according to the aggregation operator to obtain the target data.
In a possible implementation, the acquisition request further includes a weight array; the command identifier instructs the memory node to execute the aggregation operation on the plurality of target vectors according to the aggregation operator and the weight array to obtain the target data.
In a third aspect, the present application provides a recommendation system, where the recommendation system includes one or more computing nodes and one or more memory nodes, where each memory node of the one or more memory nodes is configured to implement the method described in the first aspect or any possible implementation manner of the first aspect, and each computing node of the one or more computing nodes is configured to implement the method described in the second aspect or any possible implementation manner of the second aspect.
In a fourth aspect, the present application provides a memory node, where the memory node includes a near memory accelerator, and the near memory accelerator includes a unit configured to implement the method described in the first aspect or any possible implementation manner of the first aspect.
In a fifth aspect, the present application provides a computing device comprising means for implementing a method as described in the second aspect above or any possible implementation manner of the second aspect.
In a sixth aspect, the present application provides a memory node, including a processor and a memory, where the memory is used to store instructions, and the processor is used to execute the instructions, and when the processor executes the instructions, the data processing method described in the first aspect or any possible implementation manner of the first aspect is executed.
In a seventh aspect, the present application provides a computing device, including a processor and a memory, where the memory is used to store instructions, and the processor is used to execute the instructions, and when the processor executes the instructions, the data processing method described in the second aspect or any possible implementation manner of the second aspect is executed.
In an eighth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, performs the method of data processing as in the first aspect or any one of the possible implementation manners of the first aspect.
In a ninth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs a method of data processing as set forth in the second aspect or any one of the possible implementations of the second aspect.
In a tenth aspect, the present application provides a computer program product, which includes instructions that, when executed by a computer, enable the computer to perform the data processing method described in the first aspect or any possible implementation manner of the first aspect.
In an eleventh aspect, the present application provides a computer program product, which includes instructions that, when executed by a computer, enable the computer to execute the data processing method described in the second aspect or any possible implementation manner of the second aspect.
Drawings
FIG. 1 is a schematic diagram of a recommendation system architecture provided in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a memory node provided in an embodiment of the present application;
FIG. 3 is an interaction diagram of a data processing method provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a recall process provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a computing device provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of another computing device provided in an embodiment of the present application;
FIG. 7 is a block diagram of a computing device provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of another computing device provided in an embodiment of the present application.
Detailed Description
The data processing method of the present application will be described in detail below with reference to the accompanying drawings.
As shown in fig. 1, fig. 1 is a schematic diagram of a recommendation system architecture provided in an embodiment of the present application. The recommendation system includes one or more computing nodes 10 and one or more memory nodes 20, where the computing nodes 10 are communicatively connected to the memory nodes 20 through a network, and data of a material library or data of a user database is stored in the memory nodes 20. The data in the material library is embedding vectors (hereinafter referred to as vectors) corresponding to recommended objects. For example, if the recommendation system belongs to an e-commerce website and aims to recommend commodities to users, the material library stores vectors corresponding to commodities; if the recommendation system belongs to a video website and aims to recommend videos to users, the material library stores vectors corresponding to videos. The data in the user database is vectors corresponding to the historical data of users, where the historical data includes historical browsing records, historical purchase records, and the like. The computing node 10 is used to implement the computation of the training phase or the inference phase of the recommendation system. For convenience of description, in the following embodiments of the present application, the recommendation system of an e-commerce platform, with vectors corresponding to commodities stored in the material library, is taken as an example for explanation.
In the training phase, the ranking model is trained, and the computing node 10 needs to obtain user historical data from the user database. In the inference phase, the computing node 10 recommends commodities to the user according to the trained ranking model, the user features, the context features corresponding to the user, and the commodity features. The inference phase includes a recall process and a ranking process. The recall process quickly finds, from the massive commodities in the material library and according to the user features, a subset of commodities in which the user is potentially interested, as a candidate commodity set. The ranking process inputs the vector corresponding to each commodity in the candidate commodity set, the user features, and the context features corresponding to the user into the trained ranking model to obtain a score for each commodity in the candidate commodity set, and recommends commodities to the user according to the scores, where the context features corresponding to the user are obtained from the historical data of the user.
As shown in fig. 2, fig. 2 is a schematic structural diagram of a memory node provided in an embodiment of the present application. The memory node 20 includes a near memory accelerator 100 and a memory 200 for storing data, and the near memory accelerator 100 includes a communication unit 110, a command parsing unit 120, a calculating unit 130, and a memory control unit 140. The communication unit 110 is configured to receive the acquisition request sent by the computing node 10 and to return the target data obtained according to the acquisition request to the computing node 10; the command parsing unit 120 is configured to parse the acquisition request received by the communication unit 110 and send the parsed information to the calculating unit 130 and the memory control unit 140; the calculating unit 130 is configured to process, according to the command parsed out by the command parsing unit 120, the data obtained from the memory 200 by the memory control unit 140, to obtain the target data; the memory control unit 140 is configured to obtain, from the memory 200 and according to the index information parsed out of the acquisition request by the command parsing unit 120, the data required by the calculating unit 130 for data processing, and to send the data to the calculating unit 130.
It should be noted that the computing node 10 of the recommendation system may be a physical server, or may be a virtual machine or a container. When the computing node 10 is a virtual machine or a container, multiple computing nodes 10 may be deployed in the same physical server or in different physical servers; for example, the computing node 10 used in the training phase and the computing node 10 used in the inference phase are deployed in different physical servers. The network may be a remote direct memory access (RDMA) over Converged Ethernet (RoCE) network, an InfiniBand (IB) network, or the like. The near memory accelerator 100 in the memory node 20 is connected to the memory 200 through a memory bus.
The data processing method of the present application may be implemented in software by a central processing unit (CPU) of the memory node 20; when implemented in software, the near memory accelerator 100 and each unit thereof are software modules. The near memory accelerator 100 may also be implemented in hardware, such as an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a digital signal processor (DSP), or any combination thereof, where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The storage 200 is the memory of the memory node 20 and is configured to store the vectors of the commodities in the material library or the vectors corresponding to the user historical data in the user database, where the material library or the user database includes one or more memory nodes 20. The memory is random access memory (RAM), an internal memory capable of directly exchanging data with the near memory accelerator 100. The RAM includes static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate SDRAM (DDR), low-power DDR (LPDDR), graphics DDR (GDDR), high bandwidth memory (HBM), and the like. The memory is disposed in the computing device in the form of a memory module; commonly used memory modules include dual inline memory modules (DIMMs) and single inline memory modules (SIMMs).
The recommendation system in fig. 1 is deployed in a server cluster, where the server cluster includes a plurality of computing servers (e.g., the computing nodes) and a plurality of storage servers (e.g., the memory nodes). When receiving a data processing task, the server cluster can process data acquired from the storage servers through one or more computing servers; or, when a plurality of data processing tasks are acquired simultaneously, the plurality of data processing tasks are distributed to a plurality of computing servers for processing respectively. In a possible implementation manner, the recommendation system may also be deployed in one physical server. When the recommendation system is a physical server, the computing node 10 is a processor of the physical server, such as a CPU, and the storage 200 is the memory of the physical server. The near memory accelerator 100 is a hardware device located between the processor and the storage 200, and is connected to the processor and the storage 200 through memory buses, respectively. As a possible implementation manner, the server cluster may also be a distributed cluster, that is, the same request may be completed by multiple memory nodes together, or the data is stored in a distributed manner.
In the recall process and the ranking process of the training phase and the inference phase, the computing node 10 needs to acquire data from the memory 200 for computation. For example, in the recall process of the inference phase, the computing node 10 needs to acquire, from the vectors corresponding to the massive commodities in the material library, the vectors corresponding to the commodities in which the user is potentially interested; and in the ranking process, it needs to acquire the historical data of the user from the user database. Therefore, the computing node 10 needs to send an acquisition request to the memory node 20 to instruct the memory node 20 to acquire the data required for computation from the storage 200 of the memory node 20, and to process the data acquired from the storage 200 according to the data processing procedure indicated in the acquisition request to obtain the target data.
In the present application, the acquisition request includes at least a command identifier and table information. The table information includes one or more table identifiers, which indicate the embedding table (embedding table) in which the target vectors that the memory node 20 needs to operate on are located, that is, which instruct the memory control unit 140 to acquire the target vectors from the memory 200. The command identifier instructs the memory node 20, after receiving the acquisition request, to execute a processing flow on the data, and indicates the form of the returned target data; different command identifiers cause the memory node 20 to execute different operations on the target vectors acquired from the storage 200, and the returned target data may be one or more vectors or the index information corresponding to the one or more vectors. The operator instructs the memory node 20 to execute the corresponding operation on the target vectors after they are acquired from the storage 200 according to the acquisition request. That is, the acquisition request instructs the memory node 20 to acquire the target vectors from the storage according to the table information, and then to execute the corresponding processing flow on the target vectors according to the command identifier and the operator to obtain the target data, and to return the target data to the computing node.
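The structure of the acquisition request described above can be sketched as follows. This is a minimal illustrative model in Python, not the on-wire format of the patent; the field names (opcode, table_info, feature_info, optype1, optype2, K, target_feature, weight_info) follow the fields named in the text, while the concrete types and defaults are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical model of the feature information carried in an acquisition request
@dataclass
class FeatureInfo:
    feature_num: int = 0                     # number of target vectors to consider
    feature_idx: Optional[List[int]] = None  # feature sequence number array, may be absent

# Hypothetical model of the acquisition request; optional fields are only
# present when the command identifier (opcode) requires them
@dataclass
class AcquireRequest:
    opcode: int                                   # command identifier, e.g. 0b0001
    table_info: List[int]                         # one or more embedding-table identifiers
    feature_info: FeatureInfo
    optype1: Optional[str] = None                 # similarity operator
    optype2: Optional[str] = None                 # aggregation operator
    k: Optional[int] = None                       # preset number of results
    target_feature: Optional[List[float]] = None  # target feature vector
    weight_info: Optional[List[float]] = None     # weight data

# Example: a mode-1 style request (top-K by inner-product similarity)
req = AcquireRequest(opcode=0b0001, table_info=[3],
                     feature_info=FeatureInfo(feature_num=4, feature_idx=[0, 1, 2, 3]),
                     optype1="inner_product", k=2,
                     target_feature=[0.5, 0.5])
```

A real near memory accelerator would parse such a request in the command parsing unit and route the fields to the calculating unit and the memory control unit as described above.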
The command identifier in the acquisition request generated by the computing node 10 differs according to the service requirement or the manner in which data is stored in the memory node 20, and the acquisition request may further include any one or more of the fields in table 1 other than the command identifier (opcode) and the table information (table_info). Illustratively, if the computing node executes an inference process and needs to recommend several commodities to a user from among a large number of commodities, the acquisition request further includes a similarity operator optype1, feature information (feature_info), a preset number (K), and a target feature vector (target_feature), where the target feature vector is the user feature vector, and the table information and the feature information together serve as index information indicating the target vectors that need to be operated on, that is, the vectors corresponding to different commodities. The acquisition request instructs the memory node to acquire the target vectors from the memory according to the table information and the feature information, then to calculate the similarity between each target vector and the target feature vector, and to return the index information corresponding to the K target vectors with the highest similarity to the computing node.
TABLE 1
opcode: command identifier, indicating the processing flow and the form of the returned target data
table_info: table information, one or more table identifiers of the embedding tables in which the target vectors are located
feature_info: feature information, including the feature number (feature_num) and the feature sequence number array (feature_idx[ ])
optype1: similarity operator
optype2: aggregation operator
K: preset number of results to return
target_feature: target feature vector
weight_info: weight data
Specifically, the acquisition requests of the present application mainly take the following eight forms; the command identifier and fields included in each acquisition request, and the processing flow indicated by the command identifier, are as follows.
In mode 1, the command identifier in the acquisition request is 0001, and the acquisition request further includes table information (table_info), feature information (feature_info), a similarity operator (optype1), a preset number (K), and a target feature vector (target_feature). The command identifier instructs the memory node 20 to first determine one or more target embedding tables according to the one or more table identifiers in the table information, and to obtain a plurality of target vectors in the target embedding tables from the memory 200 through the memory control unit 140 according to the feature sequence number array (feature_idx[ ]) in the feature information; then to calculate the inner product of each target vector and the target feature vector through the calculating unit 130 to obtain the similarity between each target vector and the target feature vector; and finally to determine the K target vectors with the highest similarity, take the index information of the K target vectors and the similarities corresponding to the K target vectors as the target data, and return the target data to the computing node.
In mode 2, the command identifier in the acquisition request is 0010, and the acquisition request further includes table information (table_info), feature information (feature_info), a similarity operator (optype1), a preset number (K), and a target feature vector (target_feature). The command identifier instructs the memory node 20 to first determine one or more target embedding tables according to the one or more table identifiers in the table information, and to obtain a plurality of target vectors in the target embedding tables from the memory 200 through the memory control unit 140 according to the feature sequence number array (feature_idx[ ]) in the feature information; then to calculate the inner product of each target vector and the target feature vector through the calculating unit 130 to obtain the similarity between each target vector and the target feature vector; and finally to determine the K target vectors with the highest similarity, take the K target vectors and the similarities corresponding to the K target vectors as the target data, and return the target data to the computing node.
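The top-K similarity flow shared by modes 1 and 2 can be sketched as follows. This is a plain-Python illustration of the computation performed by the calculating unit 130, with the hardware and memory-access details omitted.

```python
# Minimal sketch of the mode-1/mode-2 processing flow: compute the inner
# product of each target vector with the target feature vector, then keep
# the K highest-similarity results.
def top_k_by_inner_product(target_vectors, target_feature, k):
    # similarity of each target vector = inner product with the target feature vector
    sims = [sum(a * b for a, b in zip(v, target_feature)) for v in target_vectors]
    # sort indices by similarity, highest first, and keep K of them
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    # mode 1 returns index information plus similarities; mode 2 would return
    # the vectors themselves instead of the indices
    return [(i, sims[i]) for i in order]

vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
result = top_k_by_inner_product(vectors, [1.0, 0.0], k=2)
# result: [(0, 1.0), (2, 0.7)]
```

Because the K results are much smaller than the full set of target vectors, performing this selection inside the memory node reduces the data volume sent back to the computing node.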
In mode 3, the command identifier in the acquisition request is 0011, and the acquisition request further includes table information (table_info), feature information (feature_info), a similarity operator (optype1), a preset number (K), and a target feature vector (target_feature). The feature information (feature_info) of this acquisition request includes only the feature number (feature_num) and does not include the feature sequence number array (feature_idx[ ]).
It should be noted that, when the command identifier is 0011, the vectors in the storage 200 of the memory node 20 are stored by category. For example, the vectors stored in the storage 200 correspond to various commodities, and the commodities include 6 categories: household appliances, daily necessities, cosmetics, sporting goods, office supplies, and digital products. When the vectors in the storage 200 of the memory node 20 are stored by category, each category corresponds to one category vector, each category includes multiple commodities, and each commodity corresponds to one vector.
The command identifier 0011 instructs the memory node 20 to first determine one or more target embedding tables according to the one or more table identifiers in the table information, obtain the category vectors corresponding to the categories included in the target embedding tables from the memory 200 through the memory control unit 140, calculate the category similarity between each category vector and the target feature vector, and determine the weight Pi corresponding to each category according to the category similarity, where the greater the similarity, the greater the weight, and i denotes the category identifier (if there are 6 categories, i takes the values 1 to 6). Then, according to the feature number (feature_num) N in the feature information and the weight Pi corresponding to each category, the number Mi of target vectors to be obtained from the vectors included in each category is determined; the similarity between each vector in the ith category and the target feature vector is then calculated, and the Mi vectors with the highest similarity are taken. Finally, the similarities of the Mi vectors corresponding to each category are sorted, the K target vectors with the highest similarity are determined, and the index information of the K target vectors and the similarities corresponding to the K target vectors are taken as the target data.
Alternatively, after the number Mi of target vectors to be obtained from the vectors included in each category is determined, Mi vectors are randomly obtained from the ith category, and the similarity between each of the Mi vectors in the ith category and the target feature vector is calculated; finally, the similarities between the vectors acquired from all the categories and the target feature vector are sorted, the K target vectors with the highest similarity are determined, and the index information of the K target vectors and the similarities corresponding to the K target vectors are taken as the target data and returned to the computing node.
For example, if vectors of 6 categories are stored in the target embedding table, and the category similarities between the category vectors of the 6 categories and the target feature vector are S1, S2, S3, S4, S5, and S6, respectively, then the weight of the ith category is Pi = Si/(S1 + S2 + S3 + S4 + S5 + S6); if the feature number in the feature information is N, the number of target vectors acquired in the ith category is Mi = N × Pi.
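The weight and per-category count computation of this example can be sketched as follows. This is a hedged illustration: rounding Mi = N × Pi to an integer is an assumption, since the text does not specify how fractional counts are handled.

```python
# Sketch of the mode-3 weight computation described above: weights Pi are the
# normalized category similarities, and Mi = N * Pi target vectors are drawn
# from category i. The round() step is an assumption, not from the patent.
def per_category_counts(category_sims, n):
    total = sum(category_sims)
    weights = [s / total for s in category_sims]  # Pi = Si / (S1 + ... + S6)
    counts = [round(n * p) for p in weights]      # Mi = N * Pi
    return weights, counts

sims = [3.0, 1.0, 1.0, 1.0, 2.0, 2.0]   # S1..S6, total = 10
weights, counts = per_category_counts(sims, n=100)
# weights: [0.3, 0.1, 0.1, 0.1, 0.2, 0.2]; counts: [30, 10, 10, 10, 20, 20]
```

Categories that are more similar to the target feature vector thus contribute more candidate vectors to the subsequent top-K selection.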
In mode 4, the command identifier in the acquisition request is 0100, and the acquisition request further includes table information (table_info), feature information (feature_info), a similarity operator (optype1), a preset number (K), and a target feature vector (target_feature). The processing flow indicated in mode 4 is the same as that in mode 3, except that mode 3 instructs the memory node 20 to return the index information corresponding to the K target vectors, whereas mode 4 instructs the memory node to return the K target vectors themselves.
In mode 5, the command identifier in the acquisition request is 0101, and the acquisition request further includes table information (table_info), feature information (feature_info), and an aggregation operator (optype2). The command identifier instructs the memory node 20 to first determine one or more target embedding tables according to the one or more table identifiers in the table information, and to obtain the target vectors in the target embedding tables from the memory 200 through the memory control unit 140 according to the feature sequence number array (feature_idx[ ]) in the feature information; then to execute the operation indicated by the aggregation operator optype2 on the target vectors through the calculating unit 130 to obtain the target data, and to return the target data to the computing node.
In mode 6, the command identifier in the acquisition request is 0110, and the acquisition request further includes table information (table_info), feature information (feature_info), a similarity operator (optype1), an aggregation operator (optype2), a preset number (K), and a target feature vector (target_feature). The command identifier instructs the memory node 20 to first determine one or more target embedding tables according to the one or more table identifiers in the table information, and to obtain the target vectors in the target embedding tables from the memory 200 through the memory control unit 140 according to the feature sequence number array (feature_idx[ ]) in the feature information; then to calculate the inner product of each target vector and the target feature vector through the calculating unit 130 to obtain the similarity between each target vector and the target feature vector; and finally to execute the operation indicated by the aggregation operator optype2 on the K target vectors with the highest similarity to obtain the target data, and to return the target data to the computing node.
In mode 7, the command identifier in the acquisition request is 0111, and the acquisition request further includes table information (table_info), feature information (feature_info), an aggregation operator (optype2), a preset number (K), and weight data (weight_info). The command identifier instructs the memory node 20 to first determine one or more target embedding tables according to the one or more table identifiers in the table information, and to obtain the target vectors in the target embedding tables from the memory 200 through the memory control unit 140 according to the feature sequence number array (feature_idx[ ]) in the feature information; then to execute, through the calculating unit 130, the operation indicated by the aggregation operator optype2 based on the weight data on the target vectors to obtain the target data, and to return the target data to the computing node.
In mode 8, the command identifier in the acquisition request is 1000, and the acquisition request further includes table information (table_info), feature information (feature_info), a similarity operator (optype1), an aggregation operator (optype2), a preset number (K), a target feature vector (target_feature), and weight data (weight_info). The command identifier instructs the memory node 20 to first determine one or more target embedding tables according to the one or more table identifiers in the table information, and to obtain the target vectors in the target embedding tables from the memory 200 through the memory control unit 140 according to the feature sequence number array (feature_idx[ ]) in the feature information; then to calculate the inner product of each target vector and the target feature vector through the calculating unit 130 to obtain the similarity between each target vector and the target feature vector; and finally to execute the operation indicated by the aggregation operator optype2 based on the weight data on the K target vectors with the highest similarity to obtain the target data, and to return the target data to the computing node.
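The aggregation step shared by modes 5 to 8 can be sketched as follows. The concrete aggregation operator optype2 is not fixed by the text, so a (weighted) element-wise sum is shown here as one plausible choice.

```python
# Minimal sketch of an aggregation step (modes 5-8): combine the selected
# target vectors element-wise. A weighted sum is one plausible instance of
# the aggregation operator optype2; the patent does not pin down the operator.
def aggregate(vectors, weights=None):
    if weights is None:
        weights = [1.0] * len(vectors)   # plain element-wise sum when no weight data given
    dim = len(vectors[0])
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]

vecs = [[1.0, 2.0], [3.0, 4.0]]
print(aggregate(vecs))              # [4.0, 6.0]
print(aggregate(vecs, [0.5, 0.5]))  # [2.0, 3.0]
```

In modes 7 and 8 the weights would come from the weight data (weight_info) carried in the acquisition request; in modes 5 and 6 no weight data is carried.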
It should be understood that the above 8 modes are only examples, and the acquisition request sent by the computing node may also carry more or fewer fields to instruct the memory node 20 to execute different processing flows. For example, the feature information (feature_info) of the mode-3 acquisition request may further include a feature sequence number array (feature_idx[ ]), and the memory node 20 executes the operation indicated by the command identifier according to the category vectors corresponding to the categories to which the vectors indicated by the feature sequence number array belong. The feature information may further include a flag feature_idx_en indicating whether feature_idx[ ] is carried in the acquisition request: when the value of feature_idx_en is 1, the feature sequence number array is carried in the acquisition request; when the value of feature_idx_en is 0, it is not. The command parsing unit 120 first acquires the value of feature_idx_en; if the value is 0, it does not acquire the data of the feature sequence number array, and if the value is 1, it acquires the feature sequence number array from the acquisition request.
In a possible implementation manner, the acquisition request may include every field in table 1, and when generating the acquisition request, the computing node assigns values to the required fields according to the service requirement. The command identifier is then used not only to indicate the processing flow to be executed by the memory node after receiving the acquisition request and the form of the returned target data, but also to indicate which fields in the acquisition request carry valid information. After receiving the acquisition request, the memory node acquires the information in the corresponding fields according to the command identifier, and then executes the corresponding operation according to the processing flow indicated by the command identifier and the acquired information to obtain the target data.
The data processing method provided by the embodiment of the present application is introduced below for the training phase and the inference phase respectively. Fig. 3 is an interaction diagram of the data processing method provided by the embodiment of the present application, which is used to implement the inference phase of the recommendation system.
S301, the computing node generates a first obtaining request and sends the first obtaining request to the first memory node.
The first memory node is a node that stores data of the material library, that is, the first memory node is used to implement the recall process of the inference phase and to return, to the computing node, the K target vectors or the index information of the K target vectors, together with the similarity between each of the K target vectors and the target feature vector.
The first acquisition request may be any one of the acquisition requests of modes 1 to 4 above. Specifically, if the first memory node does not store the vectors corresponding to the commodities by commodity category, the first acquisition request is the acquisition request of mode 1 or mode 2; if the first memory node stores the vectors corresponding to the commodities by commodity category, the first acquisition request may be any one of the acquisition requests of modes 1 to 4. The target feature vector in the first acquisition request is the user feature vector of the user.
It should be understood that the material library includes vectors corresponding to a large number of commodities, the vectors corresponding to the commodities may be stored in the storage 200 of one or more memory nodes, and the recall stage needs to obtain K target vectors or index information of the K target vectors from each memory node of the one or more memory nodes, respectively. Vectors corresponding to commodities required by the computing node 10 are stored in the memories 200 of the plurality of memory nodes, and the computing node 10 generates a corresponding first acquisition request according to data stored in each memory node and sends the first acquisition request to the corresponding memory node. In the embodiment of the present application, an example in which a computing node sends an acquisition request to a memory node is described.
S302, the first memory node receives a first acquisition request sent by the computing node, and executes the operation indicated in the first acquisition request according to the first acquisition request to obtain target data.
The first memory node that receives the first acquisition request parses the first acquisition request through the command parsing unit 120. The command parsing unit 120 sends the information in the first acquisition request used for acquiring data from the memory 200, such as the table information (table_info) and the feature information (feature_info), to the memory control unit 140, and sends the information used for computation, such as the similarity operator (optype1), the preset number (K), and the target feature vector (target_feature), to the calculating unit 130. The memory control unit 140 acquires the target vectors from the memory 200 according to the table information and the feature information, and then sends the target vectors to the calculating unit 130. The calculating unit 130 executes the corresponding operations on the target vectors according to the command identifier and the information used for computation, following the processing flow corresponding to the command identifier, to obtain the target data.
Illustratively, the command identifier in the first acquisition request is 0001, and the acquisition request further includes table information (table_info), feature information (feature_info), a similarity operator (optype1), a preset number (K), and a target feature vector (target_feature). As shown in fig. 4, each first memory node that receives the first acquisition request obtains, through the memory control unit 140 and according to the table information and the feature sequence number array in the feature information, the vectors corresponding to a plurality of commodities from the memory 200 as the target vectors; the target vectors are then sent to the calculating unit 130, and the calculating unit 130 calculates the inner product of each target vector and the target feature vector to obtain the similarity between each target vector and the target feature vector; finally, the K target vectors with the highest similarity are determined, and the index information of the K target vectors and the similarities corresponding to the K target vectors are taken as the target data. The data volume of the target data is smaller than the data volume of the target vectors indicated by the table information and the feature information, and the memory node screens the target vectors by executing the processing flow indicated by the command identifier, thereby reducing the data volume transmitted from the memory node to the computing node. For the specific method by which the calculating unit 130 executes the corresponding operations according to the command identifier, the information used for computation, and the target vectors, reference may be made to the related descriptions of modes 1 to 8 above, and details are not repeated here.
S303, the first memory node sends the target data to the computing node.
After obtaining the target data through the method in S302, the calculation unit 130 of the first memory node sends the target data to the computing node through the communication unit 110, so that the computing node executes the subsequent data processing flow according to the target data.
It should be understood that the storage of the first memory node stores the material library. That is, when the first memory node executes the recall process and returns the target data, such as the vectors of the K commodities or the index information of the K commodities, to the computing node, it also returns the commodity identifier of each of the K commodities.
S304, the computing node receives the target data and determines a candidate recommended commodity set from the target data.
As shown in fig. 4, the computing node 10 receives, from each first memory node, the commodity identifiers of K commodities, the index information of the vectors corresponding to the K commodities, and K similarities, where each commodity corresponds to one commodity identifier, the index information of one vector, and one similarity. If the computing node 10 sent the first acquisition request to n memory nodes, the computing node 10 receives the commodity identifiers of nK commodities, the index information of the vectors corresponding to the nK commodities, and nK similarities. The computing node 10 then determines, from the nK similarities, the M commodities with the highest similarity, and takes the M commodities as candidate recommended commodities to obtain a candidate recommended commodity set, where the set includes the commodity identifier of each candidate recommended commodity.
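The merge performed by the computing node 10 amounts to taking the M globally best commodities out of the n partial top-K lists. A minimal sketch, assuming each memory node returns (commodity identifier, similarity) pairs — the pair representation and the sample identifiers are illustrative:

```python
import heapq

def merge_candidates(per_node_results, m):
    """Merge the top-K lists returned by the n first memory nodes into the
    candidate recommended commodity set of the M most similar commodities."""
    all_pairs = [pair for node in per_node_results for pair in node]
    best = heapq.nlargest(m, all_pairs, key=lambda p: p[1])  # M highest similarities
    return [commodity_id for commodity_id, _ in best]

node_a = [("sku_7", 0.94), ("sku_3", 0.81)]  # top-K list from one memory node
node_b = [("sku_9", 0.90), ("sku_1", 0.60)]  # top-K list from another memory node
candidates = merge_candidates([node_a, node_b], m=3)
```

`heapq.nlargest` avoids fully sorting all nK pairs when M is much smaller than nK, which matches the typical recommendation workload.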
S305, the computing node generates a second obtaining request and sends the second obtaining request to a second memory node.
After the computing node 10 determines the commodity identifier set of the candidate recommended commodities, for example, obtains M candidate recommended commodities, the score of each commodity needs to be obtained through a trained ranking model, and the recommendation system recommends commodities to the user according to the scores. Specifically, the computing node 10 first acquires, according to the index information of the M commodities, the vectors corresponding to the M commodities that the ranking model needs; it then inputs the vector of each of the M commodities, the user feature vector of the user, and the context feature of the user into the trained ranking model, obtains the score of that commodity through the ranking model, and thereby obtains the score of each of the M commodities.
When the computing node 10 scores a commodity with the trained ranking model, the context feature of the user needs to be acquired from the user database, so the computing node 10 generates a second acquisition request. The second acquisition request may be any one of the acquisition requests of the foregoing modes 5 to 8, that is, the computing node requires the memory node to perform an aggregation operation on the target vectors corresponding to the acquired historical data before returning the result to the computing node; the second acquisition request may also be that of mode 2, that is, the computing node only requires the memory node to return the target vectors corresponding to the acquired K pieces of historical data. The target feature vector in the second acquisition request is the vector corresponding to one of the M commodities.
It should be understood that the historical data of the user required by the computing node 10 may be stored in the storage 200 of one or more second memory nodes, and the computing node 10 generates a corresponding second obtaining request according to the data stored in each second memory node and sends the second obtaining request to the corresponding second memory node.
S306, the second memory node receives the second acquisition request sent by the computing node, executes the operation indicated in the second acquisition request, obtains the context feature of the user, and sends the context feature to the computing node.
For the recommendation system of an e-commerce platform, the user database includes the historical data of the user, for example, the user's purchase records, the categories of the commodities purchased, the price paid for each commodity, the user's browsing records, and the time at which each commodity was browsed. The second memory node needs to acquire the historical data related to the target feature vector in the second acquisition request (that is, the vector corresponding to one of the M commodities), and process the acquired historical data according to the operation indicated in the second acquisition request to obtain the context feature of the user, where the context feature reflects the user's degree of attention to the commodity. The memory node that receives the second acquisition request parses it through the command parsing unit 120 and then executes the processing flow indicated by the command identifier.
The command parsing unit 120 sends the information in the second acquisition request used for acquiring data from the memory 200, such as the table information (table_info) and the feature information (feature_info), to the memory control unit 140, and sends the information used for calculation, such as the aggregation operator (optype2), the target feature vector (target_feature), and the weight data (weight_info), to the calculation unit 130. The memory control unit 140 acquires the vectors corresponding to the user's historical data from the memory 200 as the target vectors according to the table information and the feature information, and then transmits the target vectors to the calculation unit 130. The calculation unit 130 executes the corresponding operations on the target vectors, based on the information used for calculation, according to the processing flow corresponding to the command identifier, so as to obtain the context feature of the user.
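One possible shape of this second-request flow — select the historical vectors most relevant to the candidate commodity, then aggregate them with the weight array into a single context feature — can be sketched as follows. The weighted sum is only one of the aggregation operations that the aggregation operator (optype2) might select, and the history vectors and weights are invented:

```python
import numpy as np

def context_feature(history_vectors, target_feature, weights, k):
    """Sketch of a modes-5-to-8 style flow: pick the K history vectors most
    similar to the candidate commodity's vector, then aggregate them with the
    weight array (here: a weighted sum) into one context feature."""
    sims = history_vectors @ target_feature
    topk = np.argsort(sims)[::-1][:k]          # K most relevant history entries
    w = np.asarray(weights[:k], dtype=float)
    return (w[:, None] * history_vectors[topk]).sum(axis=0)

# invented vectors for the user's browsing/purchase history
history = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
ctx = context_feature(history, np.array([1.0, 0.0]), weights=[0.5, 0.5], k=2)
```

Because only the single aggregated vector `ctx` crosses the node boundary, the memory node again returns far less data than the raw history vectors.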
After obtaining the context feature, that is, the target data, through the method in S306, the calculation unit 130 of the second memory node sends it to the computing node 10 through the communication unit 110, so that the computing node 10 executes the subsequent data processing flow accordingly.
S307, the computing node obtains the score of each candidate recommended commodity based on the trained ranking model.
After obtaining, by the above method, the context feature corresponding to a candidate recommended commodity, the computing node 10 inputs the vector of the commodity, the user feature vector of the user, and the context feature into the trained ranking model, obtains the score of the commodity through the ranking model, and thereby obtains the score of each of the M commodities. The recommendation system then recommends commodities to the user according to the scores.
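The scoring step can be modeled as feeding the concatenation of the three vectors to the trained ranking model. In the sketch below the ranking model is a stand-in (a fixed linear layer), since the actual trained model is not specified here, and the feature order and dimensions are assumptions:

```python
import numpy as np

def score_commodity(item_vec, user_vec, context_vec, rank_model):
    """Score one candidate commodity: concatenate its vector, the user feature
    vector, and the context feature, and apply the trained ranking model."""
    features = np.concatenate([item_vec, user_vec, context_vec])
    return rank_model(features)

# placeholder for the trained ranking model: a fixed linear layer
w = np.full(6, 1.0 / 6.0)
linear_model = lambda x: float(x @ w)
score = score_commodity(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                        np.array([1.0, 1.0]), linear_model)
```

The recommendation system would evaluate this once per candidate and recommend the highest-scoring of the M commodities.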
In the recall process, a computing node of the recommendation system needs to acquire a large amount of material data from the memory nodes, and in the ranking stage it needs to acquire a large amount of the user's historical data from the user database; the acquired data is then processed in the computing node. Because the bandwidth between the computing node and the memory nodes is limited, the time delay of transmitting data from the memory nodes to the computing node is long. In the present application, the computing node instructs, through a request, the memory access device in the memory node to process the data required by the computing node. For example, the recall process is implemented by the near-memory device: the vectors of a large number of commodities acquired from the material library are filtered, and only the filtered data is returned to the computing node. This reduces the data volume sent from the memory node to the computing node and thus the transmission delay; at the same time, it reduces the data volume that the computing node needs to process, reducing the resource occupation of the computing node and improving its data processing efficiency.
The embodiment shown in fig. 3 introduces the process of the inference phase of the recommendation system. In the training phase, the recommendation system mainly trains the ranking model, and the computing node likewise needs to obtain the context feature of the user from a memory node. For the method by which the computing node obtains the context feature from the memory node storing the sample data during training, reference may be made to the method of obtaining the context feature from the user database in the inference phase, which is not repeated here.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of action combinations; however, those skilled in the art should understand that the present invention is not limited by the described action sequence, that the embodiments described in the specification are preferred embodiments, and that the actions involved are not necessarily required by the present invention. Other reasonable combinations of steps that can be conceived by those skilled in the art from the above description also fall within the scope of the invention.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a computing device provided in an embodiment of the present application. The computing device 500 includes a processor 510, a communication interface 520, and a memory 530, which are coupled to each other by a bus 540. The computing device 500 may correspond to the memory node 20 shown in fig. 1 and may perform the operations performed by the corresponding subject in the method embodiment corresponding to fig. 3.
The processor 510 may be implemented in various ways, for example, the processor 510 may be a CPU (central processing unit), a Graphics Processing Unit (GPU), or the like, and the processor 510 may also be a single-core processor or a multi-core processor.
It should be noted that the near memory accelerator 100 may be implemented in software or in hardware; for details, refer to the description of fig. 2. When the near memory accelerator 100 is implemented in software, the processor 510 includes the near memory accelerator 100, and for the operations that the processor 510 performs through the near memory accelerator 100, reference may be made to the specific operations of the memory node in the above method embodiments. That is, the near memory accelerator performs the operations performed by the memory node in fig. 3 through program code in the memory 530 executed by the computing resources of the processor 510.
The communication interface 520 may be a wired interface or a wireless interface, and is configured to communicate with other modules or devices, for example, to connect with the computing node 10 and receive the acquisition request sent by the computing node 10. The wired interface may be an ethernet interface, a local interconnect network (LIN) interface, or the like, and the wireless interface may be a cellular network interface, a wireless local area network interface, or the like.
The memory 530 may be a non-volatile memory, such as a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The memory 530 may also be a volatile memory, which may be a random access memory (RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The memory 530 may be used to store program code and data, so that the processor 510 can call the stored program code to perform the operation steps of the above method embodiments. Moreover, the computing device 500 may contain more or fewer components than shown in fig. 5, or have a different arrangement of components. It should be understood that the computing device 500 according to the embodiment of the present application may correspond to the memory node shown in fig. 2, and the memory 530 may be the same memory as the storage 200 in fig. 2 or a different one. When the memory 530 is the same memory as the storage 200 of fig. 2, the data stored in the memory 530 includes the embedded table; when the memory 530 is a different memory from the storage 200 of fig. 2, the storage 200 stores the data of the material library or the user database, and the memory 530 stores program code and other data.
The bus 540 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 540 may be divided into an address bus, a data bus, a control bus, a memory bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
As shown in fig. 6, fig. 6 is a schematic structural diagram of another computing device 500 provided in this embodiment. When the near memory accelerator 100 is implemented in hardware, it may be a hardware structure located between the processor and the storage (for example, a memory, also referred to as a main memory). The near memory accelerator 100 includes a processor 1001 and a communication interface 1002. The communication interface 1002 is configured to obtain the acquisition request sent by the computing node 10 and to obtain the target vectors according to the acquisition request, and the processor 1001 is configured to implement the operations implemented by the command parsing unit 120 and the calculation unit 130 in the method embodiment corresponding to fig. 3; for details, refer to the description of the above method embodiment, which is not repeated here.
Specifically, for the specific implementation of the computing device 500 to execute various operations, reference may be made to specific operations executed by a memory node in the foregoing method embodiment, and details are not described here again.
The above and other operations and/or functions of each module in the computing device 500 are respectively used to implement the corresponding flows of the methods in fig. 3, and are not repeated here for brevity.
As shown in fig. 7, fig. 7 is a schematic structural diagram of a computing apparatus 700 provided in the embodiment of the present application, where the computing apparatus 700 includes a processing unit 710 and a communication unit 720, where,
the processing unit 710 is configured to generate an acquisition request according to the service requirement, where the acquisition request includes a command identifier and the index information of a plurality of target vectors in an embedded table of a memory node, the command identifier indicates the processing flow by which the near memory accelerator of the memory node processes the plurality of target vectors, and the type of the command identifier is associated with the service requirement of the computing node.
The communication unit 720 is configured to send the acquisition request to the memory node, so that the memory node acquires the plurality of target vectors from the storage according to the index information and processes them according to the processing flow indicated by the command identifier to obtain the target data.
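A hypothetical encoding of such an acquisition request, with field names mirroring those used in the description (table_info, feature_info, optype1, K, target_feature) — the dictionary wire format itself is an assumption made for illustration:

```python
def build_acquisition_request(command_id, index_info, **calc_params):
    """Assemble an acquisition request: the command identifier selects the
    near-memory processing flow, the index information locates the target
    vectors in the embedded table, and the remaining fields parameterize
    the calculation unit."""
    request = {"cmd_id": command_id, "index_info": index_info}
    request.update(calc_params)  # e.g. similarity operator, K, target feature
    return request

req = build_acquisition_request(
    "0001",                                                   # first-type command identifier
    {"table_info": "material_lib", "feature_info": [3, 17, 42]},
    optype1="inner_product", K=2, target_feature=[0.1, 0.9],
)
```

Splitting the request into a data-locating part and a calculation part mirrors how the command parsing unit 120 routes fields to the memory control unit 140 and the calculation unit 130, respectively.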
It should be understood that the computing apparatus 700 according to the embodiment of the present invention may be implemented by a central processing unit (CPU), an application-specific integrated circuit (ASIC), or a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. When the data processing method shown in fig. 3 is implemented by software, the computing apparatus 700 and its respective modules may also be software modules.
Specifically, for the method for implementing data processing by the processing unit 710 and the communication unit 720 in the computing apparatus 700, reference may be made to relevant operations executed by the computing node in the foregoing method embodiment, and details are not repeated here.
The units may perform data transmission with each other through a communication path, and it should be understood that each unit included in the computing apparatus 700 may be a software unit, a hardware unit, or a part of the software unit and a part of the hardware unit.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatus and each module described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 8 is a schematic structural diagram of a computing device provided in an embodiment of the present application. The computing device 800 includes a processor 810, a communication interface 820, and a memory 830, which are connected to each other through a bus 840. The computing device 800 may correspond to the computing node 10 shown in fig. 1 and may perform the operations performed by the corresponding subject in the method embodiment corresponding to fig. 3.
The specific implementation of the processor 810 to perform various operations may refer to the specific operations of the computing node 10 in the above method embodiments. For example, the processor 810 is configured to generate the above-mentioned obtaining request, perform model training, and the like, and will not be described herein again.
The processor 810 may be implemented in various ways, for example, the processor 810 may be a CPU or a GPU, and the processor 810 may also be a single-core processor or a multi-core processor. The processor 810 may be a combination of a CPU and a hardware chip. The hardware chip may be an ASIC, PLD, or a combination thereof. The aforementioned PLD may be a CPLD, an FPGA, a GAL, or any combination thereof.
The communication interface 820 may be a wired interface, such as an ethernet interface, a Local Interconnect Network (LIN), or the like, or a wireless interface, such as a cellular network interface or a wireless lan interface, for communicating with other modules or devices.
The memory 830 may be used to store program code and data, so that the processor 810 can invoke the program code stored in the memory 830 to perform the operation steps of the computing node in the above method embodiments. Moreover, the computing device 800 may contain more or fewer components than shown in fig. 8, or have a different arrangement of components.
The bus 840 may be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (UB), a compute express link (CXL) bus, a cache coherent interconnect for accelerators (CCIX) bus, or the like. The bus 840 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 8, but this does not mean that there is only one bus or one type of bus.
Specifically, for the specific implementation of the computing device 800 to perform various operations, reference may be made to specific operations performed by a computing node in the foregoing method embodiment, and details are not described herein again.
It should be understood that the computing device 800 according to the embodiment of the present application may correspond to the computing node shown in fig. 3 in the embodiment of the present application, and may correspond to operations executed by a corresponding main body in the method described in fig. 3, and the above and other operations and/or functions of each module in the computing device 800 are respectively for implementing corresponding flows of each method of fig. 3, and are not described again here for brevity.
The embodiments of the present application further provide a non-transitory computer storage medium, where the computer storage medium stores computer program instructions, and when the computer program instructions run on a processor, the method steps implemented by the memory node in the foregoing method embodiments may be implemented, and specific implementation of the processor of the computer storage medium in executing the method steps may refer to specific operations in the foregoing method embodiments, which are not described herein again.
The embodiments of the present application further provide a non-transitory computer storage medium, where the computer storage medium stores computer program instructions, and when the computer program instructions run on a processor, the method steps implemented by the computing node in the foregoing method embodiments may be implemented, and specific implementation of the processor of the computer storage medium to execute the method steps may refer to specific operations in the foregoing method embodiments, and details are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the procedures or functions according to the embodiments of the present application are wholly or partially generated. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more collections of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium, or a semiconductor medium. The semiconductor medium may be a Solid State Drive (SSD).
The foregoing is only illustrative of the present application. Those skilled in the art should appreciate that changes and substitutions can be made in the embodiments provided herein without departing from the scope of the present disclosure.

Claims (19)

1. A data processing method, applied to a recommendation system comprising one or more computing nodes and one or more memory nodes, wherein each memory node comprises a near memory accelerator and a storage, and the storage stores an embedded table, the method comprising:
the near memory accelerator receives an acquisition request sent by a computing node, wherein the acquisition request comprises a command identifier and index information of a plurality of target vectors in the embedded table, and the command identifier indicates a processing flow of the memory node for processing the plurality of target vectors;
the near memory accelerator acquires the plurality of target vectors from the embedded table of the storage according to the index information;
and the near memory accelerator processes the target vectors according to the processing flow indicated by the command identifier to obtain target data, and sends the target data to the computing node.
2. The method according to claim 1, wherein the acquisition request further comprises a target feature vector and a similarity operator, and the command identifier is a command identifier of a first type;
the processing, by the near memory accelerator, of the plurality of target vectors according to the processing flow indicated by the command identifier to obtain the target data comprises:
the near memory accelerator determines the similarity between each target vector in the plurality of target vectors and the target feature vector according to the similarity operator; and
the near memory accelerator determines a preset number of target vectors according to the similarity between each target vector and the target feature vector, and determines the preset number of target vectors or the index information of the preset number of target vectors as the target data.
3. The method according to claim 1, wherein the acquisition request further includes a target feature vector and a similarity operator, the vectors in the memory node belong to a plurality of categories, the vectors of each category correspond to a category vector representing that category, and the command identifier is a command identifier of a second type;
the processing, by the near memory accelerator, of the plurality of target vectors according to the processing flow indicated by the command identifier to obtain the target data comprises:
the near memory accelerator determines the similarity between each category vector in the plurality of category vectors and the target feature vector according to the plurality of category vectors corresponding to the plurality of target vectors;
the near memory accelerator determines the weight corresponding to each category according to the similarity between each category vector and the target feature vector;
the near memory accelerator acquires a corresponding number of vectors from the vectors of each category according to the weight corresponding to each category to obtain a first vector set; and
the near memory accelerator determines a preset number of target vectors according to the similarity between each target vector in the first vector set and the target feature vector, and determines the preset number of target vectors or the index information of the preset number of target vectors as the target data.
4. The method according to claim 1, wherein the acquisition request further comprises a target feature vector, a similarity operator, and an aggregation operator, and the command identifier is a command identifier of a third type;
the processing, by the near memory accelerator, of the plurality of target vectors according to the processing flow indicated by the command identifier to obtain the target data comprises:
the near memory accelerator determines the similarity between each target vector in the plurality of target vectors and the target feature vector according to the similarity operator;
the near memory accelerator determines a preset number of target vectors according to the similarity between each target vector and the target feature vector; and
the near memory accelerator performs the aggregation operation on the preset number of target vectors according to the aggregation operator to obtain the target data.
5. The method according to claim 4, wherein the acquisition request further comprises a weight array; and
the performing, by the near memory accelerator, of the aggregation operation on the preset number of target vectors according to the aggregation operator to obtain the target data comprises:
the near memory accelerator performs the aggregation operation on the preset number of target vectors according to the aggregation operator and the weight array to obtain the target data.
6. The method according to claim 1, wherein the acquisition request further comprises an aggregation operator, and the command identifier is a command identifier of a fourth type;
the processing, by the near memory accelerator, of the plurality of target vectors according to the processing flow indicated by the command identifier to obtain the target data comprises:
the near memory accelerator performs the aggregation operation on the plurality of target vectors according to the aggregation operator to obtain the target data.
7. The method according to claim 6, wherein the acquisition request further comprises a weight array; and
the processing, by the near memory accelerator, of the plurality of target vectors according to the processing flow indicated by the command identifier to obtain the target data comprises:
the near memory accelerator performs the aggregation operation on the plurality of target vectors according to the aggregation operator and the weight array to obtain the target data.
8. A data processing method applied to a recommendation system including one or more compute nodes and one or more memory nodes, the memory nodes including a near memory accelerator and a storage, the storage storing an embedded table, comprising:
the computing node generates an acquisition request according to the service requirement, wherein the acquisition request comprises a command identifier and index information of a plurality of target vectors in an embedded table of the memory node, the command identifier indicates a processing flow of a near memory accelerator of the memory node for processing the plurality of target vectors, and the type of the command identifier is associated with the service requirement of the computing node;
and the computing node sends the acquisition request to the memory node so that the memory node acquires the target vectors from the storage according to the index information and processes the target vectors according to the processing flow indicated by the command identifier to obtain target data.
9. The method of claim 8,
when the computing node acquires target data corresponding to a preset number of commodities from a material library, the command identifier is a command identifier of a first type, and the acquisition request further comprises a target feature vector and a similarity operator; and
the command identifier instructs the memory node to: determine the similarity between each target vector in the plurality of target vectors and the target feature vector according to the similarity operator; determine a preset number of target vectors according to the similarity between each target vector and the target feature vector; and determine the preset number of target vectors or the index information of the preset number of target vectors as the target data.
10. The method of claim 8,
when the computing node acquires target data corresponding to a preset number of commodities from a material library, the vectors in the memory node belong to a plurality of categories, each category of vectors corresponds to a category vector representing that category, and the command identifier is a second type of command identifier; the acquisition request further comprises a target feature vector and a similarity operator; and
the command identifier instructs the memory node to: determine, according to the similarity operator, the similarity between each category vector in a plurality of category vectors corresponding to the plurality of target vectors and the target feature vector; determine a weight corresponding to each category according to the similarity between each category vector and the target feature vector; obtain a corresponding number of vectors from the vectors of each category according to the weight corresponding to each category, to obtain a first vector set; determine a preset number of target vectors according to the similarity between each target vector in the first vector set and the target feature vector; and determine the preset number of target vectors, or the index information of the preset number of target vectors, as the target data.
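The second-type flow adds a category-weighting stage before the final top-N selection. The sketch below illustrates the claimed steps under assumed choices (dot-product similarity, proportional rounding for the per-category quota); it is not the patented algorithm:

```python
import numpy as np

def category_weighted_recall(vectors, categories, category_vectors, target, n):
    """Sketch of the claim-10 flow: weight each category by the similarity of
    its category vector to the target feature vector, draw a proportional
    number of candidates per category, then keep the overall top n."""
    cat_ids = sorted(category_vectors)
    sims = np.array([category_vectors[c] @ target for c in cat_ids])
    weights = sims / sims.sum()                    # per-category weight
    pool = []                                      # the "first vector set"
    for c, w in zip(cat_ids, weights):
        members = [i for i, cat in enumerate(categories) if cat == c]
        members.sort(key=lambda i: vectors[i] @ target, reverse=True)
        take = max(1, int(round(w * n)))           # corresponding number per category
        pool.extend(members[:take])
    pool.sort(key=lambda i: vectors[i] @ target, reverse=True)
    return pool[:n]                                # preset number of target vectors

vecs = np.array([[1.0, 0.0], [0.9, 0.0], [0.0, 1.0], [0.0, 0.9]])
cats = [0, 0, 1, 1]
cat_vecs = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
picked = category_weighted_recall(vecs, cats, cat_vecs, np.array([1.0, 0.1]), n=2)
```

Comparing against the category vectors first avoids scoring every individual vector in categories that are clearly irrelevant to the target feature.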
11. The method of claim 8,
when the computing node acquires a context feature corresponding to a user from a user database, the command identifier is a third type of command identifier, and the acquisition request further comprises a target feature vector, a similarity operator, and an aggregation operator; and
the command identifier instructs the memory node to: determine, according to the similarity operator, the similarity between each target vector in the plurality of target vectors and the target feature vector; determine a preset number of target vectors according to the similarity between each target vector and the target feature vector; and perform an aggregation operation on the preset number of target vectors according to the aggregation operator to obtain the target data.
12. The method of claim 11, wherein the acquisition request further comprises a weight array; and the command identifier instructs the memory node to determine a preset number of target vectors according to the similarity between each target vector and the target feature vector, and then perform an aggregation operation on the preset number of target vectors according to the aggregation operator and the weight array to obtain the target data.
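The third-type flow of claims 11 and 12 chains the two stages: top-N retrieval, then reduction to a single result vector. The sketch below assumes dot-product similarity and weighted-sum aggregation as stand-ins for the request's similarity and aggregation operators; these choices are illustrative, not from the patent:

```python
import numpy as np

def retrieve_and_aggregate(embedding_table, indexes, target, n, weights=None):
    """Sketch of the claim-11/12 flow: find the n target vectors most similar
    to the target feature vector, then aggregate them into one result vector,
    optionally applying the per-vector weight array of claim 12."""
    vectors = embedding_table[indexes]
    scores = vectors @ target                     # similarity operator (dot product here)
    order = np.argsort(scores)[::-1][:n]          # preset number of target vectors
    top = vectors[order]
    if weights is None:
        weights = np.ones(n)                      # claim-11 case: unweighted aggregation
    # Aggregation operator (weighted sum here): one vector comes back,
    # not n vectors, so only the final result crosses the network.
    return (top * np.asarray(weights)[:, None]).sum(axis=0)

table = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
out = retrieve_and_aggregate(table, [0, 1, 2], np.array([1.0, 0.0]), n=2,
                             weights=[0.5, 0.5])
```

Performing the aggregation on the memory node is the point of this command type: the compute node receives one reduced vector instead of the full set of retrieved vectors.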
13. The method of claim 8,
when the computing node acquires a context feature corresponding to a user from a user database, the command identifier is a fourth type of command identifier, and the acquisition request further comprises an aggregation operator; and
the command identifier instructs the memory node to perform an aggregation operation on the plurality of target vectors according to the aggregation operator to obtain the target data.
14. The method of claim 13, wherein the acquisition request further comprises a weight array; and
the command identifier instructs the memory node to perform an aggregation operation on the plurality of target vectors according to the aggregation operator and the weight array to obtain the target data.
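The fourth-type flow of claims 13 and 14 skips similarity scoring entirely and reduces the indexed vectors directly, as in embedding pooling. The sketch below uses mean/sum as example aggregation operators; the operator names and the weighted variant's semantics are assumptions for illustration:

```python
import numpy as np

def aggregate_vectors(embedding_table, indexes, agg="mean", weights=None):
    """Sketch of the claim-13/14 flow: gather the target vectors by index info
    and reduce them directly, e.g. pooling a user's context features."""
    vectors = embedding_table[indexes]
    if weights is not None:                       # claim-14 case: weighted aggregation
        return (vectors * np.asarray(weights)[:, None]).sum(axis=0)
    if agg == "mean":
        return vectors.mean(axis=0)
    if agg == "sum":
        return vectors.sum(axis=0)
    raise ValueError(f"unknown aggregation operator: {agg}")

table = np.array([[1.0, 3.0], [3.0, 1.0]])
pooled = aggregate_vectors(table, [0, 1], agg="mean")
```

This is the sparse-lookup-plus-pooling pattern common in recommendation embedding layers, which is why it benefits most from running next to the memory holding the embedding table.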
15. A recommendation system comprising one or more computing nodes and one or more memory nodes, each of the one or more memory nodes for implementing the method of any of claims 1-7, each of the one or more computing nodes for implementing the method of any of claims 8-14.
16. A memory node, characterized in that the memory node comprises a near memory accelerator comprising means for implementing the method of any of claims 1-7.
17. A computing device, characterized in that the computing device comprises means for implementing the method of any one of claims 8-14.
18. A memory node comprising a processor and a memory, the memory being configured to store instructions, the processor being configured to execute the instructions, the processor when executing the instructions being configured to perform the method as claimed in any one of claims 1 to 7.
19. A computing device comprising a processor and a memory, the memory for storing instructions, the processor for executing the instructions, the processor when executing the instructions performing the method as recited in any of claims 8 to 14.
CN202111112579.7A 2021-06-23 2021-09-18 Data processing method, system and related equipment Pending CN115509734A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/100226 WO2022268089A1 (en) 2021-06-23 2022-06-21 Data processing method and system and related device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110698247 2021-06-23
CN2021106982475 2021-06-23

Publications (1)

Publication Number Publication Date
CN115509734A true CN115509734A (en) 2022-12-23

Family

ID=84499144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111112579.7A Pending CN115509734A (en) 2021-06-23 2021-09-18 Data processing method, system and related equipment

Country Status (2)

Country Link
CN (1) CN115509734A (en)
WO (1) WO2022268089A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407178A * 2023-12-14 2024-01-16 成都凯迪飞研科技有限责任公司 Acceleration sub-card management method and system for self-adaptive load distribution
CN117407178B * 2023-12-14 2024-04-02 成都凯迪飞研科技有限责任公司 Acceleration sub-card management method and system for self-adaptive load distribution

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8589378B2 (en) * 2010-10-11 2013-11-19 Yahoo! Inc. Topic-oriented diversified item recommendation
US10877668B2 (en) * 2018-12-05 2020-12-29 Intel Corporation Storage node offload of residual part of a portion of compressed and distributed data to a second storage node for decompression
US11061735B2 (en) * 2019-01-02 2021-07-13 Alibaba Group Holding Limited System and method for offloading computation to storage nodes in distributed system
CN110866191A (en) * 2019-11-21 2020-03-06 苏州朗动网络科技有限公司 Recommendation recall method, apparatus and storage medium


Also Published As

Publication number Publication date
WO2022268089A1 (en) 2022-12-29

Similar Documents

Publication Publication Date Title
US20210326729A1 (en) Recommendation Model Training Method and Related Apparatus
US10997220B2 (en) Search box auto-complete
CN112800095B (en) Data processing method, device, equipment and storage medium
CN113159095A (en) Model training method, image retrieval method and device
US20140297476A1 (en) Ranking product search results
CN109492180A (en) Resource recommendation method, device, computer equipment and computer readable storage medium
CN111540466A (en) Big data based intelligent medical information pushing method and big data medical cloud platform
US20200311071A1 (en) Method and system for identifying core product terms
US20210272178A1 (en) Determining item relevancy
CN109885651A (en) A kind of question pushing method and device
CN114327857A (en) Operation data processing method and device, computer equipment and storage medium
CN115509734A (en) Data processing method, system and related equipment
CN108154024A (en) A kind of data retrieval method, device and electronic equipment
CN109886300A (en) A kind of user's clustering method, device and equipment
CN110728118B (en) Cross-data-platform data processing method, device, equipment and storage medium
CN110019813A (en) Life insurance case retrieving method, retrieval device, server and readable storage medium storing program for executing
CN107844536A (en) The methods, devices and systems of application program selection
CN116957128A (en) Service index prediction method, device, equipment and storage medium
CN114398883B (en) Presentation generation method and device, computer readable storage medium and server
CN110297976A (en) Recommended method, device, equipment and readable storage medium storing program for executing based on cloud retrieval
CN113327154B (en) E-commerce user message pushing method and system based on big data
CN112036988B (en) Label generation method and device, storage medium and electronic equipment
CN112256730A (en) Information retrieval method and device, electronic equipment and readable storage medium
CN113947455B (en) Data recommendation method, system, equipment and medium
CN116910568B (en) Training method and device of graph neural network model, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination