CN102184147A - Device and method for accelerating data processing based on memory interface

Info

Publication number: CN102184147A
Application number: CN2011101163267A
Authority: CN
Prior art keywords: data processing, processing unit, memory interface, storage unit, control unit
Other languages: Chinese (zh)
Inventors: 殷建儒, 姚翠松, 王博
Current assignee: Opzoon Technology Co Ltd
Original assignee: Opzoon Technology Co Ltd
Application filed by: Opzoon Technology Co Ltd
Priority date / filing date: 2011-05-05
Publication date: 2011-09-14
Legal status: Pending

Landscapes

  • Advance Control (AREA)

Abstract

The invention discloses a device for accelerating data processing based on a memory interface, and relates to the technical field of computer data communication. The device comprises the memory interface, a control unit, a data processing unit and a storage unit, wherein the memory interface is connected with the control unit and the storage unit; the control unit and the storage unit are connected with the data processing unit; the control unit is used for receiving an external control command through the memory interface and instructing the data processing unit to process the data in the storage unit according to the control command; and the data processing unit writes a completion flag into the control unit. The invention also discloses a method for accelerating data processing. The invention reduces the occupation of the system bus bandwidth and memory bandwidth, thereby saving system resources.

Description

Data processing acceleration device and method based on a memory interface
Technical field
The present invention relates to the field of computer science and technology, and in particular to a data processing acceleration device and method based on a memory interface.
Background art
Current techniques for fast data processing fall into two classes. One is the software approach, which accelerates data processing mainly by using optimized software algorithms or optimized processing flows. The other is the hardware approach, which accelerates data processing with customized hardware; such hardware can be further divided into the following categories:
1. Customized ASIC chips;
2. Customized coprocessors, such as CAVIUM encryption/decryption chips; digital signal processors (DSPs) can also be placed in this category;
3. Programmable logic devices (PLDs), such as FPGAs.
Such hardware usually has to be interconnected with a general-purpose CPU and is controlled by it. The interfaces most commonly used for interconnection with the CPU are PCI/X and PCI-E. Custom interfaces can also be used, but because they follow no standard they are neither popular nor general. Moreover, once an interface specification has been adopted, the data processing flow is largely determined by it. The criteria for judging whether an interface specification is superior are whether it conforms to a standard and whether its data transfers are efficient.
Both the PCI family of interfaces and the memory interface are widely adopted industry standards, and data transfer efficiency is mainly reflected in the transfer rate and the number of transfers. In terms of transfer rate, the memory interface is no slower than the PCI family; in terms of the data processing flow, acceleration hardware attached to the memory interface can reduce the number of transfers, so memory-interface acceleration hardware transfers data more efficiently than PCI-family hardware. In existing fast data processing schemes, the processing element and the memory are two independent physical units connected by a complex system bus architecture, and processing requires the six steps S101 to S106 shown in Figure 1. From entering the system to leaving it, the data must be transferred over the system bus four times: S101 to S102, S102 to S103, S104 to S105, and S105 to S106. Each transfer occupies system bus bandwidth and memory bandwidth and thus wastes system resources.
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present invention is how to reduce the occupation of system bus bandwidth and memory bandwidth while accelerating data processing, thereby conserving system resources.
(2) Technical solution
To solve the above technical problem, the present invention provides a data processing acceleration device based on a memory interface, comprising: a memory interface, a control unit, a data processing unit, and a storage unit. The memory interface is connected to the control unit and the storage unit, and both the control unit and the storage unit are connected to the data processing unit. The control unit is used to receive external control commands through the memory interface and, according to the control commands, instruct the data processing unit to process the pending data buffered in the storage unit; after processing is finished, the data processing unit writes a completion flag into the control unit.
Wherein, the memory interface is an interface complying with the DDR memory standard.
Wherein, the control unit comprises:
a command register, used to receive external control commands;
a data register, used to receive the parameters required by the control commands and to hold the completion flag returned by the data processing unit.
Wherein, the data processing unit comprises one or more of a general-purpose processor, a customized ASIC, a programmable logic device (PLD), a digital signal processor, or an intellectual property (IP) core module.
The present invention also provides a data processing acceleration method using the above data processing acceleration device based on a memory interface, comprising the following steps:
S1: the control unit receives an external control command through the memory interface;
S2: the pending data are buffered into the storage unit;
S3: the data processing unit processes the pending data in the storage unit according to the control command;
S4: the completion flag is saved in the control unit.
Wherein, the control commands comprise: a reset data processing unit command CMD_RESET, a start data processing unit command CMD_START, a stop data processing unit command CMD_STOP, and a configure data processing unit command CMD_CONFIG.
(3) Beneficial effects
The data processing acceleration device based on a memory interface of the present invention accelerates data processing while reducing the occupation of system bus bandwidth and memory bandwidth, thereby saving system resources.
Brief description of the drawings
Fig. 1 is a flowchart of an existing data processing acceleration method;
Fig. 2 is a schematic structural diagram of a data processing acceleration device based on a memory interface according to an embodiment of the present invention;
Fig. 3 is a flowchart of a data processing acceleration method using the device of Fig. 2.
Detailed description of the embodiments
The specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following examples are intended to illustrate the present invention, not to limit its scope.
As shown in Figure 2, the data processing acceleration device based on a memory interface of the present invention comprises: a memory interface, a control unit, a data processing unit, and a storage unit. The memory interface is connected to the control unit and the storage unit, and both the control unit and the storage unit are connected to the data processing unit. The control unit and the storage unit share one set of DDR interfaces for their connection to the memory interface; the control unit and the data processing unit are connected by an RS232 interface or another custom interface; the data processing unit and the storage unit are connected by a separate set of DDR interfaces. The storage unit is a dual-port memory, with the data processing unit and the memory interface connected to different ports, so the data processing unit and the memory interface can operate on the storage unit in parallel.
Here, the memory interface is an interface complying with the DDR memory standard; through it, an external processor can access the storage unit and the control unit.
The control unit mainly comprises:
a command register, used to receive external control commands;
a data register, used to receive the parameters required by the control commands and to hold the completion flag returned by the data processing unit. The control commands comprise: CMD_RESET (reset the data processing unit), CMD_START (start the data processing unit), CMD_STOP (stop the data processing unit), and CMD_CONFIG (configure the data processing unit).
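The patent does not fix concrete addresses, register widths, or command encodings, so the following is only a minimal host-visible model of the control unit and storage unit; all offsets, field names, and values are assumptions for illustration, and the single data register described above is shown split into parameter and flag fields for readability.

```c
#include <stdint.h>

/* Assumed placement of the device inside the host's DDR address window. */
#define ACCEL_CTRL_OFFSET  0x00000000u   /* control unit registers       */
#define ACCEL_BUF_OFFSET   0x00001000u   /* storage unit (pending data)  */

/* Assumed encodings of the four control commands named in the text. */
enum accel_cmd {
    CMD_RESET  = 0x1,   /* reset the data processing unit     */
    CMD_START  = 0x2,   /* start the data processing unit     */
    CMD_STOP   = 0x3,   /* stop the data processing unit      */
    CMD_CONFIG = 0x4    /* configure the data processing unit */
};

/* Control unit as seen through the memory interface: a command register for
 * external control commands, and data-register fields for the command
 * parameters (start/end address of the pending data) and the completion
 * flag written back by the data processing unit. */
struct accel_ctrl_regs {
    volatile uint32_t command;     /* written by the external CPU           */
    volatile uint32_t data_start;  /* start address of the pending data     */
    volatile uint32_t data_end;    /* end address of the pending data       */
    volatile uint32_t done_flag;   /* completion flag from processing unit  */
};
```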
The control unit receives external control commands through the memory interface and delivers them to the data processing unit through the interface sub-module connecting the two.
The storage unit may consist of memory chips and is mainly used to store the pending data supplied from outside.
The data processing unit is the core of the device. It may be a general-purpose processor, a customized ASIC, a programmable logic device (PLD), a digital signal processor, an intellectual property core module, and so on. It receives the command passed over the RS232 interface or custom interface and, as instructed by the command, quickly processes the pending data located between the data start address and end address indicated by the command. The concrete processing flow depends on the function the accelerator serves in the system; an accelerator usually takes on the acceleration of one special purpose. If the device is used as an encryption/decryption accelerator, it processes the data according to the encryption/decryption flow; if it is used as an image compression/decompression accelerator, it processes the data according to the compression/decompression flow. After processing is finished, it delivers the completion flag to the control unit through the RS232 interface or custom interface.
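A rough sketch of the behaviour just described, reusing the struct accel_ctrl_regs and CMD_* encodings assumed above; the real data processing unit would be hardware or firmware (ASIC, FPGA, DSP, and so on) rather than a C loop, and process_block() stands in for whatever function (encryption/decryption, compression, etc.) the accelerator implements.

```c
#include <stdint.h>
#include <stddef.h>

/* Placeholder for the accelerator's actual function; its implementation is
 * whatever special purpose the device serves (encrypt/decrypt, compress, ...). */
extern void process_block(uint8_t *data, size_t len);

/* Illustrative dispatch loop: take a command handed over by the control unit,
 * process the pending data between the indicated start and end addresses in
 * the storage unit, then hand the completion flag back to the control unit. */
void data_processing_unit_loop(struct accel_ctrl_regs *regs, uint8_t *storage)
{
    for (;;) {
        uint32_t cmd = regs->command;        /* from control unit (RS232/custom) */
        if (cmd == CMD_START) {
            uint32_t start = regs->data_start;
            uint32_t end   = regs->data_end;
            process_block(storage + start, (size_t)(end - start));
            regs->done_flag = 1;             /* completion flag to control unit  */
            regs->command   = 0;             /* command consumed                 */
        } else if (cmd == CMD_RESET) {
            regs->done_flag = 0;
            regs->command   = 0;
        }
        /* CMD_STOP and CMD_CONFIG handling omitted from this sketch. */
    }
}
```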
In the device proposed by the present invention, the data processing unit and the storage unit are placed on the same physical component, so one round of data processing requires only two transfers over the system bus. Compared with the traditional scheme, the two transfers S102 to S103 and S104 to S105 are carried out inside the device, which saves two occupations of system bus bandwidth and memory bandwidth. As shown in Figure 3, the method of accelerating data processing with the above device comprises the following steps (an illustrative host-side sketch of these steps follows the list):
Step S301: the control unit receives an external control command through the memory interface.
Step S302: the pending data are buffered into the storage unit.
Step S303: the data processing unit processes the pending data in the storage unit according to the control command.
Step S304: the completion flag is saved in the control unit.
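An illustrative host-side (CPU) view of steps S301 to S304, again under the register map and command encodings assumed earlier; regs and storage are taken to be the control unit and storage unit mapped into the CPU's address space through the DDR memory interface, and the function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sequence for S301-S304 as issued by the external CPU. */
int accel_run(volatile struct accel_ctrl_regs *regs,
              uint8_t *storage,                  /* mapped storage unit */
              const uint8_t *pending, size_t len)
{
    /* S302: buffer the pending data into the storage unit of the device. */
    memcpy(storage, pending, len);

    /* S301: the control unit receives the external control command through
     * the memory interface, together with its address parameters. */
    regs->data_start = 0;
    regs->data_end   = (uint32_t)len;
    regs->done_flag  = 0;
    regs->command    = CMD_START;

    /* S303 runs inside the device; S304: wait until the completion flag is
     * saved in the control unit (a real driver would add a timeout or use
     * an interrupt instead of busy-waiting). */
    while (regs->done_flag == 0)
        ;

    return 0;
}
```

Whether the command is issued before or after the data are buffered does not change the bandwidth accounting: only the initial transfer into the storage unit and the final readout of the result cross the system bus.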
The present invention is further explained below by taking the packet encryption/decryption flow in a network device as an example and comparing it with the traditional data processing acceleration approach.
The flow of a packet through the network device is as follows: the packet enters the network interface card from the wire, is encrypted or decrypted, and then leaves through the network interface card back onto the wire toward the next-stage device. The traditional steps are as follows (CPU here refers to the CPU chip together with the software running on it):
Step 1: the network interface card receives the packet from the line and buffers it into memory over the system bus; at this point the network interface card's work on, and ownership of, the packet ends, and it notifies the CPU that a packet has arrived.
Step 2: the CPU processes the packet further, finds that the packet requires complex encryption or decryption computation, informs the encryption/decryption chip of the packet address, and starts encryption/decryption.
Step 3: the encryption/decryption chip copies the packet from memory into the chip over the system bus.
Step 4: the encryption/decryption chip performs the encryption or decryption computation.
Step 5: the encryption/decryption chip copies the encrypted/decrypted packet back to memory over the system bus.
Step 6: the encryption/decryption chip notifies the CPU, and the CPU sends the processed packet out through the network interface card.
The above steps represent the most widely used processing flow today and are quite general. CPU speed has an overwhelming advantage over system bus and memory speed; CPU core counts keep growing and CPU processing power keeps improving, so the bottleneck of such a system lies in the speed of the system bus and of memory access, which cannot match the processing speed of the CPU. Therefore, when the transfer bandwidth of the system bus and memory is fixed, reducing the occupancy of the system bus and memory bandwidth will raise the data throughput of the whole system.
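To make the bandwidth argument concrete, a back-of-the-envelope sketch (the formula and the numbers below are illustrative, not taken from the patent): when the system bus is the bottleneck, the achievable packet rate is the usable bus bandwidth divided by the bytes each packet drags across the bus, so cutting the number of bus crossings per packet from four to two roughly doubles the bus-limited throughput.

```c
/* Illustrative only: bus-limited packet rate.  bus_bw is the usable system
 * bus bandwidth in bytes per second, pkt_len the packet length in bytes, and
 * crossings how many times each packet must cross the bus (four in the
 * traditional flow of Fig. 1, two with the device proposed here). */
static double bus_limited_pps(double bus_bw, double pkt_len, int crossings)
{
    return bus_bw / (pkt_len * (double)crossings);   /* packets per second */
}
```

For example, with an assumed 10 GB/s of usable bus bandwidth and 1500-byte packets, four crossings cap the rate at roughly 1.7 million packets per second, while two crossings allow roughly 3.3 million.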
If the encryption/decryption function of the above chip is implanted into the device of the present invention as the data processing unit, the above processing flow can be reduced to four steps, as follows:
Step 1: the network interface card receives the packet from the line and buffers it into the storage unit of the device over the system bus; at this point the network interface card's work on, and ownership of, the packet ends, and it notifies the CPU that a packet has arrived.
Step 2: the CPU processes the packet further, finds that the packet requires complex encryption or decryption computation, and sends a start control command to the control unit of the device, including the address parameters of the data to be encrypted/decrypted in the storage unit.
Step 3: the control unit of the device starts the data processing unit to perform the encryption or decryption computation. The data to be encrypted/decrypted have already been stored in the storage unit (for example, analogously to loading data from a hard disk into memory in the traditional scheme, the data to be encrypted/decrypted are loaded from the hard disk directly into the storage unit of this device); the data processing unit then performs the encryption/decryption computation on the data at the locations given by the address parameters.
Step 4: the data processing unit of the device delivers the completion flag to the control unit; once the CPU learns of this through the control unit, it sends the processed packet out through the network interface card.
The main difference between the device described here and the traditional approach is that the device integrates the storage unit and the data processing unit, so no system bus or memory bandwidth is occupied while pending data move between the storage unit and the data processing unit. The storage unit of the device takes over the role played by memory in the traditional approach. In the traditional approach, whatever the architecture, the encryption/decryption chip (corresponding to the data processing unit of the device) and the memory (corresponding to the storage unit of the device) are two physically independent parts that must be connected by a complex system bus architecture. The pending data must be delivered from memory to the data processor, and the processed data must be returned to memory; both transfers occupy system bus and memory bandwidth, making system bus bandwidth and memory bandwidth the system performance bottleneck. The device described here avoids these two occupations of system bus and memory bandwidth.
The above embodiments are only intended to illustrate the present invention and not to limit it. Those of ordinary skill in the relevant technical field can make various changes and modifications without departing from the spirit and scope of the present invention; therefore, all equivalent technical solutions also fall within the scope of the present invention, whose scope of patent protection shall be defined by the claims.

Claims (6)

1. A data processing acceleration device based on a memory interface, characterized by comprising: a memory interface, a control unit, a data processing unit, and a storage unit, wherein the memory interface is connected to the control unit and the storage unit, both the control unit and the storage unit are connected to the data processing unit, the control unit is used to receive external control commands through the memory interface and, according to the control commands, instruct the data processing unit to process pending data buffered in the storage unit, and the data processing unit writes a completion flag into the control unit after processing is finished.
2. The data processing acceleration device based on a memory interface according to claim 1, characterized in that the memory interface is an interface complying with the DDR memory standard.
3. The data processing acceleration device based on a memory interface according to claim 1 or 2, characterized in that the control unit comprises:
a command register, used to receive the external control commands;
a data register, used to receive the parameters required by the control commands and to hold the completion flag returned by the data processing unit.
4. The data processing acceleration device based on a memory interface according to claim 3, characterized in that the data processing unit comprises one or more of a general-purpose processor, a customized ASIC, a programmable logic device (PLD), a digital signal processor, or an intellectual property core module.
5. A data processing acceleration method using the data processing acceleration device based on a memory interface according to any one of claims 1 to 4, characterized by comprising the following steps:
S1: the control unit receives an external control command through the memory interface;
S2: the pending data are buffered into the storage unit;
S3: the data processing unit processes the pending data in the storage unit according to the control command;
S4: the completion flag is saved in the control unit.
6. The data processing acceleration method according to claim 5, characterized in that the control commands comprise: a reset data processing unit command CMD_RESET, a start data processing unit command CMD_START, a stop data processing unit command CMD_STOP, and a configure data processing unit command CMD_CONFIG.
CN2011101163267A, priority date 2011-05-05, filing date 2011-05-05: Device and method for accelerating data processing based on memory interface. Status: Pending. Published as CN102184147A (en).

Priority Applications (1)

Application Number: CN2011101163267A (published as CN102184147A)
Priority Date: 2011-05-05
Filing Date: 2011-05-05
Title: Device and method for accelerating data processing based on memory interface

Publications (1)

Publication Number: CN102184147A
Publication Date: 2011-09-14

Family

ID=44570326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011101163267A Pending CN102184147A (en) 2011-05-05 2011-05-05 Device and method for accelerating data processing based on memory interface

Country Status (1)

Country Link
CN (1) CN102184147A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS63226749A (en) * 1987-03-17 1988-09-21 Fujitsu Ltd Data transfer system
EP0407219A2 (en) * 1989-07-07 1991-01-09 Fujitsu Limited Data processing device
EP0407219A3 (en) * 1989-07-07 1992-09-02 Fujitsu Limited Data processing device
CN2396482Y (en) * 1999-10-08 2000-09-13 王吉东 IDE electronic storage device
US6609179B1 (en) * 2001-01-24 2003-08-19 Matsushita Electric Industrial Co., Ltd. Method and apparatus for controlling memory access
US6938110B2 (en) * 2002-09-23 2005-08-30 Asix Electronics Corp. Virtual processor through USB

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281546A (en) * 2013-07-10 2015-01-14 飞思卡尔半导体公司 Wireless communication apparatus and method
CN104281546B (en) * 2013-07-10 2019-03-01 恩智浦美国有限公司 Wireless communication device and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 2011-09-14