CN112346707A - Instruction processing method and device and related product - Google Patents

Instruction processing method and device and related product

Info

Publication number
CN112346707A
Authority
CN
China
Prior art keywords
square root
instruction
data
activation
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910724830.1A
Other languages
Chinese (zh)
Inventor
不公告发明人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201910724830.1A priority Critical patent/CN112346707A/en
Publication of CN112346707A publication Critical patent/CN112346707A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • G06F7/552 Powers or roots, e.g. Pythagorean sums

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

The disclosure relates to an instruction processing method, an instruction processing device, and a related product. A machine learning arithmetic device comprises one or more instruction processing devices, acquires data to be operated on and control information from other processing devices, executes a specified machine learning operation, and transmits the execution result to the other processing devices through an I/O interface. When the machine learning arithmetic device includes a plurality of instruction processing devices, they can be connected to one another through a specific structure to transfer data. The instruction processing devices are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIe) bus; they share one control system or each have their own control system, and share memory or each have their own memory; their interconnection may use any interconnection topology. The instruction processing method, the instruction processing device, and the related products provided by the embodiments of the disclosure have a wide application range, high instruction processing efficiency, and high instruction processing speed.

Description

Instruction processing method and device and related product
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an instruction processing method and apparatus, and a related product.
Background
With the continuous development of science and technology, machine learning, and neural network algorithms in particular, is used more and more widely. It has been applied successfully in fields such as image recognition, voice recognition, and natural language processing. However, as neural network algorithms grow more complex, the types and the number of data operations involved keep increasing. In the related art, performing an activation operation on data with the square root function (sqrt) is inefficient and slow.
Disclosure of Invention
In view of the above, the present disclosure provides an instruction processing method, apparatus and related product to improve the efficiency and speed of square root function activation operation on data.
According to a first aspect of the present disclosure, there is provided a square root function activation instruction processing apparatus, the apparatus comprising:
the control module is used for compiling the obtained square root function activating instruction to obtain a compiled square root function activating instruction, analyzing the compiled square root function activating instruction to obtain an operation code and an operation domain of the square root function activating instruction, and obtaining data to be operated and a target address which are required by executing the square root function activating instruction according to the operation code and the operation domain;
the operation module is used for carrying out square root activation operation on the data to be operated to obtain an operation result and storing the operation result into the target address;
the operation code is used for indicating that the activation operation of the square root function activation instruction on the data is a square root function activation operation, and the operation domain comprises a source address and a target address of the data to be operated.
According to a second aspect of the present disclosure, there is provided a machine learning arithmetic device, the device including:
one or more square root function activating instruction processing devices according to the first aspect, configured to obtain data to be operated and control information from another processing device, execute a specified machine learning operation, and transmit an execution result to the other processing device through an I/O interface;
when the machine learning arithmetic device comprises a plurality of square root function activating instruction processing devices, the plurality of square root function activating instruction processing devices can be connected through a specific structure and transmit data;
the square root function activation instruction processing devices are interconnected and transmit data through a Peripheral Component Interconnect Express (PCIe) bus so as to support larger-scale machine learning operations; the plurality of square root function activation instruction processing devices share one control system or each have their own control system; the square root function activation instruction processing devices share a memory or each have their own memory; the interconnection mode of the square root function activation instruction processing devices is any interconnection topology.
According to a third aspect of the present disclosure, there is provided a combined processing apparatus, the apparatus comprising:
the machine learning arithmetic device, the universal interconnect interface, and the other processing device according to the second aspect;
and the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user.
According to a fourth aspect of the present disclosure, there is provided a machine learning chip including the machine learning arithmetic device of the second aspect or the combined processing device of the third aspect.
According to a fifth aspect of the present disclosure, there is provided a machine learning chip package structure, which includes the machine learning chip of the fourth aspect.
According to a sixth aspect of the present disclosure, a board card is provided, which includes the machine learning chip packaging structure of the fifth aspect.
According to a seventh aspect of the present disclosure, there is provided an electronic device, which includes the machine learning chip of the fourth aspect or the board of the sixth aspect.
According to an eighth aspect of the present disclosure, there is provided a square root function activate instruction processing method, which is applied to a square root function activate instruction processing apparatus, the method including:
compiling the obtained square root function activating instruction to obtain a compiled square root function activating instruction, analyzing the compiled square root function activating instruction to obtain an operation code and an operation domain of the square root function activating instruction, and obtaining data to be operated and a target address which are required for executing the square root function activating instruction according to the operation code and the operation domain;
performing square root activation operation on the data to be operated to obtain an operation result, and storing the operation result into the target address;
the operation code is used for indicating that the activation operation of the square root function activation instruction on the data is a square root function activation operation, the square root activation function parameter table comprises a square root activation function activation table, and the operation domain comprises a source address and a target address of the data to be operated.
According to a ninth aspect of the present disclosure, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by one or more processors, particularly carries out the steps of the above-mentioned method.
The embodiment of the disclosure provides a processing method and a device for square root function activating instructions and a related product, wherein the device comprises a control module and an operation module, the control module is used for compiling the obtained square root function activating instructions to obtain the compiled square root function activating instructions, analyzing the compiled square root function activating instructions to obtain operation codes and operation domains of the square root function activating instructions, and obtaining data to be operated and a target address required for executing the square root function activating instructions according to the operation codes and the operation domains; the operation module is used for performing activation operation on data to be operated to obtain an operation result, and storing the operation result into a target address. The square root function activating instruction processing method, the square root function activating instruction processing device and the related products provided by the embodiment of the disclosure have the advantages of wide application range, high processing efficiency and high processing speed for the square root function activating instruction.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a block diagram of a square root function activation instruction processing apparatus according to an embodiment of the present disclosure.
FIGS. 2a-2f show block diagrams of a square root function activation instruction processing apparatus according to an embodiment of the present disclosure.
FIG. 3 is a diagram illustrating an application scenario of a square root function activation instruction processing apparatus according to an embodiment of the present disclosure.
Fig. 4a, 4b show block diagrams of a combined processing device according to an embodiment of the present disclosure.
Fig. 5 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure.
FIG. 6 shows a flowchart of a square root function activate instruction processing method according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
FIG. 1 shows a block diagram of a square root function activation instruction processing apparatus according to an embodiment of the present disclosure. As shown in FIG. 1, the apparatus includes a control module 11 and an operation module 12. The control module 11 is configured to compile the obtained square root function activation instruction to obtain a compiled square root function activation instruction, parse the compiled square root function activation instruction to obtain an operation code and an operation domain of the square root function activation instruction, and obtain, according to the operation code and the operation domain, the data to be operated on and the target address required for executing the square root function activation instruction. The operation code indicates that the activation operation performed on the data by the square root function activation instruction is a square root function activation operation, and the operation domain comprises a source address of the data to be operated on and a target address. The operation module 12 is configured to perform the square root activation operation on the data to be operated on to obtain an operation result, and store the operation result in the target address.
It should be understood that the mathematical function sqrt() is the square root operation, which takes the arithmetic square root of a non-negative real number; the number in parentheses is that non-negative real number. That is, given Y = X × X with X ≥ 0, the value of X is solved for a known Y. Optionally, the square root may be computed by bisection: within an interval, the square of the midpoint is tested each time; if that square is greater than the non-negative real number, the midpoint of the left half-interval is tried next, and if it is smaller, the midpoint of the right half-interval is tried next. For example, to find sqrt(16), first try (0 + 16)/2 = 8; since 8 × 8 = 64 is greater than 16, move to the left and try (0 + 8)/2 = 4; since 4 × 4 = 16, the correct result sqrt(16) = 4 is obtained.
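The bisection procedure described above can be sketched as follows. This is only an illustrative sketch: the function name bisect_sqrt, the tolerance tol, and the iteration cap max_iter are assumptions introduced here and do not come from the disclosure, which does not specify a stopping rule.

```python
def bisect_sqrt(y: float, tol: float = 1e-12, max_iter: int = 200) -> float:
    """Approximate the arithmetic square root of a non-negative real y by bisection."""
    if y < 0:
        raise ValueError("y must be a non-negative real number")
    # For 0 <= y < 1 the root is larger than y, so the interval must reach 1.
    lo, hi = 0.0, max(1.0, y)
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        sq = mid * mid
        if abs(sq - y) <= tol:
            return mid          # the square of the midpoint matches y closely enough
        if sq > y:
            hi = mid            # square too large: continue in the left half-interval
        else:
            lo = mid            # square too small: continue in the right half-interval
    return (lo + hi) / 2        # fall back to the last midpoint if tol was not reached


print(bisect_sqrt(16.0))  # tries 8 (64 > 16), then 4 (16 == 16) and prints 4.0
```

For the input 16 the sketch visits exactly the two midpoints given in the example, 8 and then 4.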
The square root operation described above may be one form of activation function. The square root function in the embodiments of the present application may refer to the function y = sqrt(x), and the square root activation operation may refer to the process of performing a square root operation on the data to be processed.
In this embodiment, the square root function activation instruction obtained by the control module is an uncompiled software instruction that cannot be executed directly by hardware, so the control module first needs to compile it. The compiled square root function activation instruction may be a hardware instruction corresponding to the square root function activation instruction, that is, an instruction that hardware can execute directly. After obtaining the compiled square root function activation instruction, the control module can parse it. The operation module can directly execute the hardware instruction obtained by compiling and obtain the operation result of the activation operation.
In this embodiment, the operation code may be a part of an instruction or a field (usually indicated by a code) specified in the computer program to perform an operation, and is an instruction sequence number used to inform a device executing the instruction which instruction needs to be executed specifically. The operation domain may be a source of all data required for executing the corresponding instruction, such as a corresponding address, and all data required for executing the corresponding instruction may include data to be calculated and a corresponding instruction processing method. The square root function activating instruction comprises an operation code and an operation domain, wherein the operation domain at least comprises a source address and a target address of data to be operated. It should be understood that the instruction format of the square root function activate instruction and the contained opcode and operation field may be set as desired by those skilled in the art, and the disclosure is not limited thereto.
Alternatively, the source address of the data to be operated may be a start address of a storage space where the data to be operated is located, and the control module may obtain the instruction and the data through a data input and output unit, which may be one or more data I/O interfaces or I/O pins. Further, the control module can determine the data to be operated according to the source address of the data to be operated and obtain the data to be operated. Of course, in other embodiments, the control module may also determine the data required for performing the square root function activation operation according to the operation code of the square root function activation instruction.
Optionally, the operation domain may also include a read amount or a memory address of the read amount. The control module 11 is further configured to obtain a read amount, and obtain data to be calculated according to the read amount. The data volume of the data to be operated can be smaller than or equal to the read-in volume. In this implementation, the read-in amount may be a data amount of the acquired data to be operated on, and may be a size of the acquired data to be operated on. When the operation field directly contains a specific numerical value of the read amount, the numerical value may be determined as the read amount. When the memory address of the read amount is included in the operation field, the read amount can be acquired from the memory address.
In one possible implementation manner, when the read-in amount is not included in the operation domain, the plurality of data to be operated may be acquired according to a preset default read-in amount. The acquired data amount of the plurality of data to be operated can be smaller than or equal to the default read-in amount.
In this way, the amount and size of the data to be operated on can be limited, the accuracy of the operation result is ensured, and the device is able to execute the square root function activation instruction.
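The three ways of determining the read-in amount described above (an immediate value in the operation domain, a memory address that holds the value, or a preset default) can be sketched as follows. All names here, including DEFAULT_READ_AMOUNT and its value of 64, are hypothetical; the disclosure gives no default value and no encoding for distinguishing a value from an address.

```python
from typing import Dict, List, Optional

DEFAULT_READ_AMOUNT = 64  # hypothetical default; the disclosure does not state one


def resolve_read_amount(size_field: Optional[int],
                        is_address: bool,
                        memory: Dict[int, int]) -> int:
    """Return the read-in amount for one instruction."""
    if size_field is None:
        return DEFAULT_READ_AMOUNT      # operation domain carries no read-in amount
    if is_address:
        return memory[size_field]       # operation domain holds the storage address of the amount
    return size_field                   # operation domain holds the amount itself


def fetch_operands(storage: List[float], src: int, read_amount: int) -> List[float]:
    """Fetch at most read_amount elements starting at the source address."""
    # Slicing never reads past the end, so the fetched amount is <= read_amount.
    return storage[src:src + read_amount]
```

With this sketch, resolve_read_amount(None, False, {}) yields the default, while resolve_read_amount(8, True, {8: 128}) reads the amount 128 from address 8.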
Alternatively, there may be more than one source address and more than one target address, with the source addresses and target addresses set in one-to-one correspondence. In this case, the operation module can perform the square root activation operation on a plurality of data to be operated on at the same time. Further optionally, the apparatus may include one or more control modules and one or more operation modules, and the number of control modules and operation modules may be set according to actual needs, which is not limited in this disclosure. When the apparatus includes one control module, the control module may receive the square root function activation instruction and control the one or more operation modules to perform the square root function activation operation. When the apparatus includes a plurality of control modules, the control modules may each receive the square root function activation instruction and control the corresponding one or more operation modules to perform the square root function activation operation. In one possible implementation, the instruction format of the square root function activation instruction may be:
active.sqrt dst, src0, size
Here, active.sqrt is the opcode of the square root function activation instruction, and dst, src0, and size form its operation domain, where dst is the target address, src0 is the address of the data to be operated on, and size is the read-in amount.
In one possible implementation, the instruction format of the square root function activate instruction may be:
active.sqrt dst, src0, src1, size
Here, active.sqrt is the opcode of the square root function activation instruction, and dst, src0, src1, and size form its operation domain, where dst is the target address, src0 is the address of the data to be operated on, src1 is the address of the square root function parameter table, and size is the read-in amount.
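As an illustration of how the two instruction formats above separate into an operation code and an operation domain, the following sketch parses a textual form of the instruction. It is an assumption that the instruction is available as text with numeric operands; the class SqrtActivation and the function parse_instruction are hypothetical names, and the disclosure leaves the actual encoding to the implementer.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SqrtActivation:
    opcode: str
    dst: int                     # target address
    src0: int                    # address of the data to be operated on
    size: int                    # read-in amount
    src1: Optional[int] = None   # address of the parameter table, if present


def parse_instruction(text: str) -> SqrtActivation:
    """Split an instruction into its opcode and operation-domain fields."""
    # Accept whitespace and/or commas as separators between all tokens.
    tokens = [t for t in re.split(r"[\s,]+", text.strip()) if t]
    if not tokens or tokens[0] != "active.sqrt":
        raise ValueError("not a square root activation instruction")
    fields = [int(t, 0) for t in tokens[1:]]   # base 0 accepts decimal and 0x-prefixed hex
    if len(fields) == 3:
        dst, src0, size = fields
        return SqrtActivation(tokens[0], dst, src0, size)
    if len(fields) == 4:
        dst, src0, src1, size = fields
        return SqrtActivation(tokens[0], dst, src0, size, src1)
    raise ValueError("expected 3 or 4 operation-domain fields")
```

For example, parse_instruction("active.sqrt 0x200, 0x100, 0x300, 64") yields a record whose src1 field holds the parameter table address 0x300, while the three-field form leaves src1 as None.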
Optionally, the read-in amount size may be an integer divisible by 64; in other embodiments, it may instead be an integer divisible by 2, 4, 8, 16, or 32. These values are given only for illustration and do not limit the value range of the read-in amount.
The source address src0 and the target address dst both refer to start addresses. The source address and the target address each correspond to a default address offset, which may be a multiple of 64 bytes. In other embodiments, the default address offset may also be an integer multiple of 8 bytes, 16 bytes, 32 bytes, 128 bytes, and so on; these values are given only for illustration and are not limiting. Specifically, the address offset may be determined according to the operation result.
It should be understood that the opcode of the square root function activation instruction, and the positions of the opcode and the operation domain in the instruction format, may be set by one skilled in the art as desired and are not limited by this disclosure.
The embodiment of the disclosure provides a square root function activation instruction processing device, which comprises a control module and an operation module, wherein the control module is used for compiling an obtained square root function activation instruction to obtain a compiled square root function activation instruction, analyzing the compiled square root function activation instruction to obtain an operation code and an operation domain of the square root function activation instruction, acquiring data to be operated and a target address required by executing the square root function activation instruction according to the operation code and the operation domain, and acquiring a square root activation function parameter table; the operation module is used for performing activation operation on data to be operated according to the square root activation function parameter table to obtain an operation result, and storing the operation result into a target address. The square root function activation instruction processing device provided by the embodiment of the disclosure has a wide application range, and has high processing efficiency and high processing speed for the square root function activation instruction.
Alternatively, the compiled square root function activation instruction may be a binary instruction that the operation module is capable of executing. The control module 11 may receive the binary instruction obtained by compiling and perform parsing operations such as decoding on it, so as to obtain a hardware instruction that can be executed by at least one operation module 12. The operation module 12 may perform the square root activation operation according to the parsed square root function activation instruction. Further, the control module may translate the square root function activation instruction into an intermediate code instruction and assemble the intermediate code instruction to obtain a binary instruction that can be executed by the machine; the compiled square root function activation instruction may refer to this binary instruction.
In a possible implementation, the control module 11 may be further configured to obtain the square root activation function parameter table according to the operation code and/or the operation domain. The operation module 12 may also be configured to perform square root function activation operation on the data to be operated according to the square root activation function parameter table, so as to obtain an operation result. Wherein, the square root activation function parameter table may include a square root activation function activation table and/or a square root activation function constant table.
The square root activation function activation table may include a plurality of different square root activation functions; for example, it may include the square root activation function y = sqrt(x) and the square root activation function y = √x, where x represents a non-negative real number. In this case, the operation module may select the corresponding square root activation function according to the activation table and perform the activation operation with it to obtain an operation result. The square root activation function constant table may include operation result values of the square root activation function for a plurality of different inputs. In this case, the operation module can obtain the corresponding operation result by table lookup, so that no activation operation needs to be performed and operation efficiency is improved.
Alternatively, the square root activation function parameter table may include both a square root activation function activation table and a square root activation function constant table. For example, the parameter table may record that the activation function y = sqrt(x) corresponds to operation result A, that the activation function y = √x corresponds to operation result B, and so on. In this case, the operation module performs the square root activation operation according to the parameter table to obtain an operation result. In the embodiments of the application, whether or not the square root activation function constant table contains the corresponding operation result, the operation module can obtain that result adaptively, which improves the reliability and practicability of the operation module.
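One way to read the adaptive behavior described above is a lookup in the constant table with a fallback to direct computation when the table has no entry. The sketch below assumes the constant table is a mapping from input value to precomputed result; the function name sqrt_activate and this table layout are hypothetical and are not specified in the disclosure.

```python
import math
from typing import Dict


def sqrt_activate(x: float, constant_table: Dict[float, float]) -> float:
    """Return sqrt(x), preferring a precomputed entry in the constant table."""
    if x < 0:
        raise ValueError("x must be a non-negative real number")
    hit = constant_table.get(x)
    if hit is not None:
        return hit              # table lookup: no activation operation is performed
    return math.sqrt(x)         # no entry: perform the activation operation directly


table = {4.0: 2.0, 16.0: 4.0}       # hypothetical precomputed results
print(sqrt_activate(16.0, table))   # found in the table
print(sqrt_activate(9.0, table))    # not in the table, computed
```

The design choice here is that a missing table entry is not an error, which matches the statement that the result is obtained regardless of whether the constant table contains it.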
Optionally, the operation domain may include the address of the square root activation function parameter table, so that the control module obtains the square root activation function parameter table from that address. Alternatively, when the control module determines according to the operation code that the square root activation function parameter table is needed for executing the square root function activation instruction, it may obtain the parameter table directly from a predetermined storage address. A person skilled in the art can set the manner of obtaining the square root activation function parameter table according to actual needs, which is not limited by the present disclosure.
FIG. 2a shows a block diagram of a square root function activation instruction processing apparatus according to an embodiment of the present disclosure. The instruction processing apparatus may also comprise a storage module, which may be used for storing the data to be operated on and the operation results. Optionally, the initial storage space pointed to by the source address of the data to be operated on and the target storage space pointed to by the target address are both storage spaces in the storage module. Further, the initial storage space and the target storage space may be multiplexed; that is, the target storage space of the operation result may be the initial storage space of the data to be operated on. Multiplexing the address space in this way improves the utilization of the storage module.
Optionally, in this implementation, the storage module may include one or more of an on-chip storage, a cache, and a register, and the cache may include a scratch pad cache. The application may store the data to be computed and the parameter table of the square root activation function in the memory, cache and/or register of the storage module as needed, which is not limited by the present disclosure. Alternatively, the initial and target memory spaces described above may point to on-chip storage, which may be on-chip NRAMs, for storing tensor data or scalar data.
As shown in FIG. 2a, the operation module 12 may include at least one activation operator 120. The activation operator 120 is configured to perform the square root activation operation on the data to be operated on to obtain an operation result. The number of activation operators can be set according to requirements such as the amount of data to be activated and the required processing speed and efficiency of the activation operation, which is not limited by the present disclosure. Furthermore, the operation module may further include a data access circuit, which may obtain the data to be operated on from the storage module and may also store the operation result in the storage module. Alternatively, the data access circuit may be a direct memory access module.
FIG. 2b shows a block diagram of a square root function enable instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2b, the operation module 12 may include a master operation sub-module 121 and a plurality of slave operation sub-modules 122 connected to the master operation sub-module 121. In the embodiment of the present application, the master operation sub-module and the plurality of slave operation sub-modules may each include the above-mentioned activation operator.
Specifically, the control module is further configured to parse the compiled square root function activation instruction to obtain a plurality of operation instructions, and send the data to be operated and the plurality of operation instructions to the master operation sub-module. The master operation sub-module is configured to perform pre-processing on the data to be operated and send the operation instructions and at least a part of the data to be operated to the slave operation sub-modules; the activation operator of the master operation sub-module can perform the square root function operation to obtain an intermediate result. Specifically, the master operation sub-module may divide the data to be operated into a plurality of sub-data and send a part of the sub-data to the slave operation sub-modules for operation. Meanwhile, the activation operator of the master operation sub-module may also perform the activation operation on the remaining sub-data to obtain an intermediate result.
The activation operator of each slave operation sub-module is configured to perform the square root function activation operation in parallel according to the data and the operation instructions received from the master operation sub-module to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the master operation sub-module; the master operation sub-module is further configured to perform subsequent processing on the plurality of intermediate results to obtain an operation result and store the operation result at the target address. Specifically, the master operation sub-module may obtain the operation result from the plurality of intermediate results and store it in the storage space corresponding to the target address. In the embodiments of the present application, cooperative computation between the master operation sub-module and the slave operation sub-modules can improve computational efficiency.
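The cooperative flow described above can be illustrated in software. The following is a minimal sketch, not the hardware implementation: it assumes the data is a flat list of non-negative numbers, that there is at least one slave, that the master keeps the first block for itself, and that the subsequent processing is a simple in-order concatenation. The names `activate_sqrt`, `split` and `master_slave_sqrt` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def activate_sqrt(chunk):
    """Stand-in for one activation operator: element-wise square root."""
    return [math.sqrt(x) for x in chunk]


def split(data, parts):
    """Pre-processing: divide the data into `parts` nearly equal blocks."""
    size, extra = divmod(len(data), parts)
    blocks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def master_slave_sqrt(data, num_slaves):
    """Assumed flow: the master keeps block 0, the slaves take the rest."""
    blocks = split(data, num_slaves + 1)
    with ThreadPoolExecutor(max_workers=num_slaves) as pool:
        # Send one block to each slave; they run in parallel.
        slave_futures = [pool.submit(activate_sqrt, b) for b in blocks[1:]]
        # The master's own activation operator handles the remaining block.
        master_result = activate_sqrt(blocks[0])
        slave_results = [f.result() for f in slave_futures]
    # Subsequent processing (assumed): concatenate intermediate results in order.
    result = list(master_result)
    for part in slave_results:
        result.extend(part)
    return result


print(master_slave_sqrt([0.0, 1.0, 4.0, 9.0, 16.0], num_slaves=2))
```

Threads are used here only to mimic parallel slaves; the partitioning rule and the form of the subsequent processing are left open by the disclosure.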
Further alternatively, the instruction processing apparatus may also process other instructions than the activation function, and the other instructions may be instructions that perform arithmetic operations, logical operations, and other operations on data such as scalars, vectors, matrices, tensors, and the like, different from the square root function activation instruction, for example, scalar calculation instructions, convolution calculation instructions, and the like, and those skilled in the art may set the calculation instructions according to actual needs, which is not limited by the present disclosure. The calculation instruction obtained by the control module is an uncompiled software instruction which cannot be directly executed by hardware, and the control module needs to compile the calculation instruction (uncompiled) first. The compiled computing instructions are hardware instructions that can be directly executed by hardware. In this implementation manner, the control module is further configured to analyze the compiled calculation instruction to obtain an operation code and an operation domain of the calculation instruction, and obtain data to be calculated according to the operation code and the operation domain.
Furthermore, to implement the above-mentioned operations, the operation module may further include an adder, a divider, a multiplier, a comparator, and other operators capable of performing arithmetic operations, logical operations, and other operations on data. The present application can set the type and number of the arithmetic units according to the requirements of the size of the data amount of the arithmetic operation to be performed, the arithmetic type, the processing speed and efficiency of the arithmetic operation on the data, and the like, which is not limited by the present disclosure.
In a possible implementation manner, the control module 11 is further configured to analyze the calculation instruction to obtain a plurality of operation instructions, and send the data to be operated and the plurality of operation instructions to the main operation sub-module 121. Optionally, the activation arithmetic unit in the main arithmetic sub-module may be only used to implement the activation arithmetic. For example, when the calculation instruction is an operation performed on scalar or vector data, the apparatus may control the main operation submodule to perform an operation corresponding to the calculation instruction by using an operator therein. Optionally, when the calculation instruction is to perform calculation on data with a dimensionality greater than or equal to 2, such as a matrix, a tensor, and the like, the device may implement the activation calculation in a manner of cooperation between the master calculation submodule and the slave calculation submodule, and the specific calculation process may refer to the description above.
It should be noted that, a person skilled in the art may set the connection manner between the master operation submodule and the plurality of slave operation submodules according to actual needs to implement the configuration setting of the operation module, for example, the configuration of the operation module may be an "H" configuration, an array configuration, a tree configuration, and the like, which is not limited in the present disclosure.
FIG. 2c shows a block diagram of a square root function activation instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2c, the operation module 12 may further include one or more branch operation sub-modules 123, and the branch operation sub-module 123 is configured to forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-module 122. The master operation sub-module 121 is connected to one or more branch operation sub-modules 123. In this way, the master operation sub-module, the branch operation sub-modules and the slave operation sub-modules in the operation module are connected in an "H"-shaped structure, and data and/or operation instructions are forwarded by the branch operation sub-modules, which reduces the resource occupation of the master operation sub-module and further improves the instruction processing speed.
FIG. 2d shows a block diagram of a square root function activation instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in FIG. 2d, a plurality of slave operation sub-modules 122 are distributed in an array.
Each slave operation submodule 122 is connected to another adjacent slave operation submodule 122, the master operation submodule 121 is connected to k slave operation submodules 122 of the plurality of slave operation submodules 122, and the k slave operation submodules 122 are: n slave operator sub-modules 122 of row 1, n slave operator sub-modules 122 of row m, and m slave operator sub-modules 122 of column 1.
As shown in fig. 2d, the k slave operation sub-modules include only the n slave operation sub-modules in the 1st row, the n slave operation sub-modules in the m-th row, and the m slave operation sub-modules in the 1st column; that is, the k slave operation sub-modules are those slave operation sub-modules that are directly connected to the master operation sub-module. The k slave operation sub-modules are used to forward data and instructions between the master operation sub-module and the plurality of slave operation sub-modules. Distributing the slave operation sub-modules in an array in this way can increase the speed at which the master operation sub-module sends data and/or operation instructions to the slave operation sub-modules, and thus increase the instruction processing speed.
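To make the array wiring concrete, the sketch below enumerates which positions of an m-by-n array would be directly connected to the master under the description above. It assumes 1-based (row, column) coordinates and that a corner sub-module lying in both a listed row and the first column is counted once; the function name is hypothetical.

```python
def directly_connected_slaves(m, n):
    """Return the 1-based (row, column) positions wired to the master."""
    first_row = {(1, col) for col in range(1, n + 1)}
    last_row = {(m, col) for col in range(1, n + 1)}
    first_col = {(row, 1) for row in range(1, m + 1)}
    # A set union counts shared corner positions only once (assumption).
    return first_row | last_row | first_col


positions = directly_connected_slaves(m=4, n=3)
print(sorted(positions))
print(len(positions))
```

Under this counting, a 4-by-3 array has 3 + 3 + 4 - 2 = 8 directly connected sub-modules, because positions (1, 1) and (4, 1) each belong to two of the three groups.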
FIG. 2e shows a block diagram of a square root function activation instruction processing apparatus according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2e, the operation module may further include a tree sub-module 124. The tree sub-module 124 includes a root port 401 and a plurality of branch ports 402. The root port 401 is connected to the master operation sub-module 121, and the plurality of branch ports 402 are connected to the plurality of slave operation sub-modules 122, respectively. The tree sub-module 124 has a transceiving function, and is configured to forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-module 122. In this way, the operation module is connected in a tree-shaped structure through the tree sub-module, and the forwarding function of the tree sub-module can increase the speed at which the master operation sub-module sends data and/or operation instructions to the slave operation sub-modules, thereby increasing the instruction processing speed.
In one possible implementation, the tree sub-module 124 may be an optional structure of the apparatus, and may include at least one level of nodes. The nodes are wiring structures with a forwarding function and do not themselves have an operation function. The lowest-level nodes are connected to the slave operation sub-modules to forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-modules 122. In particular, if the tree sub-module has zero levels of nodes, the apparatus does not require the tree sub-module.
In one possible implementation, the tree submodule 124 may include a plurality of nodes of an n-ary tree structure, and the plurality of nodes of the n-ary tree structure may have a plurality of layers.
For example, FIG. 2f shows a block diagram of a square root function activation instruction processing apparatus according to an embodiment of the disclosure. As shown in FIG. 2f, the n-ary tree structure may be a binary tree structure, in which the tree sub-module includes 2 levels of nodes 01. The lowest-level nodes 01 are connected with the slave operation sub-modules 122 to forward data and/or operation instructions between the master operation sub-module 121 and the slave operation sub-modules 122.
In this implementation, the n-ary tree structure may also be a ternary tree structure or the like, where n is a positive integer greater than or equal to 2. The number of n in the n-ary tree structure and the number of layers of nodes in the n-ary tree structure may be set by those skilled in the art as needed, and the disclosure is not limited thereto.
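The forwarding role of the tree sub-module can be sketched as a recursive broadcast. This is an illustrative model only. It assumes a single top-level node, that every node has exactly `fan_out` branch ports, and that each branch port of a lowest-level node leads to one slave operation sub-module. The function name and the path-tuple addressing are hypothetical.

```python
def forward(message, fan_out, levels, path=()):
    """Relay `message` down a tree of forwarding-only nodes.

    Returns a dict mapping each slave's path (a tuple of branch indices)
    to the message it receives. Nodes only relay; they never compute.
    """
    if levels == 0:
        # No more nodes below this point: a slave sub-module receives the message.
        return {path: message}
    delivered = {}
    for branch in range(fan_out):
        delivered.update(forward(message, fan_out, levels - 1, path + (branch,)))
    return delivered


# Binary tree with 2 levels of nodes, as in the FIG. 2f example.
received = forward("operation instruction", fan_out=2, levels=2)
print(len(received))
```

With these assumptions a tree of fan-out n and L levels of nodes reaches n^L slaves, so the binary, two-level case reaches four. With zero levels the message goes straight from the master to a slave, which matches the remark that the tree sub-module is then not needed.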
In one possible implementation, as shown in figs. 2a to 2f, the control module 11 may include an instruction storage sub-module 111, an instruction processing sub-module 112, and a queue storage sub-module 113. The instruction storage sub-module 111 is configured to store the compiled square root function activation instruction. The instruction processing sub-module 112 is configured to parse the compiled square root function activation instruction to obtain an operation code and an operation domain of the square root function activation instruction. The queue storage sub-module 113 is configured to store an instruction queue, where the instruction queue includes a plurality of compiled square root function activation instructions arranged sequentially in execution order.
In this implementation, the instruction queue may be obtained by arranging the execution order of the compiled square root function activation instructions according to the receiving time, the priority level, and the like of the compiled square root function activation instructions, so as to sequentially execute the compiled square root function activation instructions according to the instruction queue.
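As an illustration of arranging instructions by priority and receiving time, the sketch below sorts pending instructions into an execution order. It assumes that a smaller priority value runs earlier and that receiving time breaks ties; the disclosure does not fix either convention, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedInstruction:
    priority: int       # assumed convention: smaller value runs earlier
    received_at: int    # arrival tick; breaks ties between equal priorities
    text: str = field(compare=False)  # instruction text, ignored when ordering


def build_instruction_queue(instructions):
    """Return the instructions arranged in execution order."""
    return sorted(instructions)


pending = [
    QueuedInstruction(priority=1, received_at=7, text="instr_c"),
    QueuedInstruction(priority=0, received_at=9, text="instr_b"),
    QueuedInstruction(priority=0, received_at=3, text="instr_a"),
]
queue = build_instruction_queue(pending)
print([q.text for q in queue])
```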
In one possible implementation, as shown in figs. 2a to 2f, the control module 11 may further include a dependency processing sub-module 114. The dependency processing sub-module 114 is configured to, when it is determined that a compiled first square root function activation instruction among the plurality of compiled square root function activation instructions has an association relationship with a compiled zeroth square root function activation instruction that precedes it, cache the compiled first square root function activation instruction in the instruction storage sub-module 111, and, after the compiled zeroth square root function activation instruction has been executed, extract the compiled first square root function activation instruction from the instruction storage sub-module 111 and send it to the operation module 12. The compiled first square root function activation instruction and the compiled zeroth square root function activation instruction are both instructions among the plurality of compiled square root function activation instructions.
The compiled first square root function activation instruction has an association relationship with the preceding compiled zeroth square root function activation instruction when the first storage address interval, which stores the data required by the compiled first square root function activation instruction, and the zeroth storage address interval, which stores the data required by the compiled zeroth square root function activation instruction, have an overlapping area. Conversely, there is no association relationship between the two instructions when the first storage address interval and the zeroth storage address interval have no overlapping area.
In this way, according to the dependency relationship between compiled square root function activation instructions, a later instruction is executed only after the preceding instruction on which it depends has finished executing, which ensures the accuracy of the operation result.
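The overlap test that defines the association relationship can be sketched as follows. This is a minimal sketch: it assumes each storage address interval is half-open, `[start, start + length)`, and is measured in the same address units for both instructions. The function names are hypothetical.

```python
def address_interval(start, length):
    """Half-open interval [start, start + length) of storage addresses."""
    return (start, start + length)


def has_dependency(first_interval, zeroth_interval):
    """True when the two storage address intervals share at least one address."""
    f_start, f_end = first_interval
    z_start, z_end = zeroth_interval
    return f_start < z_end and z_start < f_end


zeroth = address_interval(100, 64)   # addresses 100..163
first = address_interval(150, 64)    # addresses 150..213, overlaps the zeroth
print(has_dependency(first, zeroth))
```

When `has_dependency` returns true, the first instruction would be held back until the zeroth instruction has finished; adjacent intervals that merely touch, such as `[100, 164)` and `[164, 228)`, are treated as independent under the half-open assumption.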
In one possible implementation manner, the apparatus may be disposed in one or more of a Graphics Processing Unit (GPU), a Central Processing Unit (CPU), and an embedded Neural Network Processor (NPU).
It should be noted that, although the square root function activating instruction processing apparatus has been described above by taking the above-described embodiment as an example, those skilled in the art will understand that the present disclosure should not be limited thereto. In fact, the user can flexibly set each module according to personal preference and/or actual application scene, as long as the technical scheme of the disclosure is met.
Application example
An application example according to an embodiment of the present disclosure is given below, taking "performing an activation operation with the square root function activation instruction processing apparatus" as an exemplary application scenario, to facilitate understanding of the processing flow of the apparatus. Those skilled in the art will understand that the following application example is provided merely to facilitate understanding of the embodiments of the present disclosure and should not be construed as limiting them.
FIG. 3 is a diagram illustrating an application scenario of a square root function activation instruction processing apparatus according to an embodiment of the present disclosure. As shown in fig. 3, the square root function activation instruction processing apparatus processes the square root function activation instruction as follows:
The control module 11 compiles the obtained square root function activation instruction 1 to obtain a compiled square root function activation instruction 1 (for example, the square root function activation instruction 1 is @active.sqrt 500 100 64), and parses the compiled square root function activation instruction 1 to obtain an operation code and an operation domain of the square root function activation instruction 1. The operation code of the square root function activation instruction 1 is active.sqrt, the target address is 500, the source address of the data to be operated is 100, and the read-in amount is 64. The control module 11 obtains data to be operated with a data amount of 64 (the read-in amount) from the source address 100. The operation module 12 may perform a square root operation on the data to be operated to obtain an operation result, and store the operation result in the storage space corresponding to the target address 500. In this way, the square root function activation instruction processing apparatus can process the square root function activation instruction efficiently and quickly.
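The flow of this example can be mimicked in a few lines of software. The sketch below is illustrative only. It assumes the instruction is written as whitespace-separated fields in the order opcode, target address, source address, read-in amount, and it models the storage module as a flat Python list in which one address holds one element. The helper names `parse` and `execute` are hypothetical.

```python
import math


def parse(instruction):
    """Split an instruction string into its operation code and operation domain."""
    opcode, *fields = instruction.lstrip("@").split()
    target, source, count = (int(f) for f in fields)
    return opcode, {"target": target, "source": source, "count": count}


def execute(instruction, memory):
    """Read `count` elements at `source`, take square roots, write at `target`."""
    opcode, domain = parse(instruction)
    if opcode != "active.sqrt":
        raise ValueError(f"unsupported operation code: {opcode}")
    src, dst, n = domain["source"], domain["target"], domain["count"]
    data = memory[src:src + n]                       # data to be operated
    memory[dst:dst + n] = [math.sqrt(x) for x in data]  # operation result


memory = [0.0] * 1024                                  # assumed flat storage module
memory[100:164] = [float(i * i) for i in range(64)]   # 64 perfect squares
execute("@active.sqrt 500 100 64", memory)
print(memory[500:504])
```

Because the source and target ranges are sliced separately, the same code also works when the target address equals the source address, which corresponds to the multiplexed storage space mentioned earlier in the description.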
In the embodiments provided in the present disclosure, it should be understood that the disclosed system and apparatus may be implemented in other ways. For example, the above-described embodiments of systems and apparatuses are merely illustrative, and for example, a division of a device, an apparatus, and a module is merely a logical division, and an actual implementation may have another division, for example, a plurality of modules may be combined or integrated into another system or apparatus, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices, apparatuses or modules, and may be an electrical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present disclosure may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a form of hardware or a form of a software program module.
The integrated modules, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned memory comprises: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The present disclosure provides a machine learning arithmetic device, which may include one or more of the above square root function activation instruction processing apparatuses, and is configured to acquire data to be operated and control information from other processing devices and execute a specified machine learning operation. The machine learning arithmetic device can obtain the square root function activation instruction from other machine learning arithmetic devices or non-machine-learning arithmetic devices, and transmit the execution result to peripheral devices (also called other processing devices) through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, wifi interfaces, and servers. When more than one square root function activation instruction processing apparatus is included, the apparatuses can be linked and transmit data through a specific structure, for example, interconnected via a PCIE bus, so as to support larger-scale neural network operations. In this case, the apparatuses may share the same control system or have separate control systems; they may share memory, or each accelerator may have its own memory. In addition, the interconnection mode can be any interconnection topology.
The machine learning arithmetic device has high compatibility and can be connected with various types of servers through PCIE interfaces.
Fig. 4a shows a block diagram of a combined processing device according to an embodiment of the present disclosure. As shown in fig. 4a, the combined processing device includes the machine learning arithmetic device, the universal interconnection interface, and other processing devices. The machine learning arithmetic device interacts with other processing devices to jointly complete the operation designated by the user.
Other processing devices include one or more types of general-purpose or special-purpose processors, such as Central Processing Units (CPUs), Graphics Processing Units (GPUs), and neural network processors. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning arithmetic device and external data and control; they handle data transfer and perform basic control of the machine learning arithmetic device, such as starting and stopping it. The other processing devices may also cooperate with the machine learning arithmetic device to complete computing tasks.
And the universal interconnection interface is used for transmitting data and control instructions between the machine learning arithmetic device and other processing devices. The machine learning arithmetic device acquires required input data from other processing devices and writes the input data into a storage device on the machine learning arithmetic device; control instructions can be obtained from other processing devices and written into a control cache on a machine learning arithmetic device chip; the data in the storage module of the machine learning arithmetic device can also be read and transmitted to other processing devices.
Fig. 4b shows a block diagram of a combined processing device according to an embodiment of the present disclosure. In a possible implementation manner, as shown in fig. 4b, the combined processing device may further include a storage device, and the storage device is connected to the machine learning operation device and the other processing device respectively. The storage device is used for storing data stored in the machine learning arithmetic device and the other processing device, and is particularly suitable for data which is required to be calculated and cannot be stored in the internal storage of the machine learning arithmetic device or the other processing device.
The combined processing device can be used as a system on chip (SOC) for equipment such as mobile phones, robots, unmanned aerial vehicles and video monitoring equipment, which effectively reduces the core area of the control part, increases the processing speed, and reduces the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card, or a wifi interface.
The present disclosure provides a machine learning chip, which includes the above machine learning arithmetic device or combined processing device.
The present disclosure provides a machine learning chip package structure, which includes the above machine learning chip.
Fig. 5 shows a schematic structural diagram of a board card according to an embodiment of the present disclosure. As shown in fig. 5, the board includes the above-mentioned machine learning chip package structure or the above-mentioned machine learning chip. The board may include, in addition to the machine learning chip 389, other kits including, but not limited to: memory device 390, interface device 391 and control device 392.
The memory device 390 is coupled to a machine learning chip 389 (or a machine learning chip within a machine learning chip package structure) via a bus for storing data. Memory device 390 may include multiple sets of memory cells 393. Each group of memory cells 393 is coupled to a machine learning chip 389 via a bus. It is understood that each group 393 may be a DDR SDRAM (Double Data Rate SDRAM).
DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read out on the rising and falling edges of the clock pulse. DDR is twice as fast as standard SDRAM.
In one embodiment, memory device 390 may include 4 groups of memory cells 393. Each group of memory cells 393 may include a plurality of DDR4 chips. In one embodiment, the machine learning chip 389 may include four 72-bit DDR4 controllers, in each of which 64 bits are used for data transmission and 8 bits are used for ECC checking. When DDR4-3200 chips are used in each group of memory cells 393, the theoretical data transfer bandwidth can reach 25600 MB/s.
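The 25600 MB/s figure follows from the transfer rate and the data width of one controller. As a short worked calculation, assuming that DDR4-3200 denotes 3200 mega-transfers per second and that only the 64 data bits count toward bandwidth (the 8 ECC bits carry no payload):

$$
3200\ \text{MT/s} \times \frac{64\ \text{bit}}{8\ \text{bit/byte}} = 3200 \times 8\ \text{MB/s} = 25600\ \text{MB/s}
$$

This is the theoretical peak for one group of memory cells behind one 64-bit data path.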
In one embodiment, each group 393 of memory cells includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling DDR is provided in the machine learning chip 389 for controlling data transfer and data storage of each memory unit 393.
Interface device 391 is electrically coupled to machine learning chip 389 (or a machine learning chip within a machine learning chip package). The interface device 391 is used to implement data transmission between the machine learning chip 389 and an external device (e.g., a server or a computer). For example, in one embodiment, the interface device 391 may be a standard PCIE interface. For example, the data to be processed is transmitted by the server to the machine learning chip 389 through the standard PCIE interface, so as to implement data transfer. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device 391 may also be another interface; the disclosure does not limit the specific form of the other interface, as long as the interface device can implement the transfer function. In addition, the calculation result of the machine learning chip is transmitted back to the external device (e.g., a server) by the interface device.
The control device 392 is electrically connected to the machine learning chip 389. The control device 392 is used to monitor the state of the machine learning chip 389. Specifically, the machine learning chip 389 and the control device 392 may be electrically connected through an SPI interface. The control device 392 may include a microcontroller unit (MCU). For example, the machine learning chip 389 may include multiple processing chips, multiple processing cores, or multiple processing circuits, and may drive multiple loads. Therefore, the machine learning chip 389 can be in different operating states, such as heavy load and light load. The control device can regulate the operating states of the multiple processing chips, multiple processing cores and/or multiple processing circuits in the machine learning chip.
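The 16000 MB/s figure corresponds to the raw signalling rate of the link. As a worked calculation, using the PCIE 3.0 parameters of 8 GT/s per lane and 128b/130b line encoding, for one direction of a 16-lane link:

$$
8\ \text{GT/s} \times 16\ \text{lanes} \times \frac{1\ \text{byte}}{8\ \text{bit}} = 16000\ \text{MB/s (raw)}
$$

$$
16000\ \text{MB/s} \times \frac{128}{130} \approx 15754\ \text{MB/s (after line encoding)}
$$

Usable throughput is lower still once packet and protocol overhead are included, so 16000 MB/s is best read as an upper bound.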
The present disclosure provides an electronic device, which includes the above machine learning chip or board card.
The electronic device may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle may include an aircraft, a ship, and/or a road vehicle. The household appliances may include televisions, air conditioners, microwave ovens, refrigerators, electric rice cookers, humidifiers, washing machines, electric lamps, gas cookers, and range hoods. The medical device may include a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus and/or an electrocardiograph.
FIG. 6 shows a flowchart of a square root function activate instruction processing method according to an embodiment of the disclosure. As shown in fig. 6, the method is applied to the square root function activation instruction processing apparatus described above, and includes step S51 and step S52.
In step S51, the obtained square root function activating instruction is compiled to obtain a compiled square root function activating instruction, the compiled square root function activating instruction is analyzed to obtain an operation code and an operation domain of the square root function activating instruction, and the data to be operated and the target address required for executing the square root function activating instruction are obtained according to the operation code and the operation domain, and the square root activating function parameter table is obtained. The operation code is used for indicating that the activation operation of the square root function activation instruction on the data is the square root function activation operation, and the operation domain comprises a data address to be operated and a target address.
In step S52, the operation module performs square root function activation operation on the data to be operated to obtain an operation result, and stores the operation result in the storage space corresponding to the target address.
Optionally, the method further comprises:
acquiring a square root activation function parameter table according to the operation code and/or the operation domain;
performing square root function activation operation on the data to be operated according to the square root activation function parameter table to obtain an operation result;
wherein the square root function parameter table comprises a square root activation function active table and/or a square root activation function constant table.
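The disclosure does not specify the contents of the parameter tables. Purely as an illustration of how a table-driven square root could work, the sketch below uses a piecewise-linear approximation with two hypothetical tables, one of per-segment slopes and one of per-segment constants. The mapping of these onto the tables named above is an assumption, as are the input range and the segment count.

```python
import math


def build_tables(lo, hi, segments):
    """Precompute per-segment slopes and constants for sqrt on [lo, hi]."""
    step = (hi - lo) / segments
    slopes, constants = [], []
    for i in range(segments):
        x0 = lo + i * step
        x1 = x0 + step
        k = (math.sqrt(x1) - math.sqrt(x0)) / step  # chord slope of the segment
        slopes.append(k)
        constants.append(math.sqrt(x0) - k * x0)    # intercept of the chord
    return slopes, constants


def table_sqrt(x, lo, hi, slopes, constants):
    """Approximate sqrt(x) for lo <= x <= hi with one multiply and one add."""
    segments = len(slopes)
    step = (hi - lo) / segments
    i = min(int((x - lo) / step), segments - 1)     # segment index lookup
    return slopes[i] * x + constants[i]


LO, HI = 0.25, 4.0                                  # assumed input range
slopes, constants = build_tables(LO, HI, segments=64)
approx = table_sqrt(2.0, LO, HI, slopes, constants)
print(abs(approx - math.sqrt(2.0)) < 1e-3)
```

The appeal of such a scheme in hardware is that each evaluation costs one table lookup, one multiplication and one addition. The accuracy depends on the segment count and degrades near zero, where the square root curve is steepest.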
Optionally, the step S52 specifically includes:
and carrying out square root function activation operation on the data to be operated by utilizing the activation operator.
Optionally, the method is used in an instruction processing apparatus including an arithmetic module, where the arithmetic module includes a master arithmetic sub-module and a plurality of slave arithmetic sub-modules connected to the master arithmetic sub-module, and the master arithmetic sub-module and the plurality of slave arithmetic sub-modules each include the activation operator; the step S52 may specifically include:
the control module analyzes the compiled square root function activation instruction to obtain a plurality of operation instructions, and sends the data to be operated and the plurality of operation instructions to the master arithmetic sub-module;
the master arithmetic sub-module performs pre-processing on the data to be operated, and sends the operation instructions and at least a part of the data to be operated to the slave arithmetic sub-modules; the activation operator of the master arithmetic sub-module may itself perform the square root function activation operation to obtain an intermediate result;
the activation operators of the slave arithmetic sub-modules perform the square root function activation operation in parallel according to the data and the operation instructions received from the master arithmetic sub-module to obtain a plurality of intermediate results, and transmit the plurality of intermediate results to the master arithmetic sub-module;
and the master arithmetic sub-module performs subsequent processing on the plurality of intermediate results to obtain the operation result, and stores the operation result into the target address.
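The master/slave flow above can be sketched in software as follows. This is only an illustration under stated assumptions: the pre-processing is assumed to be a split of the data into contiguous chunks, the subsequent processing is assumed to be an in-order concatenation of the intermediate results, worker threads stand in for the slave sub-modules, and a dictionary stands in for the storage addressed by the target address. All names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def activation_operator(chunk):
    """Stand-in for one activation operator: element-wise square root."""
    return [math.sqrt(v) for v in chunk]


def master_execute(data, memory, target_address, num_slaves=4):
    # Pre-processing (assumed): split the data into one chunk per slave.
    chunk_size = max(1, math.ceil(len(data) / num_slaves))
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # Slaves run in parallel; map() returns intermediate results in chunk order.
    with ThreadPoolExecutor(max_workers=num_slaves) as pool:
        intermediate_results = list(pool.map(activation_operator, chunks))

    # Subsequent processing (assumed): concatenate, then store at the target.
    result = [v for part in intermediate_results for v in part]
    memory[target_address] = result
    return result


memory = {}
out = master_execute([0.0, 1.0, 4.0, 9.0, 16.0, 25.0], memory, target_address=0x100)
```

Because the operation is element-wise, the chunks are independent and the split does not affect the result, which is what makes this kind of operation suitable for parallel slave units.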
Optionally, the operation domain further includes a read-in amount or a storage address of the read-in amount; the method further comprises the following steps:
acquiring the read-in amount, and acquiring the data to be operated according to the read-in amount and the source address of the data to be operated.
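As a brief illustration of the two options above, the read-in amount may either appear directly in the operation domain or be fetched from a storage address named there. The sketch below assumes a flat list as the memory model and a hypothetical `("imm", n)` / `("addr", a)` tagging of the field; neither is taken from the source.

```python
def resolve_read_amount(field, memory):
    """Return the read-in amount given directly or stored at an address."""
    kind, value = field
    if kind == "imm":       # the amount itself is in the operation domain
        return value
    if kind == "addr":      # the operation domain holds where the amount is stored
        return memory[value]
    raise ValueError(f"unknown read-in amount field kind: {kind}")


def fetch_operands(memory, source_address, read_amount):
    """Read `read_amount` consecutive elements starting at `source_address`."""
    return memory[source_address:source_address + read_amount]


memory = [3, 4.0, 9.0, 16.0, 25.0]   # memory[0] holds a read-in amount of 3
n = resolve_read_amount(("addr", 0), memory)
data = fetch_operands(memory, source_address=1, read_amount=n)
```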
Optionally, the initial storage space corresponding to the source address of the data to be operated and the target storage space corresponding to the target address of the data to be operated are on-chip storage spaces.
Optionally, the step S51 may include:
storing the compiled square root function activation instruction;
analyzing the compiled square root function activating instruction to obtain an operation code and an operation domain of the square root function activating instruction;
and storing an instruction queue, wherein the instruction queue comprises a plurality of instructions to be executed which are sequentially arranged according to an execution sequence, and the plurality of instructions to be executed comprise the compiled square root function activating instruction.
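The three sub-steps above (storing the compiled instruction, parsing it into an operation code and an operation domain, and keeping an instruction queue in execution order) can be sketched as follows. The textual instruction format, the `SQRT_ACT` mnemonic and the field names are hypothetical and chosen only for readability; the source does not define an encoding.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodedInstruction:
    opcode: str
    operation_domain: dict


class ControlModule:
    def __init__(self):
        self.instruction_store = []       # keeps every compiled instruction
        self.instruction_queue = deque()  # instructions in execution order

    def store(self, compiled_instruction):
        self.instruction_store.append(compiled_instruction)
        self.instruction_queue.append(compiled_instruction)

    @staticmethod
    def parse(compiled_instruction):
        """Split 'OPCODE target, source, amount' into opcode and operation domain."""
        opcode, _, operands = compiled_instruction.strip().partition(" ")
        target, source, amount = (int(tok.strip(), 0) for tok in operands.split(","))
        return DecodedInstruction(
            opcode,
            {"target_address": target, "source_address": source, "read_amount": amount},
        )

    def next_decoded(self):
        """Pop the next instruction to execute and parse it."""
        return self.parse(self.instruction_queue.popleft())


control = ControlModule()
control.store("SQRT_ACT 0x200, 0x100, 64")
control.store("SQRT_ACT 0x300, 0x200, 32")
decoded = control.next_decoded()
```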
It should be noted that, although the square root function activation instruction processing method is described above by taking the above-mentioned embodiment as an example, those skilled in the art can understand that the disclosure should not be limited thereto. In fact, the user can flexibly set each step according to personal preference and/or actual application scene, as long as the technical scheme of the disclosure is met. In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. The execution process of each step in the above method is substantially the same as the working process of the instruction processing apparatus, and reference may be specifically made to the above description, which is not described herein again.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present application is not limited by the described order of acts, as some steps may be performed in other orders or concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments, and that the acts and modules involved are not necessarily required by the disclosure. Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing associated hardware, and the program may be stored in a computer-readable memory, which may include: flash memory disks, read-only memories (ROMs), random access memories (RAMs), magnetic disks, optical disks, and the like.
The present application also provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by the instruction processing device, the steps in the method are implemented. Specifically, when executed by the instruction processing apparatus, the computer program implements the following steps:
compiling the obtained square root function activating instruction to obtain a compiled square root function activating instruction;
analyzing the compiled square root function activating instruction to obtain an operation code and an operation domain of the square root function activating instruction, and acquiring data to be operated and a target address required by executing the square root function activating instruction according to the operation code and the operation domain;
performing square root activation operation on the data to be operated to obtain an operation result, and storing the operation result into the target address;
the operation code is used for indicating that the activation operation of the square root function activation instruction on the data is a square root function activation operation, and the operation domain comprises a source address and a target address of the data to be operated.
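The four steps above (compile, parse, operate, store) can be traced end to end with a small sketch. It assumes a hypothetical 32-bit instruction word with an 8-bit operation code followed by three 8-bit operation-domain fields (target address, source address, element count), a hypothetical opcode value, and a flat list as memory. None of these details come from the source.

```python
import math

SQRT_ACT_OPCODE = 0x2A  # hypothetical opcode value


def compile_instruction(mnemonic, target, source, count):
    """Pack a textual instruction into a hypothetical 32-bit word."""
    if mnemonic != "sqrt_act":
        raise ValueError(f"unsupported mnemonic: {mnemonic}")
    for field in (target, source, count):
        if not 0 <= field < 256:
            raise ValueError("each operation-domain field must fit in 8 bits")
    return (SQRT_ACT_OPCODE << 24) | (target << 16) | (source << 8) | count


def parse_instruction(word):
    """Recover the operation code and the operation domain from the word."""
    opcode = (word >> 24) & 0xFF
    domain = {
        "target": (word >> 16) & 0xFF,
        "source": (word >> 8) & 0xFF,
        "count": word & 0xFF,
    }
    return opcode, domain


def execute(word, memory):
    """Fetch the data, apply the square root, and store at the target address."""
    opcode, domain = parse_instruction(word)
    if opcode != SQRT_ACT_OPCODE:
        raise ValueError("not a square root function activation instruction")
    src, dst, n = domain["source"], domain["target"], domain["count"]
    if src + n > len(memory) or dst + n > len(memory):
        raise IndexError("operand range exceeds memory")
    result = [math.sqrt(v) for v in memory[src:src + n]]
    memory[dst:dst + n] = result
    return result


memory = [4.0, 9.0, 16.0] + [0.0] * 13
word = compile_instruction("sqrt_act", target=8, source=0, count=3)
execute(word, memory)
```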
The foregoing may be better understood in light of the following clauses:
Clause 1: a square root function activation instruction processing apparatus, the apparatus comprising:
the control module is used for compiling the obtained square root function activating instruction to obtain a compiled square root function activating instruction, analyzing the compiled square root function activating instruction to obtain an operation code and an operation domain of the square root function activating instruction, and obtaining data to be operated and a target address required by executing the square root function activating instruction according to the operation code and the operation domain;
the operation module is used for carrying out square root activation operation on the data to be operated to obtain an operation result and storing the operation result into the target address;
the operation code is used for indicating that the activation operation of the square root function activation instruction on the data is a square root function activation operation, and the operation domain comprises a source address and a target address of the data to be operated.
Clause 2: the apparatus according to clause 1, wherein the control module is further configured to obtain a square root activation function parameter table according to the opcode and/or the operation domain;
the operation module is also used for carrying out square root function activation operation on the data to be operated according to the square root activation function parameter table to obtain an operation result;
wherein the square root activation function parameter table comprises a square root activation function active table and/or a square root activation function constant table.
Clause 3: the apparatus of clause 1 or 2, wherein the arithmetic module comprises an activation operator; the activation operator is used for performing the square root function activation operation on the data to be operated.
Clause 4: the apparatus according to any one of clauses 1-3, wherein the arithmetic module comprises a master arithmetic sub-module and a plurality of slave arithmetic sub-modules connected with the master arithmetic sub-module, and the master arithmetic sub-module and the plurality of slave arithmetic sub-modules each comprise the activation operator;
the control module is further configured to analyze the compiled square root function activation instruction to obtain a plurality of operation instructions, and send the data to be operated and the plurality of operation instructions to the master arithmetic sub-module;
the master arithmetic sub-module is used for performing pre-processing on the data to be operated, and sending the operation instructions and at least a part of the data to be operated to the slave arithmetic sub-modules; the activation operator of the master arithmetic sub-module may itself perform the square root function activation operation to obtain an intermediate result;
the activation operators of the slave arithmetic sub-modules are used for performing the square root function activation operation in parallel according to the data and the operation instructions received from the master arithmetic sub-module to obtain a plurality of intermediate results, and transmitting the plurality of intermediate results to the master arithmetic sub-module;
and the master arithmetic sub-module is also used for performing subsequent processing on the plurality of intermediate results to obtain the operation result, and storing the operation result into the target address.
Clause 5: the apparatus according to any of clauses 1-4, wherein the operation domain further comprises a read-in amount or a storage address of the read-in amount;
the control module is further configured to obtain the read amount, and obtain the data to be operated according to the read amount and the source address of the data to be operated.
Clause 6: the apparatus of any of clauses 1-5, further comprising:
and the storage module is used for storing the data to be operated and the operation result.
Clause 7: the apparatus according to any one of clauses 1 to 6, wherein the storage module includes an on-chip storage, and an initial storage space corresponding to a source address of the data to be operated and a target storage space corresponding to a target address of the data to be operated are storage spaces of the on-chip storage.
Clause 8: the apparatus of any of clauses 1-7, wherein the control module comprises:
the instruction storage submodule is used for storing the compiled square root function activating instruction;
the instruction processing submodule is used for analyzing the compiled square root function activating instruction to obtain an operation code and an operation domain of the square root function activating instruction;
and the queue storage submodule is used for storing an instruction queue, the instruction queue comprises a plurality of instructions to be executed which are sequentially arranged according to an execution sequence, and the plurality of instructions to be executed comprise the compiled square root function activating instruction.
Clause 9: a square root function activation instruction processing method, the method comprising:
compiling the obtained square root function activating instruction to obtain a compiled square root function activating instruction;
analyzing the compiled square root function activating instruction to obtain an operation code and an operation domain of the square root function activating instruction, and acquiring data to be operated and a target address required by executing the square root function activating instruction according to the operation code and the operation domain;
performing square root activation operation on the data to be operated to obtain an operation result, and storing the operation result into the target address;
the operation code is used for indicating that the activation operation of the square root function activation instruction on the data is a square root function activation operation, and the operation domain comprises a source address and a target address of the data to be operated.
Clause 10: the method of clause 9, wherein the control module is further configured to obtain a square root activation function parameter table according to the opcode and/or the operation domain;
the operation module is also used for carrying out square root function activation operation on the data to be operated according to the square root activation function parameter table to obtain an operation result;
wherein the square root activation function parameter table comprises a square root activation function active table and/or a square root activation function constant table.
Clause 11: the method of clause 9 or 10, wherein the arithmetic module comprises an activation operator; the activation operator is used for performing the square root function activation operation on the data to be operated.
Clause 12: the method according to any one of clauses 9-11, wherein the arithmetic module comprises a master arithmetic sub-module and a plurality of slave arithmetic sub-modules connected with the master arithmetic sub-module, and the master arithmetic sub-module and the plurality of slave arithmetic sub-modules each comprise the activation operator;
the control module is further used for analyzing the square root function activation instruction to obtain a plurality of operation instructions, and sending the data to be operated and the plurality of operation instructions to the master arithmetic sub-module;
the master arithmetic sub-module is used for performing pre-processing on the data to be operated, and sending the operation instructions and at least a part of the data to be operated to the slave arithmetic sub-modules; the activation operator of the master arithmetic sub-module may itself perform the square root function activation operation to obtain an intermediate result;
the activation operators of the slave arithmetic sub-modules are used for performing the square root function activation operation in parallel according to the data and the operation instructions received from the master arithmetic sub-module to obtain a plurality of intermediate results, and transmitting the plurality of intermediate results to the master arithmetic sub-module;
and the master arithmetic sub-module is also used for performing subsequent processing on the plurality of intermediate results to obtain the operation result, and storing the operation result into the target address.
Clause 13: the method according to any of clauses 9-12, wherein the operation domain further comprises a read-in amount or a storage address of the read-in amount;
the control module is further configured to obtain the read amount, and obtain the data to be operated according to the read amount and the source address of the data to be operated.
Clause 14: the method of any of clauses 9-13, further comprising:
and the storage module is used for storing the data to be operated and the operation result.
Clause 15: the method according to any one of clauses 9 to 14, wherein the storage module includes an on-chip storage, and an initial storage space corresponding to a source address of the data to be operated and a target storage space corresponding to a target address of the data to be operated are storage spaces of the on-chip storage.
Clause 16: the method of any of clauses 9-15, wherein the control module comprises:
the instruction storage submodule is used for storing the square root function activating instruction;
the instruction processing submodule is used for analyzing the square root function activating instruction to obtain an operation code and an operation domain of the square root function activating instruction;
and the queue storage submodule is used for storing an instruction queue, the instruction queue comprises a plurality of instructions to be executed which are sequentially arranged according to an execution sequence, and the plurality of instructions to be executed comprise the square root function activating instruction.
Clause 17: a computer-readable storage medium, in which a computer program is stored, which, when executed by one or more processing devices, carries out the steps of the method according to any one of clauses 9-16.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. A square root function activation instruction processing apparatus, said apparatus comprising:
the control module is used for compiling the obtained square root function activating instruction to obtain a compiled square root function activating instruction, analyzing the compiled square root function activating instruction to obtain an operation code and an operation domain of the square root function activating instruction, and obtaining data to be operated and a target address required by executing the square root function activating instruction according to the operation code and the operation domain;
the operation module is used for carrying out square root activation operation on the data to be operated to obtain an operation result and storing the operation result into the target address;
the operation code is used for indicating that the activation operation of the square root function activation instruction on the data is a square root function activation operation, and the operation domain comprises a source address and a target address of the data to be operated.
2. The apparatus of claim 1,
the control module is also used for acquiring a square root activation function parameter table according to the operation code and/or the operation domain;
the operation module is also used for carrying out square root function activation operation on the data to be operated according to the square root activation function parameter table to obtain an operation result;
wherein the square root activation function parameter table comprises a square root activation function active table and/or a square root activation function constant table.
3. The apparatus of claim 1, wherein the arithmetic module comprises an activation operator;
the activation operator is used for performing the square root function activation operation on the data to be operated.
4. The apparatus of claim 3, wherein the arithmetic module comprises a master arithmetic sub-module and a plurality of slave arithmetic sub-modules connected to the master arithmetic sub-module, the master arithmetic sub-module and the plurality of slave arithmetic sub-modules each comprising the activation operator;
the control module is further configured to analyze the compiled square root function activation instruction to obtain a plurality of operation instructions, and send the data to be operated and the plurality of operation instructions to the master arithmetic sub-module;
the master arithmetic sub-module is used for performing pre-processing on the data to be operated, and sending the operation instructions and at least a part of the data to be operated to the slave arithmetic sub-modules; the activation operator of the master arithmetic sub-module may itself perform the square root function activation operation to obtain an intermediate result;
the activation operators of the slave arithmetic sub-modules are used for performing the square root function activation operation in parallel according to the data and the operation instructions received from the master arithmetic sub-module to obtain a plurality of intermediate results, and transmitting the plurality of intermediate results to the master arithmetic sub-module;
and the master arithmetic sub-module is also used for performing subsequent processing on the plurality of intermediate results to obtain the operation result, and storing the operation result into the target address.
5. The apparatus of claim 1, wherein the operation domain further comprises a read-in amount or a storage address of the read-in amount;
the control module is further configured to obtain the read amount, and obtain the data to be operated according to the read amount and the source address of the data to be operated.
6. The apparatus of claim 1, further comprising:
and the storage module is used for storing the data to be operated and the operation result.
7. The apparatus according to claim 6, wherein the storage module includes an on-chip storage, and an initial storage space corresponding to a source address of the data to be operated and a target storage space corresponding to a target address of the data to be operated are storage spaces of the on-chip storage.
8. The apparatus of claim 1, wherein the control module comprises:
the instruction storage submodule is used for storing the compiled square root function activating instruction;
the instruction processing submodule is used for analyzing the compiled square root function activating instruction to obtain an operation code and an operation domain of the square root function activating instruction;
and the queue storage submodule is used for storing an instruction queue, the instruction queue comprises a plurality of instructions to be executed which are sequentially arranged according to an execution sequence, and the plurality of instructions to be executed comprise the compiled square root function activating instruction.
9. A square root function activation instruction processing method, the method comprising:
compiling the obtained square root function activating instruction to obtain a compiled square root function activating instruction;
analyzing the compiled square root function activating instruction to obtain an operation code and an operation domain of the square root function activating instruction, and acquiring data to be operated and a target address required by executing the square root function activating instruction according to the operation code and the operation domain;
performing square root activation operation on the data to be operated to obtain an operation result, and storing the operation result into the target address;
the operation code is used for indicating that the activation operation of the square root function activation instruction on the data is a square root function activation operation, and the operation domain comprises a source address and a target address of the data to be operated.
10. A computer-readable storage medium, storing a computer program which, when executed by one or more processing devices, performs the steps of the method recited in claim 9.
CN201910724830.1A 2019-08-07 2019-08-07 Instruction processing method and device and related product Pending CN112346707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910724830.1A CN112346707A (en) 2019-08-07 2019-08-07 Instruction processing method and device and related product

Publications (1)

Publication Number Publication Date
CN112346707A true CN112346707A (en) 2021-02-09

Family

ID=74366528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910724830.1A Pending CN112346707A (en) 2019-08-07 2019-08-07 Instruction processing method and device and related product

Country Status (1)

Country Link
CN (1) CN112346707A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107992329A (en) * 2017-07-20 2018-05-04 上海寒武纪信息科技有限公司 A kind of computational methods and Related product
CN109711539A (en) * 2018-12-17 2019-05-03 北京中科寒武纪科技有限公司 Operation method, device and Related product
CN110096309A (en) * 2018-11-14 2019-08-06 上海寒武纪信息科技有限公司 Operation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination