CN116662509A - Open domain question-answering implementation method, device and equipment of large-scale language model - Google Patents
- Publication number
- CN116662509A (Application number CN202310688299.3A)
- Authority
- CN
- China
- Prior art keywords
- segmentation
- parallel
- strategy
- target
- operator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to the technical field of deep learning and discloses a method, device and equipment for implementing open-domain question answering with a large-scale language model. The method comprises: obtaining a plurality of segmentation operators, the segmentation strategies corresponding to each segmentation operator, and the communication operator corresponding to each segmentation strategy; deriving a plurality of candidate parallel strategies from the segmentation operators, their segmentation strategies, and the corresponding communication operators; selecting a target parallel strategy from the candidate parallel strategies through a preset strategy search algorithm; and performing distributed parallel training of the large-scale language model according to the target parallel strategy to obtain a trained target language model, which is then used to realize open-domain question answering. By automatically generating candidate parallel strategies and selecting the optimal one, the technical scheme improves both the training efficiency of the large-scale language model and the implementation efficiency of intelligent question answering.
Description
Technical Field
The invention relates to the technical field of deep learning, and in particular to a method, device and equipment for implementing open-domain question answering with a large-scale language model.
Background
As large-scale language models grow increasingly complex, single-machine single-card training can no longer meet their training requirements, so multi-machine multi-card parallel training has become a research hotspot in the deep learning field. Within multi-machine multi-card parallel training, research on how to effectively exploit heterogeneous many-core devices and automate parallel strategies has received much attention.
Currently, existing parallel training methods typically rely on frameworks such as TensorFlow or PyTorch to train large-scale models in parallel. However, these frameworks are mostly optimized for a single hardware platform and cannot effectively exploit the advantages of heterogeneous many-core devices. In addition, they require the user to manually tune parallel training strategies and parameters and to re-optimize for each hardware platform, which makes them difficult for non-expert users to use and limits their practical applicability.
Disclosure of Invention
The invention provides a method, device and equipment for implementing open-domain question answering with a large-scale language model, which can improve the training efficiency of the large-scale language model and the implementation efficiency of intelligent question answering.
According to an aspect of the present invention, there is provided an open domain question-answering implementation method of a large-scale language model, including:
acquiring a plurality of segmentation operators, and acquiring a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy;
obtaining a plurality of candidate parallel strategies according to each segmentation operator, a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy;
acquiring a target parallel strategy from the candidate parallel strategies through a preset strategy searching algorithm;
and carrying out distributed parallel training of the large-scale language model according to the target parallel strategy to obtain a trained target language model, and adopting the target language model to realize open domain question-answering.
According to another aspect of the present invention, there is provided an open domain question-answering implementation apparatus of a large-scale language model, including:
the segmentation operator acquisition module is used for acquiring a plurality of segmentation operators, a plurality of segmentation strategies corresponding to the segmentation operators and communication operators corresponding to the segmentation strategies;
the candidate parallel strategy acquisition module is used for acquiring a plurality of candidate parallel strategies according to each segmentation operator, a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy;
the target parallel strategy acquisition module is used for acquiring target parallel strategies from the candidate parallel strategies through a preset strategy search algorithm;
and the model training module is used for carrying out distributed parallel training of the large-scale language model according to the target parallel strategy so as to obtain a trained target language model, and adopting the target language model to realize open domain question-answering.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the open domain question-answering implementation method of the large-scale language model according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement the open domain question-answering implementation method of the large-scale language model according to any one of the embodiments of the present invention when executed.
According to the technical scheme, a plurality of segmentation operators are obtained, together with the segmentation strategies corresponding to each segmentation operator and the communication operator corresponding to each segmentation strategy; a plurality of candidate parallel strategies are then derived from them, and a target parallel strategy is selected from the candidates through a preset strategy search algorithm; finally, distributed parallel training of the large-scale language model is carried out according to the target parallel strategy to obtain a trained target language model, which is used to realize open-domain question answering. By automatically generating candidate parallel strategies and selecting the optimal one, the training efficiency of the large-scale language model and the implementation efficiency of intelligent question answering can both be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1A is a flowchart of an open domain question-answering implementation method for a large-scale language model according to a first embodiment of the present invention;
FIG. 1B is a schematic structural diagram of a heterogeneous many-core device according to a first embodiment of the present invention;
FIG. 1C is a schematic diagram of a strategy search process according to a first embodiment of the present invention;
FIG. 1D is a schematic illustration of operator insertion according to a first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an open domain question-answering implementation device of a large-scale language model according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device implementing an open domain question-answering implementation method of a large-scale language model according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," "target," and the like in the description and claims of the present invention and in the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiment one:
FIG. 1A is a flowchart of an open domain question-answering implementation method for a large-scale language model. The method may be performed by an open domain question-answering implementation device of the large-scale language model, which may be implemented in the form of hardware and/or software and configured in an electronic device; typically, the electronic device may be a computer device or a server. As shown in FIG. 1A, the method includes:
s110, acquiring a plurality of segmentation operators, and acquiring a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy.
The operator may be a constituent unit of the deep learning model, used to implement a specific data processing function; for example, it may be a convolution layer operator, a pooling layer operator, or the like. The segmentation operator may be an operator with feature or data splitting capability, such as a convolution layer operator. The communication operator may be an operator that implements the data communication function.
Alternatively, the segmentation operator may include a row segmentation operator, a column segmentation operator, and/or a replication operator. Different segmentation operators may correspond to different segmentation strategies: the row segmentation operator may correspond to a row-split strategy, that is, splitting the input features by rows; the column segmentation operator may correspond to a column-split strategy, that is, splitting the input features by columns; and the replication operator may correspond to a replication strategy, that is, duplicating the input features.
The communication operator may include a reduction operator and/or a gather operator. For example, the reduction operator may be an AllReduce operator, which combines the data processing results from each device using a preset rule (e.g., averaging or weighted summation). The gather operator may be an AllGather operator, which shares the data processing results of the different devices with all devices.
In this embodiment, each constituent unit of the deep learning model may be automatically assigned its corresponding segmentation operator type and the segmentation strategies feasible for it; a feasible communication operator can then be inserted for each segmentation strategy. Typically, the segmentation operator types, segmentation strategies, and communication operators can be preset.
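As a loose illustration of the preset assignment described above, the mapping from operator type to its candidate segmentation strategies and communication operators might be sketched as follows (all names and table contents are hypothetical, not taken from the patent):

```python
# Hypothetical sketch: each operator type is preset with a list of
# (segmentation strategy, communication operator) pairs; the communication
# operator is the one that must be inserted to merge that strategy's results.
SPLIT_TABLE = {
    "matmul": [
        ("column_split", "all_gather"),   # partial columns -> concatenate
        ("row_split",    "all_reduce"),   # partial sums    -> element-wise add
        ("replicate",    None),           # full copy, no merge needed
    ],
    "conv2d": [
        ("column_split", "all_gather"),
        ("row_split",    "all_reduce"),
        ("replicate",    None),
    ],
}

def strategies_for(op_type: str):
    """Return the preset (split, comm) pairs for an operator type;
    fall back to replication for unknown operator types."""
    return SPLIT_TABLE.get(op_type, [("replicate", None)])

print(strategies_for("matmul"))
```

The fallback to replication for unknown operator types is an assumption chosen for the sketch; replication is always valid because it needs no merging communication.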
S120, obtaining a plurality of candidate parallel strategies according to each segmentation operator, a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy.
In this embodiment, a mapping relationship among a segmentation operator, a segmentation policy, and a communication operator may be used as one candidate parallel policy, and thus, a plurality of candidate parallel policies may be obtained. Then, the optimal parallel strategy can be selected from the candidate parallel strategies by considering different overheads.
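The combination step just described can be read as a cartesian product over the per-operator choices: one candidate parallel strategy assigns exactly one (segmentation strategy, communication operator) pair to every segmentation operator. A minimal sketch under that reading, with hypothetical layer names:

```python
from itertools import product

def enumerate_candidates(per_op_choices):
    """per_op_choices: {op_name: [(split, comm), ...]}.
    Returns every candidate parallel strategy, i.e. every way of picking
    one (split, comm) pair per operator."""
    names = list(per_op_choices)
    return [dict(zip(names, combo))
            for combo in product(*(per_op_choices[n] for n in names))]

choices = {
    "layer1": [("row_split", "all_reduce"), ("column_split", "all_gather")],
    "layer2": [("row_split", "all_reduce"), ("replicate", None)],
}
candidates = enumerate_candidates(choices)
print(len(candidates))  # 4 (2 choices x 2 choices)
```

In practice the full product can be large, which is why the patent searches it with a greedy or genetic algorithm rather than exhaustively scoring every candidate.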
The parallel strategy can be a strategy for parallel training of a deep learning model. In this embodiment, multiple heterogeneous many-core devices may be utilized to perform distributed parallel training on the deep learning model based on the finally selected parallel strategy. A heterogeneous many-core device may be a processor or system in which multiple processor cores of different structures are integrated within a single chip.
S130, acquiring a target parallel strategy from the candidate parallel strategies through a preset strategy searching algorithm.
The preset strategy search algorithm may include a greedy algorithm and/or a genetic algorithm. The greedy algorithm constructs the final parallel strategy by selecting the currently optimal segmentation strategy at each step, and is simple and effective. The genetic algorithm simulates inheritance and evolution in nature, searching for the optimal solution through operations such as selection, crossover, and mutation.
In this embodiment, a preset policy search algorithm may be used to search all candidate parallel policies to obtain an optimal parallel policy, so as to serve as a target parallel policy. Specifically, communication overhead, calculation overhead and memory overhead can be selected based on the characteristics of heterogeneous many-core devices to construct a loss function; furthermore, a target parallel strategy with the minimum loss value can be selected from all candidate parallel strategies according to the loss function through a preset strategy searching algorithm.
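A minimal sketch of such a loss function, assuming the communication, computation, and memory overhead estimates are supplied externally (the function signature and the threshold value are illustrative assumptions, not specified in the patent):

```python
def strategy_loss(comm_cost, compute_cost, mem_ratio, mem_threshold=0.9):
    """Loss of one candidate parallel strategy.

    Candidates whose memory-occupation ratio reaches the threshold are
    treated as infeasible (infinite loss); otherwise the loss is the
    ratio of communication overhead to computation overhead, since
    communication is usually the bottleneck in distributed parallelism.
    """
    if mem_ratio >= mem_threshold:
        return float("inf")   # would risk memory overflow on the device
    return comm_cost / compute_cost

print(strategy_loss(10.0, 100.0, 0.5))   # 0.1
print(strategy_loss(10.0, 100.0, 0.95))  # inf
```

Returning infinity for over-budget candidates lets a plain minimum over loss values implement both the memory screening and the overhead comparison in one pass.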
S140, performing distributed parallel training of the large-scale language model according to the target parallel strategy to obtain a trained target language model, and adopting the target language model to realize open domain question-answering.
Specifically, based on the target parallel strategy, the initial large-scale language model can be trained in a distributed parallel manner on pre-collected sample data across the heterogeneous many-core devices participating in training, until the trained target language model is obtained. The large-scale language model can be constructed based on a deep learning algorithm; its structure is not particularly limited in this embodiment.
During distributed parallel training, each heterogeneous many-core device processes the data it is responsible for and communicates with the other devices to merge the calculation results once computation is complete. In this process, the target parallel strategy is used to minimize communication and computation overhead while ensuring that memory usage does not exceed each device's own memory limit.
Further, after the target language model is obtained, open-domain intelligent question answering can be realized based on it. For example, the question text input by a user can be fed into the target language model, and the answer text output by the model can be presented to the user. The question text may be of any document type or form.
Optionally, the number of heterogeneous many-core devices participating in parallel training may be determined according to the accuracy requirement or the speed requirement of the intelligent question-answering implementation.
It can be appreciated that the target parallel strategy can also be applied to parallel training of large-scale models in other fields, such as object detection, image classification, or speech recognition models, greatly improving the efficiency and accuracy of object detection, image classification, speech recognition, intelligent question answering, and the like.
According to the technical scheme, a plurality of segmentation operators are obtained, together with the segmentation strategies corresponding to each segmentation operator and the communication operator corresponding to each segmentation strategy; a plurality of candidate parallel strategies are then derived from them, and a target parallel strategy is selected from the candidates through a preset strategy search algorithm; finally, distributed parallel training of the large-scale language model is carried out according to the target parallel strategy to obtain a trained target language model, which is used to realize open-domain question answering. By automatically generating candidate parallel strategies and selecting the optimal one, the training efficiency of the large-scale language model and the implementation efficiency of intelligent question answering can both be improved.
In another optional implementation manner of this embodiment, obtaining, by a preset policy search algorithm, a target parallel policy from the plurality of candidate parallel policies may include:
acquiring a loss function according to the calculation cost, the communication cost and the memory cost;
and acquiring a target parallel strategy from the plurality of candidate parallel strategies according to the loss function through a preset strategy searching algorithm.
It should be noted that the structure of the heterogeneous many-core device may be as shown in FIG. 1B. Compared with an ordinary accelerator card, the heterogeneous many-core device has cross-segment memory and per-core-group private memory: the cross-segment memory is partitioned from the many-core memory of the accelerator card and organized as a contiguous address space, while the per-core-group memory is partitioned from the same many-core memory and can be accessed at high speed by a specific group of cores. Heterogeneous many-core devices therefore have certain advantages in communication, so communication factors need to be taken into account in the strategy search. Moreover, unlike an ordinary accelerator card, a single card of the heterogeneous many-core device has four core groups, so the computing factors of the core groups also need to be considered in the strategy search.
In this embodiment, the strategy search flow may be as shown in FIG. 1C, and the loss function used for parallel training may be set based on communication overhead, computation overhead, and memory overhead. For example, since communication overhead is usually the bottleneck in distributed parallelism, and since the memory of each device is limited (so memory occupation must be considered to avoid memory overflow), the loss function can combine the ratio of total communication overhead to total computation overhead with a check that the memory-occupation ratio is below a preset threshold. The optimal target parallel strategy can then be selected from all candidate parallel strategies according to this loss function through the preset strategy search algorithm.
The method has the advantages that the advantages of heterogeneous many-core equipment can be fully utilized, the optimal parallel strategy can be automatically selected according to hardware configuration and model parameters, the efficiency of parallel training of a large-scale model can be improved, the training time can be shortened, and the iteration speed of the model can be improved.
In another optional implementation manner of this embodiment, obtaining, by a preset policy search algorithm, a target parallel policy from the plurality of candidate parallel policies according to the loss function may include:
acquiring a loss value corresponding to each candidate parallel strategy according to the loss function through a preset strategy searching algorithm;
and if the minimum loss value corresponding to the current candidate parallel strategy is detected, taking the current candidate parallel strategy as a target parallel strategy.
In a specific example, the ratio of communication overhead to computation overhead corresponding to each candidate parallel strategy can be calculated through the preset strategy search algorithm and used as its loss value, while the memory-occupation ratio of each candidate strategy is obtained at the same time. It is first judged whether each candidate's memory-occupation ratio is below the preset threshold, yielding a set of pre-screened candidate parallel strategies; the loss values of all pre-screened candidates are then compared, and the candidate with the minimum loss value is taken as the final target parallel strategy.
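The two-stage screening just described might be sketched as follows; the candidate representation (name, communication cost, computation cost, memory ratio) and the numeric values are hypothetical:

```python
def select_target_strategy(candidates, mem_threshold=0.9):
    """Two-stage selection: (1) keep candidates whose memory-occupation
    ratio is below the threshold, (2) among survivors, pick the one with
    the smallest communication/computation cost ratio."""
    screened = [c for c in candidates if c[3] < mem_threshold]
    if not screened:
        raise ValueError("no candidate fits within device memory")
    return min(screened, key=lambda c: c[1] / c[2])

candidates = [
    ("A", 20.0, 100.0, 0.50),   # loss 0.20
    ("B",  5.0, 100.0, 0.95),   # cheapest communication, but over memory
    ("C", 12.0, 100.0, 0.70),   # loss 0.12 -> selected
]
print(select_target_strategy(candidates)[0])  # C
```

Note that candidate B is rejected despite having the lowest communication cost, because the memory check is applied before the loss comparison.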
In another optional implementation manner of this embodiment, performing distributed parallel training of the large-scale language model according to the target parallel policy to obtain a trained target language model may include:
acquiring a target segmentation operator, a target segmentation strategy and a target communication operator according to the target parallel strategy;
performing segmentation processing on the input features of the target segmentation operator according to the target segmentation strategy to obtain segmentation features, and sending the segmentation features to training participation equipment;
and carrying out communication summarization on the calculation results fed back by the training participation equipment through the target communication operator so as to obtain the total calculation results corresponding to the input features.
In one specific example, an operator-insertion schematic may be as shown in FIG. 1D. For a convolution layer operator, a column segmentation operator corresponding to the column-split logic can be inserted, so that the input is split by columns into the two partial products [0,1,2,3; 4,5,6,7] × [10,11,12,13]ᵀ and [0,1,2,3; 4,5,6,7] × [14,15,16,17]ᵀ. Each split part is then sent to one training participation device so that each device computes its partial convolution result; the results of all devices are gathered and concatenated through an AllGather operator to obtain the total calculation result [74,98; 258,346].
Similarly, when a row segmentation operator corresponding to the row-split logic is inserted, the input is split by rows into the two partial products [0,1; 4,5] × [10,14; 11,15] and [2,3; 6,7] × [12,16; 13,17]. Each split part is sent to its corresponding training participation device, and the devices' calculation results are reduced through an AllReduce operator, for example by adding the element values at the same matrix positions, to obtain the final total calculation result [74,98; 258,346].
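The two worked examples above can be checked with a minimal single-process simulation (pure Python, no real communication library; the "devices" are simply list slices, and AllGather/AllReduce are modeled as concatenation and element-wise summation):

```python
def matmul(a, b):
    """Plain nested-list matrix multiplication."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

A = [[0, 1, 2, 3], [4, 5, 6, 7]]
B = [[10, 14], [11, 15], [12, 16], [13, 17]]

# Column split: each "device" holds one column of B and computes one
# column of the result; AllGather concatenates the partial columns.
cols = [[[row[j]] for row in B] for j in range(2)]
gathered = [sum(parts, []) for parts in
            zip(*(matmul(A, c) for c in cols))]

# Row split: device k holds the k-th column block of A and the k-th row
# block of B; AllReduce sums the partial products element-wise.
A_parts = [[row[:2] for row in A], [row[2:] for row in A]]
B_parts = [B[:2], B[2:]]
partials = [matmul(a, b) for a, b in zip(A_parts, B_parts)]
reduced = [[sum(p[i][j] for p in partials) for j in range(2)]
           for i in range(2)]

print(gathered)  # [[74, 98], [258, 346]]
print(reduced)   # [[74, 98], [258, 346]]
```

Both splits recover the full product [74,98; 258,346], which is exactly why either strategy is a valid candidate for this operator and the search only has to weigh their communication and memory costs.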
When a replication operator corresponding to the replication logic is inserted, the input features are duplicated and distributed to each training participation device so that every device performs the same convolution operation. Finally, the devices' calculation results are summarized through the current target communication operator to obtain the final total calculation result.
Therefore, for each constituent unit of the large-scale language model, parameters can be adjusted through the above process to obtain the finally trained language model.
The advantages of this scheme are that parallelization can be performed automatically according to the characteristics of different models and the available hardware resources, which reduces the workload of manual adjustment, lowers the operation difficulty, and improves training efficiency and training precision; in addition, the scheme supports large-scale model training and can meet the training requirements of massive data and complex models.
Embodiment Two:
Fig. 2 is a schematic structural diagram of an open domain question-answering implementation device for a large-scale language model according to a second embodiment of the present invention. As shown in fig. 2, the apparatus may include: a segmentation operator acquisition module 210, a candidate parallel strategy acquisition module 220, a target parallel strategy acquisition module 230, and a model training module 240; wherein,
a segmentation operator obtaining module 210, configured to obtain a plurality of segmentation operators, and obtain a plurality of segmentation policies corresponding to the segmentation operators, and communication operators corresponding to the segmentation policies;
the candidate parallel policy obtaining module 220 is configured to obtain a plurality of candidate parallel policies according to each of the segmentation operators, a plurality of segmentation policies corresponding to each of the segmentation operators, and a communication operator corresponding to each of the segmentation policies;
the target parallel policy obtaining module 230 is configured to obtain a target parallel policy from the plurality of candidate parallel policies through a preset policy search algorithm;
the model training module 240 is configured to perform distributed parallel training of the large-scale language model according to the target parallel policy, so as to obtain a trained target language model, and implement open-domain question-answering by using the target language model.
According to the technical scheme, a plurality of segmentation operators, a plurality of segmentation strategies corresponding to each segmentation operator, and communication operators corresponding to each segmentation strategy are obtained. Then, a plurality of candidate parallel strategies are obtained according to each segmentation operator, the segmentation strategies corresponding to it, and the communication operators corresponding to each segmentation strategy, and a target parallel strategy is obtained from the candidate parallel strategies through a preset strategy search algorithm. Finally, distributed parallel training of the large-scale language model is carried out according to the target parallel strategy to obtain a trained target language model, and the target language model is adopted to realize open domain question-answering. By automatically generating candidate parallel strategies and selecting the optimal parallel strategy, both the training efficiency of the large-scale language model and the realization efficiency of intelligent question-answering can be improved.
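As a rough illustration of how candidate parallel strategies might be enumerated, the sketch below pairs each operator with every segmentation strategy and its matching communication operator; all strategy and operator names here are assumptions for the example, not identifiers from the patent:

```python
# Each segmentation strategy is paired with the communication operator
# that combines its partial results (names are illustrative).
SEGMENTATION_STRATEGIES = {
    "row_split": "all_reduce",   # partial results are summed
    "col_split": "all_gather",   # partial results are concatenated
    "replicate": "identity",     # every device holds the full result
}

def candidate_parallel_strategies(operators):
    # One candidate per (operator, segmentation strategy) combination.
    return [(op, split, comm)
            for op in operators
            for split, comm in SEGMENTATION_STRATEGIES.items()]

candidates = candidate_parallel_strategies(["conv_layer", "linear_layer"])
print(len(candidates))  # → 6
```

A real system would enumerate candidates over every component unit of the model, after which the search algorithm picks one candidate per operator.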
Optionally, the target parallel policy obtaining module 230 includes:
the loss function acquisition unit is used for acquiring a loss function according to the calculation cost, the communication cost and the memory cost;
the target parallel strategy acquisition unit is used for acquiring the target parallel strategy from the plurality of candidate parallel strategies according to the loss function through a preset strategy search algorithm.
Optionally, the target parallel strategy acquisition unit is specifically configured to obtain, through the preset strategy search algorithm, a loss value corresponding to each candidate parallel strategy according to the loss function;
and, if the loss value corresponding to the current candidate parallel strategy is detected to be the minimum, take the current candidate parallel strategy as the target parallel strategy.
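A minimal sketch of this selection step, assuming the loss is a weighted sum of the three cost terms (the weights and cost numbers below are invented for illustration; the method only requires that the candidate with the minimum loss value is selected):

```python
# Illustrative loss for ranking candidate parallel strategies: a weighted
# sum of computation, communication and memory costs.
def strategy_loss(compute_cost, communication_cost, memory_cost,
                  weights=(1.0, 1.0, 1.0)):
    w_compute, w_comm, w_mem = weights
    return (w_compute * compute_cost
            + w_comm * communication_cost
            + w_mem * memory_cost)

candidates = {
    "column_split": strategy_loss(4.0, 2.5, 1.0),
    "row_split":    strategy_loss(4.0, 1.5, 1.0),
    "replicate":    strategy_loss(8.0, 0.5, 4.0),
}

# Take the candidate whose loss value is the minimum as the target strategy.
target = min(candidates, key=candidates.get)
print(target)  # → row_split
```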
Optionally, the model training module 240 includes:
the target segmentation operator acquisition unit is used for acquiring a target segmentation operator, a target segmentation strategy and a target communication operator according to the target parallel strategy;
the segmentation feature acquisition unit is used for carrying out segmentation processing on the input features of the target segmentation operator according to the target segmentation strategy so as to acquire segmentation features and sending the segmentation features to each training participation device;
and the total calculation result acquisition unit is used for carrying out communication summarization on the calculation results fed back by the training participation equipment through the target communication operator so as to acquire the total calculation results corresponding to the input features.
Optionally, the preset policy search algorithm includes a greedy algorithm and/or a genetic algorithm.
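As one hypothetical instance of such a preset search algorithm, a greedy search could pick the lowest-cost strategy for each layer independently; the layer names and cost table below are invented for illustration:

```python
# Greedy per-layer search: each layer independently takes the strategy
# with the lowest estimated cost (no global backtracking).
def greedy_search(layers, costs):
    return {layer: min(costs[layer], key=costs[layer].get)
            for layer in layers}

costs = {
    "embedding": {"row_split": 3.0, "col_split": 2.0, "replicate": 5.0},
    "attention": {"row_split": 4.0, "col_split": 4.5, "replicate": 9.0},
}
plan = greedy_search(["embedding", "attention"], costs)
print(plan)  # → {'embedding': 'col_split', 'attention': 'row_split'}
```

A genetic algorithm would instead evolve whole per-model strategy assignments, trading more search time for the chance to escape locally optimal but globally poor choices.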
Optionally, the segmentation operator includes a row segmentation operator, a column segmentation operator and/or a replication operator, and the communication operator includes a reduction operator and/or a collection operator.
The open domain question-answering implementation device of the large-scale language model provided by the embodiment of the invention can execute the open domain question-answering implementation method of the large-scale language model provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Embodiment Three:
Fig. 3 shows a schematic diagram of an electronic device 30 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 3, the electronic device 30 includes at least one processor 31, and a memory, such as a Read Only Memory (ROM) 32, a Random Access Memory (RAM) 33, etc., communicatively connected to the at least one processor 31, wherein the memory stores a computer program executable by the at least one processor, and the processor 31 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 32 or the computer program loaded from the storage unit 38 into the Random Access Memory (RAM) 33. In the RAM 33, various programs and data required for the operation of the electronic device 30 may also be stored. The processor 31, the ROM 32 and the RAM 33 are connected to each other via a bus 34. An input/output (I/O) interface 35 is also connected to bus 34.
Various components in electronic device 30 are connected to I/O interface 35, including: an input unit 36 such as a keyboard, a mouse, etc.; an output unit 37 such as various types of displays, speakers, and the like; a storage unit 38 such as a magnetic disk, an optical disk, or the like; and a communication unit 39 such as a network card, modem, wireless communication transceiver, etc. The communication unit 39 allows the electronic device 30 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 31 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 31 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 31 performs the various methods and processes described above, such as the open domain question-answering implementation method of a large-scale language model.
In some embodiments, the open domain question-answering implementation of the large-scale language model may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 38. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 30 via the ROM 32 and/or the communication unit 39. When the computer program is loaded into RAM 33 and executed by processor 31, one or more steps of the open domain question-answering implementation method of the large-scale language model described above can be performed. Alternatively, in other embodiments, processor 31 may be configured to perform the open domain question-answering implementation of the large-scale language model in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility in traditional physical hosts and VPS (Virtual Private Server) services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (10)
1. An open domain question-answering implementation method of a large-scale language model is characterized by comprising the following steps:
acquiring a plurality of segmentation operators, and acquiring a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy;
obtaining a plurality of candidate parallel strategies according to each segmentation operator, a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy;
acquiring a target parallel strategy from the candidate parallel strategies through a preset strategy searching algorithm;
and carrying out distributed parallel training of the large-scale language model according to the target parallel strategy to obtain a trained target language model, and adopting the target language model to realize open domain question-answering.
2. The method of claim 1, wherein obtaining, by a preset policy search algorithm, a target parallel policy from the plurality of candidate parallel policies, comprises:
acquiring a loss function according to the calculation cost, the communication cost and the memory cost;
and acquiring a target parallel strategy from the plurality of candidate parallel strategies according to the loss function through a preset strategy searching algorithm.
3. The method according to claim 2, wherein obtaining, by a preset policy search algorithm, a target parallel policy from the plurality of candidate parallel policies according to the loss function, comprises:
acquiring a loss value corresponding to each candidate parallel strategy according to the loss function through a preset strategy searching algorithm;
and if the minimum loss value corresponding to the current candidate parallel strategy is detected, taking the current candidate parallel strategy as a target parallel strategy.
4. The method of claim 1, wherein performing distributed parallel training of the large-scale language model according to the target parallel strategy to obtain a trained target language model comprises:
acquiring a target segmentation operator, a target segmentation strategy and a target communication operator according to the target parallel strategy;
performing segmentation processing on the input features of the target segmentation operator according to the target segmentation strategy to obtain segmentation features, and sending the segmentation features to training participation equipment;
and carrying out communication summarization on the calculation results fed back by the training participation equipment through the target communication operator so as to obtain the total calculation results corresponding to the input features.
5. A method according to any of claims 1-3, wherein the preset policy search algorithm comprises a greedy algorithm and/or a genetic algorithm.
6. The method according to claim 1 or 4, wherein the segmentation operator comprises a row segmentation operator, a column segmentation operator and/or a replication operator, and wherein the communication operator comprises a reduction operator and/or a collection operator.
7. An open domain question-answering implementation device for a large-scale language model, comprising:
the segmentation operator acquisition module is used for acquiring a plurality of segmentation operators, a plurality of segmentation strategies corresponding to the segmentation operators and communication operators corresponding to the segmentation strategies;
the candidate parallel strategy acquisition module is used for acquiring a plurality of candidate parallel strategies according to each segmentation operator, a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy;
the target parallel strategy acquisition module is used for acquiring target parallel strategies from the candidate parallel strategies through a preset strategy search algorithm;
and the model training module is used for carrying out distributed parallel training of the large-scale language model according to the target parallel strategy so as to obtain a trained target language model, and adopting the target language model to realize open domain question-answering.
8. The apparatus of claim 7, wherein the target parallel policy acquisition module comprises:
the loss function acquisition unit is used for acquiring a loss function according to the calculation cost, the communication cost and the memory cost;
the target parallel strategy acquisition unit is used for acquiring the target parallel strategy from the plurality of candidate parallel strategies according to the loss function through a preset strategy search algorithm.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the open domain question-answering implementation method of the large scale language model of any one of claims 1-6.
10. A computer readable storage medium storing computer instructions for causing a processor to implement the open domain question-answering implementation method of the large-scale language model of any one of claims 1-6 when executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310688299.3A CN116662509A (en) | 2023-06-09 | 2023-06-09 | Open domain question-answering implementation method, device and equipment of large-scale language model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116662509A true CN116662509A (en) | 2023-08-29 |
Family
ID=87720450
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310688299.3A Pending CN116662509A (en) | 2023-06-09 | 2023-06-09 | Open domain question-answering implementation method, device and equipment of large-scale language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116662509A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117785492A (en) * | 2024-02-28 | 2024-03-29 | 上海燧原智能科技有限公司 | Operator segmentation method determining method, device, equipment and medium |
CN117785492B (en) * | 2024-02-28 | 2024-05-17 | 上海燧原智能科技有限公司 | Operator segmentation method determining method, device, equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116662509A (en) | Open domain question-answering implementation method, device and equipment of large-scale language model | |
CN114462577A (en) | Federated learning system, method, computer equipment and storage medium | |
CN114841315A (en) | Method and system for implementing hybrid expert model, electronic device and storage medium | |
CN115358411A (en) | Data processing method, device, equipment and medium | |
CN114581732A (en) | Image processing and model training method, device, equipment and storage medium | |
CN111339290A (en) | Text classification method and system | |
CN112560480A (en) | Task community discovery method, device, equipment and storage medium | |
CN113641804A (en) | Pre-training model obtaining method and device, electronic equipment and storage medium | |
CN114896418A (en) | Knowledge graph construction method and device, electronic equipment and storage medium | |
CN115439916A (en) | Face recognition method, apparatus, device and medium | |
CN115203564A (en) | Information flow recommendation method and device and computer program product | |
CN111506872B (en) | Task allocation method and device based on load matrix analysis | |
CN115186738A (en) | Model training method, device and storage medium | |
CN116628167B (en) | Response determination method and device, electronic equipment and storage medium | |
US12038989B2 (en) | Methods for community search, method for training community search model, and electronic device | |
CN115510203B (en) | Method, device, equipment, storage medium and program product for determining answers to questions | |
CN118035507B (en) | Data query system and method based on data mining technology | |
CN115223177A (en) | Text recognition method, device, equipment and storage medium | |
CN116244413A (en) | New intention determining method, apparatus and storage medium | |
CN114493002A (en) | Production intention prediction method and device and electronic equipment | |
CN117933353A (en) | Reinforced learning model training method and device, electronic equipment and storage medium | |
CN118227580A (en) | Log analysis method and device, electronic equipment and storage medium | |
CN114781621A (en) | Neural network determining method and device, electronic equipment and storage medium | |
CN117519996A (en) | Data processing method, device, equipment and storage medium | |
CN116308455A (en) | Method and device for identifying hub area in trade network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||