CN116662509A - Open domain question-answering implementation method, device and equipment of large-scale language model - Google Patents
- Publication number
- CN116662509A (Application number CN202310688299.3A)
- Authority
- CN
- China
- Prior art keywords
- segmentation
- parallel
- strategy
- target
- operator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to the technical field of deep learning and discloses a method, device and equipment for implementing open-domain question answering with a large-scale language model. The method comprises: obtaining a plurality of segmentation operators, the segmentation strategies corresponding to each segmentation operator, and the communication operator corresponding to each segmentation strategy; deriving a plurality of candidate parallel strategies from the segmentation operators, their segmentation strategies, and the corresponding communication operators; selecting a target parallel strategy from the candidate parallel strategies through a preset strategy search algorithm; and performing distributed parallel training of the large-scale language model according to the target parallel strategy to obtain a trained target language model, which is then used to realize open-domain question answering. By automatically generating candidate parallel strategies and selecting the optimal one, the technical scheme improves both the training efficiency of the large-scale language model and the implementation efficiency of intelligent question answering.
Description
Technical Field
The invention relates to the technical field of deep learning, and in particular to a method, device and equipment for implementing open-domain question answering with a large-scale language model.
Background
As large-scale language models grow increasingly complex, single-machine single-card training can no longer meet their training requirements, so multi-machine multi-card parallel training has become a research hotspot in the deep learning field. Within multi-machine multi-card parallel training, research on how to effectively exploit heterogeneous many-core devices and automate parallel strategies has received much attention.
Currently, existing parallel training methods typically rely on frameworks such as TensorFlow or PyTorch to train large-scale models in parallel. However, these frameworks are mostly optimized for a single hardware platform and cannot effectively exploit the advantages of heterogeneous many-core devices. In addition, they require the user to manually tune parallel training strategies and parameters and to re-optimize for each hardware platform, which makes them difficult for non-expert users to use and limits their practical applicability.
Disclosure of Invention
The invention provides a method, device and equipment for implementing open-domain question answering with a large-scale language model, which can improve the training efficiency of the large-scale language model and the implementation efficiency of intelligent question answering.
According to an aspect of the present invention, there is provided an open domain question-answering implementation method of a large-scale language model, including:
acquiring a plurality of segmentation operators, and acquiring a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy;
obtaining a plurality of candidate parallel strategies according to each segmentation operator, a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy;
acquiring a target parallel strategy from the candidate parallel strategies through a preset strategy searching algorithm;
and carrying out distributed parallel training of the large-scale language model according to the target parallel strategy to obtain a trained target language model, and adopting the target language model to realize open domain question-answering.
According to another aspect of the present invention, there is provided an open domain question-answering implementation apparatus of a large-scale language model, including:
the segmentation operator acquisition module is used for acquiring a plurality of segmentation operators, a plurality of segmentation strategies corresponding to the segmentation operators and communication operators corresponding to the segmentation strategies;
the candidate parallel strategy acquisition module is used for acquiring a plurality of candidate parallel strategies according to each segmentation operator, a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy;
the target parallel strategy acquisition module is used for acquiring target parallel strategies from the candidate parallel strategies through a preset strategy search algorithm;
and the model training module is used for carrying out distributed parallel training of the large-scale language model according to the target parallel strategy so as to obtain a trained target language model, and adopting the target language model to realize open domain question-answering.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the open domain question-answering implementation method of the large-scale language model according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement the open domain question-answering implementation method of the large-scale language model according to any one of the embodiments of the present invention when executed.
According to the technical scheme, a plurality of segmentation operators are obtained, together with the segmentation strategies corresponding to each segmentation operator and the communication operator corresponding to each segmentation strategy; a plurality of candidate parallel strategies are then derived from them, and a target parallel strategy is selected from the candidates through a preset strategy search algorithm; finally, distributed parallel training of the large-scale language model is carried out according to the target parallel strategy to obtain a trained target language model, which is used to realize open-domain question answering. By automatically generating candidate parallel strategies and selecting the optimal one, the training efficiency of the large-scale language model and the implementation efficiency of intelligent question answering can both be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1A is a flowchart of an open domain question-answering implementation method for a large-scale language model according to a first embodiment of the present invention;
FIG. 1B is a schematic structural diagram of a heterogeneous many-core device according to a first embodiment of the present invention;
FIG. 1C is a schematic diagram of a strategy search process according to a first embodiment of the present invention;
FIG. 1D is a schematic illustration of operator insertion according to a first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an open domain question-answering implementation device of a large-scale language model according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device implementing an open domain question-answering implementation method of a large-scale language model according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," "target," and the like in the description and claims of the present invention and in the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiment one:
FIG. 1A is a flowchart of an open domain question-answering implementation method for a large-scale language model. The method may be performed by an open domain question-answering implementation device of the large-scale language model, which may be implemented in the form of hardware and/or software and configured in an electronic device; typically, the electronic device may be a computer device or a server. As shown in FIG. 1A, the method includes:
s110, acquiring a plurality of segmentation operators, and acquiring a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy.
The operator may be a constituent unit of the deep learning model, used to implement a specific data processing function; for example, it may be a convolution layer operator, a pooling layer operator, or the like. The segmentation operator may be an operator with feature or data splitting capability, such as a convolution layer operator. The communication operator may be an operator that implements the data communication function.
Alternatively, the segmentation operator may include a row segmentation operator, a column segmentation operator, and/or a replication operator. Different segmentation operators may correspond to different segmentation strategies: the row segmentation operator may correspond to a row-split strategy, that is, splitting the input features by rows; the column segmentation operator may correspond to a column-split strategy, that is, splitting the input features by columns; and the replication operator may correspond to a replication strategy, that is, duplicating the input features.
The communication operator may include a reduction operator and/or a gather operator. For example, the reduction operator may be an AllReduce operator, which combines the data processing results from each device using a preset rule (e.g., averaging or weighted summation). The gather operator may be an AllGather operator, which shares the data processing results of the different devices with all devices.
In this embodiment, each constituent unit of the deep learning model may be automatically assigned its corresponding segmentation operator type and the segmentation strategies feasible for it; a feasible communication operator can then be inserted for each segmentation strategy. Typically, the segmentation operator types, segmentation strategies, and communication operators can be preset.
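As a loose illustration of the preset assignment described above, the mapping from operator type to its candidate segmentation strategies and communication operators might be sketched as follows (all names and table contents are hypothetical, not taken from the patent):

```python
# Hypothetical sketch: each operator type is preset with a list of
# (segmentation strategy, communication operator) pairs; the communication
# operator is the one that must be inserted to merge that strategy's results.
SPLIT_TABLE = {
    "matmul": [
        ("column_split", "all_gather"),   # partial columns -> concatenate
        ("row_split",    "all_reduce"),   # partial sums    -> element-wise add
        ("replicate",    None),           # full copy, no merge needed
    ],
    "conv2d": [
        ("column_split", "all_gather"),
        ("row_split",    "all_reduce"),
        ("replicate",    None),
    ],
}

def strategies_for(op_type: str):
    """Return the preset (split, comm) pairs for an operator type;
    fall back to replication for unknown operator types."""
    return SPLIT_TABLE.get(op_type, [("replicate", None)])

print(strategies_for("matmul"))
```

The fallback to replication for unknown operator types is an assumption chosen for the sketch; replication is always valid because it needs no merging communication.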
S120, obtaining a plurality of candidate parallel strategies according to each segmentation operator, a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy.
In this embodiment, a mapping relationship among a segmentation operator, a segmentation policy, and a communication operator may be used as one candidate parallel policy, and thus, a plurality of candidate parallel policies may be obtained. Then, the optimal parallel strategy can be selected from the candidate parallel strategies by considering different overheads.
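The combination step just described can be read as a cartesian product over the per-operator choices: one candidate parallel strategy assigns exactly one (segmentation strategy, communication operator) pair to every segmentation operator. A minimal sketch under that reading, with hypothetical layer names:

```python
from itertools import product

def enumerate_candidates(per_op_choices):
    """per_op_choices: {op_name: [(split, comm), ...]}.
    Returns every candidate parallel strategy, i.e. every way of picking
    one (split, comm) pair per operator."""
    names = list(per_op_choices)
    return [dict(zip(names, combo))
            for combo in product(*(per_op_choices[n] for n in names))]

choices = {
    "layer1": [("row_split", "all_reduce"), ("column_split", "all_gather")],
    "layer2": [("row_split", "all_reduce"), ("replicate", None)],
}
candidates = enumerate_candidates(choices)
print(len(candidates))  # 4 (2 choices x 2 choices)
```

In practice the full product can be large, which is why the patent searches it with a greedy or genetic algorithm rather than exhaustively scoring every candidate.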
The parallel strategy can be a strategy for parallel training of a deep learning model. In this embodiment, multiple heterogeneous many-core devices may be utilized to perform distributed parallel training on the deep learning model based on the finally selected parallel strategy. A heterogeneous many-core device may be a processor or system in which multiple processor cores of different structures are integrated within a single chip.
S130, acquiring a target parallel strategy from the candidate parallel strategies through a preset strategy searching algorithm.
The preset strategy search algorithm may include a greedy algorithm and/or a genetic algorithm. The greedy algorithm constructs the final parallel strategy by selecting the currently optimal segmentation strategy at each step, and is simple and effective. The genetic algorithm simulates inheritance and evolution in nature, searching for the optimal solution through operations such as selection, crossover, and mutation.
In this embodiment, a preset policy search algorithm may be used to search all candidate parallel policies to obtain an optimal parallel policy, so as to serve as a target parallel policy. Specifically, communication overhead, calculation overhead and memory overhead can be selected based on the characteristics of heterogeneous many-core devices to construct a loss function; furthermore, a target parallel strategy with the minimum loss value can be selected from all candidate parallel strategies according to the loss function through a preset strategy searching algorithm.
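A minimal sketch of such a loss function, assuming the communication, computation, and memory overhead estimates are supplied externally (the function signature and the threshold value are illustrative assumptions, not specified in the patent):

```python
def strategy_loss(comm_cost, compute_cost, mem_ratio, mem_threshold=0.9):
    """Loss of one candidate parallel strategy.

    Candidates whose memory-occupation ratio reaches the threshold are
    treated as infeasible (infinite loss); otherwise the loss is the
    ratio of communication overhead to computation overhead, since
    communication is usually the bottleneck in distributed parallelism.
    """
    if mem_ratio >= mem_threshold:
        return float("inf")   # would risk memory overflow on the device
    return comm_cost / compute_cost

print(strategy_loss(10.0, 100.0, 0.5))   # 0.1
print(strategy_loss(10.0, 100.0, 0.95))  # inf
```

Returning infinity for over-budget candidates lets a plain minimum over loss values implement both the memory screening and the overhead comparison in one pass.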
S140, performing distributed parallel training of the large-scale language model according to the target parallel strategy to obtain a trained target language model, and adopting the target language model to realize open domain question-answering.
Specifically, based on the target parallel strategy, the initial large-scale language model can be trained in a distributed parallel manner on pre-collected sample data across the heterogeneous many-core devices participating in training, until the trained target language model is obtained. The large-scale language model can be constructed based on a deep learning algorithm; its structure is not particularly limited in this embodiment.
During distributed parallel training, each heterogeneous many-core device processes the data it is responsible for and communicates with the other devices to merge the calculation results once computation is complete. In this process, the target parallel strategy is used to minimize communication and computation overhead while ensuring that memory usage does not exceed each device's own memory limit.
Further, after the target language model is obtained, open-domain intelligent question answering can be realized based on it. For example, the question text input by a user can be fed into the target language model, and the answer text output by the model can be presented to the user. The question text may be of any document type or form.
Optionally, the number of heterogeneous many-core devices participating in parallel training may be determined according to the accuracy requirement or the speed requirement of the intelligent question-answering implementation.
It can be appreciated that the target parallel strategy can also be applied to parallel training of large-scale models in other fields, such as object detection, image classification, or speech recognition models, greatly improving the efficiency and accuracy of object detection, image classification, speech recognition, intelligent question answering, and the like.
According to the technical scheme, a plurality of segmentation operators are obtained, together with the segmentation strategies corresponding to each segmentation operator and the communication operator corresponding to each segmentation strategy; a plurality of candidate parallel strategies are then derived from them, and a target parallel strategy is selected from the candidates through a preset strategy search algorithm; finally, distributed parallel training of the large-scale language model is carried out according to the target parallel strategy to obtain a trained target language model, which is used to realize open-domain question answering. By automatically generating candidate parallel strategies and selecting the optimal one, the training efficiency of the large-scale language model and the implementation efficiency of intelligent question answering can both be improved.
In another optional implementation manner of this embodiment, obtaining, by a preset policy search algorithm, a target parallel policy from the plurality of candidate parallel policies may include:
acquiring a loss function according to the calculation cost, the communication cost and the memory cost;
and acquiring a target parallel strategy from the plurality of candidate parallel strategies according to the loss function through a preset strategy searching algorithm.
It should be noted that the structure of the heterogeneous many-core device may be as shown in FIG. 1B. Compared with an ordinary accelerator card, the heterogeneous many-core device has cross-segment memory and per-core-group private memory: the cross-segment memory is partitioned from the many-core memory of the accelerator card and organized as a contiguous address space, while the per-core-group memory is partitioned from the same many-core memory and can be accessed at high speed by a specific group of cores. Heterogeneous many-core devices therefore have certain advantages in communication, so communication factors need to be taken into account in the strategy search. Moreover, unlike an ordinary accelerator card, a single card of the heterogeneous many-core device has four core groups, so the computing factors of the core groups also need to be considered in the strategy search.
In this embodiment, the strategy search flow may be as shown in FIG. 1C, and the loss function used for parallel training may be set based on communication overhead, computation overhead, and memory overhead. For example, since communication overhead is usually the bottleneck in distributed parallelism, and since the memory of each device is limited (so memory occupation must be considered to avoid memory overflow), the loss function can combine the ratio of total communication overhead to total computation overhead with a check that the memory-occupation ratio is below a preset threshold. The optimal target parallel strategy can then be selected from all candidate parallel strategies according to this loss function through the preset strategy search algorithm.
The method has the advantages that the advantages of heterogeneous many-core equipment can be fully utilized, the optimal parallel strategy can be automatically selected according to hardware configuration and model parameters, the efficiency of parallel training of a large-scale model can be improved, the training time can be shortened, and the iteration speed of the model can be improved.
In another optional implementation manner of this embodiment, obtaining, by a preset policy search algorithm, a target parallel policy from the plurality of candidate parallel policies according to the loss function may include:
acquiring a loss value corresponding to each candidate parallel strategy according to the loss function through a preset strategy searching algorithm;
and if the minimum loss value corresponding to the current candidate parallel strategy is detected, taking the current candidate parallel strategy as a target parallel strategy.
In a specific example, the ratio of communication overhead to computation overhead corresponding to each candidate parallel strategy can be calculated through the preset strategy search algorithm and used as its loss value, while the memory-occupation ratio of each candidate strategy is obtained at the same time. It is first judged whether each candidate's memory-occupation ratio is below the preset threshold, yielding a set of pre-screened candidate parallel strategies; the loss values of all pre-screened candidates are then compared, and the candidate with the minimum loss value is taken as the final target parallel strategy.
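The two-stage screening just described might be sketched as follows; the candidate representation (name, communication cost, computation cost, memory ratio) and the numeric values are hypothetical:

```python
def select_target_strategy(candidates, mem_threshold=0.9):
    """Two-stage selection: (1) keep candidates whose memory-occupation
    ratio is below the threshold, (2) among survivors, pick the one with
    the smallest communication/computation cost ratio."""
    screened = [c for c in candidates if c[3] < mem_threshold]
    if not screened:
        raise ValueError("no candidate fits within device memory")
    return min(screened, key=lambda c: c[1] / c[2])

candidates = [
    ("A", 20.0, 100.0, 0.50),   # loss 0.20
    ("B",  5.0, 100.0, 0.95),   # cheapest communication, but over memory
    ("C", 12.0, 100.0, 0.70),   # loss 0.12 -> selected
]
print(select_target_strategy(candidates)[0])  # C
```

Note that candidate B is rejected despite having the lowest communication cost, because the memory check is applied before the loss comparison.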
In another optional implementation manner of this embodiment, performing distributed parallel training of the large-scale language model according to the target parallel policy to obtain a trained target language model may include:
acquiring a target segmentation operator, a target segmentation strategy and a target communication operator according to the target parallel strategy;
performing segmentation processing on the input features of the target segmentation operator according to the target segmentation strategy to obtain segmentation features, and sending the segmentation features to training participation equipment;
and carrying out communication summarization on the calculation results fed back by the training participation equipment through the target communication operator so as to obtain the total calculation results corresponding to the input features.
In one specific example, an operator-insertion schematic may be as shown in FIG. 1D. For a convolution layer operator, a column segmentation operator corresponding to the column-split logic can be inserted, so that the input is split by columns into the two partial products [0,1,2,3; 4,5,6,7] × [10,11,12,13]ᵀ and [0,1,2,3; 4,5,6,7] × [14,15,16,17]ᵀ. Each split part is then sent to one training participation device so that each device computes its partial convolution result; the results of all devices are gathered and concatenated through an AllGather operator to obtain the total calculation result [74,98; 258,346].
Similarly, when a row segmentation operator corresponding to the row-split logic is inserted, the input is split by rows into the two partial products [0,1; 4,5] × [10,14; 11,15] and [2,3; 6,7] × [12,16; 13,17]. Each split part is sent to its corresponding training participation device, and the devices' calculation results are reduced through an AllReduce operator, for example by adding the element values at the same matrix positions, to obtain the final total calculation result [74,98; 258,346].
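The two worked examples above can be checked with a minimal single-process simulation (pure Python, no real communication library; the "devices" are simply list slices, and AllGather/AllReduce are modeled as concatenation and element-wise summation):

```python
def matmul(a, b):
    """Plain nested-list matrix multiplication."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

A = [[0, 1, 2, 3], [4, 5, 6, 7]]
B = [[10, 14], [11, 15], [12, 16], [13, 17]]

# Column split: each "device" holds one column of B and computes one
# column of the result; AllGather concatenates the partial columns.
cols = [[[row[j]] for row in B] for j in range(2)]
gathered = [sum(parts, []) for parts in
            zip(*(matmul(A, c) for c in cols))]

# Row split: device k holds the k-th column block of A and the k-th row
# block of B; AllReduce sums the partial products element-wise.
A_parts = [[row[:2] for row in A], [row[2:] for row in A]]
B_parts = [B[:2], B[2:]]
partials = [matmul(a, b) for a, b in zip(A_parts, B_parts)]
reduced = [[sum(p[i][j] for p in partials) for j in range(2)]
           for i in range(2)]

print(gathered)  # [[74, 98], [258, 346]]
print(reduced)   # [[74, 98], [258, 346]]
```

Both splits recover the full product [74,98; 258,346], which is exactly why either strategy is a valid candidate for this operator and the search only has to weigh their communication and memory costs.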
When a replication operator corresponding to the replication logic is inserted, the input features are duplicated and distributed to each training participation device so that every device performs the same convolution operation. Finally, the devices' calculation results are summarized through the current target communication operator to obtain the final total calculation result.
Therefore, for each constituent unit of the large-scale language model, parameters can be adjusted through the above process to obtain the finally trained language model.
The advantages of this scheme are that parallelization can be performed automatically according to the characteristics of different models and the available hardware resources, which reduces the workload of manual adjustment, lowers the operation difficulty, and improves training efficiency and training precision; in addition, the scheme supports large-scale model training and can meet the training requirements of massive data and complex models.
Embodiment Two:
Fig. 2 is a schematic structural diagram of an open domain question-answering implementation device for a large-scale language model according to a second embodiment of the present invention. As shown in fig. 2, the apparatus may include: a segmentation operator acquisition module 210, a candidate parallel strategy acquisition module 220, a target parallel strategy acquisition module 230, and a model training module 240; wherein,
a segmentation operator obtaining module 210, configured to obtain a plurality of segmentation operators, and obtain a plurality of segmentation policies corresponding to the segmentation operators, and communication operators corresponding to the segmentation policies;
the candidate parallel policy obtaining module 220 is configured to obtain a plurality of candidate parallel policies according to each of the segmentation operators, a plurality of segmentation policies corresponding to each of the segmentation operators, and a communication operator corresponding to each of the segmentation policies;
the target parallel policy obtaining module 230 is configured to obtain a target parallel policy from the plurality of candidate parallel policies through a preset policy search algorithm;
the model training module 240 is configured to perform distributed parallel training of the large-scale language model according to the target parallel policy, so as to obtain a trained target language model, and implement open-domain question-answering by using the target language model.
According to the technical scheme, a plurality of segmentation operators, a plurality of segmentation strategies corresponding to each segmentation operator, and communication operators corresponding to each segmentation strategy are obtained. Then, a plurality of candidate parallel strategies are obtained according to each segmentation operator, the segmentation strategies corresponding to it, and the communication operators corresponding to each segmentation strategy, and a target parallel strategy is obtained from the candidate parallel strategies through a preset strategy search algorithm. Finally, distributed parallel training of the large-scale language model is carried out according to the target parallel strategy to obtain a trained target language model, and the target language model is adopted to realize open domain question-answering. By automatically generating candidate parallel strategies and selecting the optimal parallel strategy, both the training efficiency of the large-scale language model and the realization efficiency of intelligent question-answering can be improved.
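As a rough illustration of how candidate parallel strategies might be enumerated, the sketch below pairs each operator with every segmentation strategy and its matching communication operator; all strategy and operator names here are assumptions for the example, not identifiers from the patent:

```python
# Each segmentation strategy is paired with the communication operator
# that combines its partial results (names are illustrative).
SEGMENTATION_STRATEGIES = {
    "row_split": "all_reduce",   # partial results are summed
    "col_split": "all_gather",   # partial results are concatenated
    "replicate": "identity",     # every device holds the full result
}

def candidate_parallel_strategies(operators):
    # One candidate per (operator, segmentation strategy) combination.
    return [(op, split, comm)
            for op in operators
            for split, comm in SEGMENTATION_STRATEGIES.items()]

candidates = candidate_parallel_strategies(["conv_layer", "linear_layer"])
print(len(candidates))  # → 6
```

A real system would enumerate candidates over every component unit of the model, after which the search algorithm picks one candidate per operator.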
Optionally, the target parallel policy obtaining module 230 includes:
the loss function acquisition unit is used for acquiring a loss function according to the calculation cost, the communication cost and the memory cost;
the target parallel strategy acquisition unit is used for acquiring the target parallel strategy from the plurality of candidate parallel strategies according to the loss function through a preset strategy search algorithm.
Optionally, the target parallel strategy acquisition unit is specifically configured to obtain, through the preset strategy search algorithm, a loss value corresponding to each candidate parallel strategy according to the loss function;
and, if the loss value corresponding to the current candidate parallel strategy is detected to be the minimum, take the current candidate parallel strategy as the target parallel strategy.
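A minimal sketch of this selection step, assuming the loss is a weighted sum of the three cost terms (the weights and cost numbers below are invented for illustration; the method only requires that the candidate with the minimum loss value is selected):

```python
# Illustrative loss for ranking candidate parallel strategies: a weighted
# sum of computation, communication and memory costs.
def strategy_loss(compute_cost, communication_cost, memory_cost,
                  weights=(1.0, 1.0, 1.0)):
    w_compute, w_comm, w_mem = weights
    return (w_compute * compute_cost
            + w_comm * communication_cost
            + w_mem * memory_cost)

candidates = {
    "column_split": strategy_loss(4.0, 2.5, 1.0),
    "row_split":    strategy_loss(4.0, 1.5, 1.0),
    "replicate":    strategy_loss(8.0, 0.5, 4.0),
}

# Take the candidate whose loss value is the minimum as the target strategy.
target = min(candidates, key=candidates.get)
print(target)  # → row_split
```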
Optionally, the model training module 240 includes:
the target segmentation operator acquisition unit is used for acquiring a target segmentation operator, a target segmentation strategy and a target communication operator according to the target parallel strategy;
the segmentation feature acquisition unit is used for carrying out segmentation processing on the input features of the target segmentation operator according to the target segmentation strategy so as to acquire segmentation features and sending the segmentation features to each training participation device;
and the total calculation result acquisition unit is used for carrying out communication summarization on the calculation results fed back by the training participation equipment through the target communication operator so as to acquire the total calculation results corresponding to the input features.
Optionally, the preset policy search algorithm includes a greedy algorithm and/or a genetic algorithm.
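As one hypothetical instance of such a preset search algorithm, a greedy search could pick the lowest-cost strategy for each layer independently; the layer names and cost table below are invented for illustration:

```python
# Greedy per-layer search: each layer independently takes the strategy
# with the lowest estimated cost (no global backtracking).
def greedy_search(layers, costs):
    return {layer: min(costs[layer], key=costs[layer].get)
            for layer in layers}

costs = {
    "embedding": {"row_split": 3.0, "col_split": 2.0, "replicate": 5.0},
    "attention": {"row_split": 4.0, "col_split": 4.5, "replicate": 9.0},
}
plan = greedy_search(["embedding", "attention"], costs)
print(plan)  # → {'embedding': 'col_split', 'attention': 'row_split'}
```

A genetic algorithm would instead evolve whole per-model strategy assignments, trading more search time for the chance to escape locally optimal but globally poor choices.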
Optionally, the segmentation operator includes a row segmentation operator, a column segmentation operator and/or a replication operator, and the communication operator includes a reduction operator and/or a collection operator.
The open domain question-answering implementation device of the large-scale language model provided by the embodiment of the invention can execute the open domain question-answering implementation method of the large-scale language model provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Embodiment Three:
Fig. 3 shows a schematic diagram of an electronic device 30 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 3, the electronic device 30 includes at least one processor 31, and a memory, such as a Read Only Memory (ROM) 32, a Random Access Memory (RAM) 33, etc., communicatively connected to the at least one processor 31, wherein the memory stores a computer program executable by the at least one processor, and the processor 31 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 32 or the computer program loaded from the storage unit 38 into the Random Access Memory (RAM) 33. In the RAM 33, various programs and data required for the operation of the electronic device 30 may also be stored. The processor 31, the ROM 32 and the RAM 33 are connected to each other via a bus 34. An input/output (I/O) interface 35 is also connected to bus 34.
Various components in electronic device 30 are connected to I/O interface 35, including: an input unit 36 such as a keyboard, a mouse, etc.; an output unit 37 such as various types of displays, speakers, and the like; a storage unit 38 such as a magnetic disk, an optical disk, or the like; and a communication unit 39 such as a network card, modem, wireless communication transceiver, etc. The communication unit 39 allows the electronic device 30 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 31 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 31 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 31 performs the various methods and processes described above, such as the open domain question-answering implementation method of a large-scale language model.
In some embodiments, the open domain question-answering implementation of the large-scale language model may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 38. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 30 via the ROM 32 and/or the communication unit 39. When the computer program is loaded into RAM 33 and executed by processor 31, one or more steps of the open domain question-answering implementation method of the large-scale language model described above can be performed. Alternatively, in other embodiments, processor 31 may be configured to perform the open domain question-answering implementation of the large-scale language model in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility in traditional physical hosts and VPS (Virtual Private Server) services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (10)
1. An open domain question-answering implementation method of a large-scale language model is characterized by comprising the following steps:
acquiring a plurality of segmentation operators, and acquiring a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy;
obtaining a plurality of candidate parallel strategies according to each segmentation operator, a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy;
acquiring a target parallel strategy from the candidate parallel strategies through a preset strategy searching algorithm;
and carrying out distributed parallel training of the large-scale language model according to the target parallel strategy to obtain a trained target language model, and adopting the target language model to realize open domain question-answering.
2. The method of claim 1, wherein obtaining, by a preset policy search algorithm, a target parallel policy from the plurality of candidate parallel policies, comprises:
acquiring a loss function according to the calculation cost, the communication cost and the memory cost;
and acquiring a target parallel strategy from the plurality of candidate parallel strategies according to the loss function through a preset strategy searching algorithm.
3. The method according to claim 2, wherein obtaining, by a preset policy search algorithm, a target parallel policy from the plurality of candidate parallel policies according to the loss function, comprises:
acquiring a loss value corresponding to each candidate parallel strategy according to the loss function through a preset strategy searching algorithm;
and if the minimum loss value corresponding to the current candidate parallel strategy is detected, taking the current candidate parallel strategy as a target parallel strategy.
4. The method of claim 1, wherein performing distributed parallel training of the large-scale language model according to the target parallel strategy to obtain a trained target language model comprises:
acquiring a target segmentation operator, a target segmentation strategy and a target communication operator according to the target parallel strategy;
performing segmentation processing on the input features of the target segmentation operator according to the target segmentation strategy to obtain segmentation features, and sending the segmentation features to training participation equipment;
and carrying out communication summarization on the calculation results fed back by the training participation equipment through the target communication operator so as to obtain the total calculation results corresponding to the input features.
5. A method according to any of claims 1-3, wherein the preset policy search algorithm comprises a greedy algorithm and/or a genetic algorithm.
6. The method according to claim 1 or 4, wherein the segmentation operator comprises a row segmentation operator, a column segmentation operator and/or a replication operator, and wherein the communication operator comprises a reduction operator and/or a collection operator.
7. An open domain question-answering implementation device for a large-scale language model, comprising:
the segmentation operator acquisition module is used for acquiring a plurality of segmentation operators, a plurality of segmentation strategies corresponding to the segmentation operators and communication operators corresponding to the segmentation strategies;
the candidate parallel strategy acquisition module is used for acquiring a plurality of candidate parallel strategies according to each segmentation operator, a plurality of segmentation strategies corresponding to each segmentation operator and communication operators corresponding to each segmentation strategy;
the target parallel strategy acquisition module is used for acquiring target parallel strategies from the candidate parallel strategies through a preset strategy search algorithm;
and the model training module is used for carrying out distributed parallel training of the large-scale language model according to the target parallel strategy so as to obtain a trained target language model, and adopting the target language model to realize open domain question-answering.
8. The apparatus of claim 7, wherein the target parallel policy acquisition module comprises:
the loss function acquisition unit is used for acquiring a loss function according to the calculation cost, the communication cost and the memory cost;
the target parallel strategy acquisition unit is used for acquiring the target parallel strategy from the plurality of candidate parallel strategies according to the loss function through a preset strategy search algorithm.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the open domain question-answering implementation method of the large scale language model of any one of claims 1-6.
10. A computer readable storage medium storing computer instructions for causing a processor to implement the open domain question-answering implementation method of the large-scale language model of any one of claims 1-6 when executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310688299.3A CN116662509A (en) | 2023-06-09 | 2023-06-09 | Open domain question-answering implementation method, device and equipment of large-scale language model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116662509A true CN116662509A (en) | 2023-08-29 |
Family
ID=87720450
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310688299.3A Pending CN116662509A (en) | 2023-06-09 | 2023-06-09 | Open domain question-answering implementation method, device and equipment of large-scale language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116662509A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117785492A (en) * | 2024-02-28 | 2024-03-29 | 上海燧原智能科技有限公司 | Operator segmentation method determining method, device, equipment and medium |
CN117785492B (en) * | 2024-02-28 | 2024-05-17 | 上海燧原智能科技有限公司 | Operator segmentation method determining method, device, equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116662509A (en) | Open domain question-answering implementation method, device and equipment of large-scale language model | |
CN114462577A (en) | Federated learning system, method, computer equipment and storage medium | |
CN114841315A (en) | Method and system for implementing hybrid expert model, electronic device and storage medium | |
CN115358411A (en) | Data processing method, device, equipment and medium | |
CN114581732A (en) | Image processing and model training method, device, equipment and storage medium | |
CN111339290A (en) | Text classification method and system | |
CN112560480A (en) | Task community discovery method, device, equipment and storage medium | |
CN113641804A (en) | Pre-training model obtaining method and device, electronic equipment and storage medium | |
CN114896418A (en) | Knowledge graph construction method and device, electronic equipment and storage medium | |
CN115439916A (en) | Face recognition method, apparatus, device and medium | |
CN115203564A (en) | Information flow recommendation method and device and computer program product | |
CN111506872B (en) | Task allocation method and device based on load matrix analysis | |
CN115186738A (en) | Model training method, device and storage medium | |
CN116628167B (en) | Response determination method and device, electronic equipment and storage medium | |
US12038989B2 (en) | Methods for community search, method for training community search model, and electronic device | |
CN115510203B (en) | Method, device, equipment, storage medium and program product for determining answers to questions | |
CN118035507B (en) | Data query system and method based on data mining technology | |
CN115223177A (en) | Text recognition method, device, equipment and storage medium | |
CN116244413A (en) | New intention determining method, apparatus and storage medium | |
CN114493002A (en) | Production intention prediction method and device and electronic equipment | |
CN117933353A (en) | Reinforced learning model training method and device, electronic equipment and storage medium | |
CN118227580A (en) | Log analysis method and device, electronic equipment and storage medium | |
CN114781621A (en) | Neural network determining method and device, electronic equipment and storage medium | |
CN117519996A (en) | Data processing method, device, equipment and storage medium | |
CN116308455A (en) | Method and device for identifying hub area in trade network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||