CN112866587B - Method and system for distributing spoken language video synthesis tasks


Info

Publication number
CN112866587B
Authority
CN
China
Prior art keywords
spoken language
synthesis
server
task
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110044080.0A
Other languages
Chinese (zh)
Other versions
CN112866587A (en)
Inventor
李垦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dingshixing Education Consulting Co ltd
Original Assignee
Beijing Dingshixing Education Consulting Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dingshixing Education Consulting Co ltd filed Critical Beijing Dingshixing Education Consulting Co ltd
Priority to CN202110044080.0A priority Critical patent/CN112866587B/en
Publication of CN112866587A publication Critical patent/CN112866587A/en
Application granted granted Critical
Publication of CN112866587B publication Critical patent/CN112866587B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00: Details of television systems
    • H04N 5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/265: Mixing
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 19/00: Teaching not covered by other main groups of this subclass
    • G09B 19/06: Foreign languages
    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 5/00: Electrically-operated educational appliances
    • G09B 5/06: Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/1001: Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Multimedia (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The disclosure relates to a method and system for distributing spoken language video synthesis tasks. The method includes: generating a target spoken language video synthesis task in response to receiving a spoken language video synthesis request sent by a client; and adding the target spoken language video synthesis task to a target task queue based on a preset distribution configuration rule. The target task queue is either a local task queue or a cloud task queue. Spoken language video synthesis tasks in the local task queue are processed by a local synthesis server, and spoken language video synthesis tasks in the cloud task queue are processed by a cloud synthesis server. The computing function with which the cloud synthesis server synthesizes spoken language videos is an elastic computing function, so computing resources are used dynamically according to the number of spoken language video synthesis tasks waiting to be processed by the cloud synthesis server, which solves the resource waste problem of the related art.

Description

Method and system for distributing spoken language video synthesis tasks
Technical Field
The disclosure relates to the technical field of electronic information, in particular to a method and a system for distributing a spoken language video synthesis task.
Background
Spoken language evaluation is one of the main types of exercises in an online foreign language learning system. For example, after watching an original spoken English video, a student reads some sentences aloud and records them; the backend then replaces the corresponding audio segments in the original video with the student's recording to obtain a synthesized video, so that the student can judge the quality of the reading by watching the synthesized video.
In the related art, because each machine has limited processing capability, multiple machines are deployed to cope with the large number of spoken language video synthesis requests submitted during peak periods, so that every request can be executed normally. After the peak period, however, the submission volume drops sharply, many of these local servers sit idle, and resources are wasted.
Disclosure of Invention
In order to overcome the problems in the related art, the disclosure provides a method and a system for distributing a spoken language video synthesis task.
According to a first aspect of the embodiments of the present disclosure, there is provided a method for distributing a spoken language video composition task, the method including:
responding to a received spoken language video synthesis request sent by a client, and generating a target spoken language video synthesis task;
adding the target spoken language video synthesis task to a target task queue based on a preset distribution configuration rule;
the target task queue is a local task queue or a cloud task queue, spoken language video synthesis tasks in the local task queue are processed by a local synthesis server, the spoken language video synthesis tasks in the cloud task queue are processed by a cloud synthesis server, and a computing function of the cloud synthesis server for synthesizing spoken language videos is an elastic computing function so as to dynamically use computing resources according to the number of the spoken language video synthesis tasks to be processed by the cloud synthesis server.
Optionally, adding the target spoken language video synthesis task to a target task queue based on a preset distribution configuration rule includes:
determining a first probability of adding the target spoken language video synthesis task to the local task queue and a second probability of adding the target spoken language video synthesis task to the cloud task queue according to the distribution configuration rule;
generating a random number within a preset order of magnitude under the condition that the difference between the first probability and the second probability is smaller than a preset difference threshold, and converting the first probability into the preset order of magnitude to obtain a first value;
and if the random number is less than or equal to the first value, adding the target spoken language video synthesis task into the local task queue, and if the random number is greater than the first value, adding the target spoken language video synthesis task into the cloud task queue.
Optionally, the distribution configuration rule includes one or both of the following rules:
a first type of distribution rule set for the user identifier in the spoken language video synthesis request;
and a second type of distribution rule set for the load of the local synthesis server.
Optionally, the method further comprises:
uploading a spoken language recording file to a cloud storage server in response to the client's operation of uploading the file, wherein the spoken language recording file is obtained by the client recording the user reading aloud along with an original video;
uploading the spoken language recording file to a local storage server if it is determined that uploading the spoken language recording file to the cloud storage server has failed;
returning the file storage address of the spoken language recording file to the client;
the spoken language video synthesis request comprises the file storage address, and the spoken language recording file is used for synthesizing a spoken language video with the original video.
Optionally, the method further comprises:
and before the spoken language recording file stored by the cloud storage server is invalid, synchronously storing the spoken language recording file stored by the cloud storage server to the local storage server.
Optionally, the storage manner of the cloud storage server is object storage.
Optionally, in a case that the target task queue is the cloud task queue, the method further includes:
and in response to receiving a synthesis failure message sent by the cloud synthesis server, adding the target spoken language video synthesis task into the local task queue.
Optionally, the spoken language video obtained after the target spoken language video synthesis task is completed is stored in a cloud storage server, and the method further includes:
acquiring a processing result aiming at the target spoken language video synthesis task, wherein the processing result comprises a video storage address of the spoken language video in the cloud storage server;
and updating the relational database management system according to the processing result so that the client acquires the video storage address from the relational database management system and acquires the corresponding spoken language video from the cloud storage server according to the video storage address.
Optionally, the method further comprises:
acquiring a video acquisition request sent by the client, wherein the video acquisition request comprises a video storage address;
sending the video acquisition request to a content delivery network (CDN) corresponding to the cloud storage server, wherein the CDN stores the spoken language video when the client first acquires the spoken language video from the cloud storage server;
and receiving, from the CDN, the spoken language video corresponding to the video acquisition request, and sending the spoken language video to the client.
Optionally, the local message queue and the cloud message queue are both asynchronous message queues; for example, the local message queue is a RabbitMQ queue and the cloud message queue is a Kafka queue.
Optionally, the target spoken language video composition task includes a processing progress tracking parameter, and a value of the processing progress tracking parameter is used to characterize a processing stage of the target spoken language video composition task, where the method further includes:
and responding to a progress inquiry request of the client, and feeding back the processing progress of the target spoken language video synthesis task to the client according to the value of the processing progress tracking parameter.
According to a second aspect of the embodiments of the present disclosure, there is provided a spoken language video synthesis system, including a PHP server, a local synthesis server, a distribution controller, and a first communication interface for communicating with a cloud synthesis server;
the PHP server is configured to generate a target spoken language video synthesis task in response to receiving a spoken language video synthesis request sent by a client, and to send the target spoken language video synthesis task to the distribution controller;
the distribution controller adds the target spoken language video synthesis task to a target task queue based on a preset distribution configuration rule;
the target task queue is a local task queue or a cloud task queue, spoken language video synthesis tasks in the local task queue are processed by a local synthesis server, the spoken language video synthesis tasks in the cloud task queue are processed by a cloud synthesis server, and a computing function of the cloud synthesis server for synthesizing spoken language videos is an elastic computing function so as to dynamically use computing resources according to the number of the spoken language video synthesis tasks to be processed by the cloud synthesis server.
Optionally, the system further comprises a local storage server and a second communication interface for communicating with the cloud storage server;
the PHP server is further configured to upload a spoken language recording file to the cloud storage server in response to the client's operation of uploading the file, wherein the spoken language recording file is obtained by the client recording the user reading aloud along with the original video;
to upload the spoken language recording file to the local storage server if it is determined that uploading to the cloud storage server has failed;
and to return the file storage address of the spoken language recording file to the client;
the spoken language video synthesis request comprises the file storage address, and the spoken language recording file is used for synthesizing a spoken language video with the original video.
Optionally, the spoken language video obtained after the target spoken language video synthesis task is completed is stored in the cloud storage server, the system further includes a Java server, and the Java server is further configured to:
acquiring a processing result aiming at the target spoken language video synthesis task, wherein the processing result comprises a video storage address of the spoken language video in the cloud storage server;
and updating the relational database management system of the local storage server according to the processing result so that the client acquires the video storage address from the relational database management system and acquires the corresponding spoken language video from the cloud storage server according to the video storage address.
Optionally, the PHP server is further configured to:
acquiring a video acquisition request sent by the client, wherein the video acquisition request comprises a video storage address;
sending the video acquisition request to a content delivery network (CDN) corresponding to the cloud storage server, wherein the CDN stores the spoken language video when the client first acquires the spoken language video from the cloud storage server;
and receiving, from the CDN, the spoken language video corresponding to the video acquisition request, and sending the spoken language video to the client.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
by adopting the technical scheme, the preset shunting configuration rule can send the target spoken language video synthesis task which cannot be processed by the local synthesis server in time to the cloud task queue, so that the calculation function of the cloud synthesis server processes the target spoken language video synthesis task. Because the calculation function is an elastic calculation function, the cloud synthesis server can dynamically use calculation resources according to the number of to-be-processed spoken language video synthesis tasks of the cloud synthesis server so as to deal with the situation that the spoken language video synthesis tasks are suddenly increased and ensure that a large number of submitted spoken language video synthesis requests can be processed in time. Therefore, a plurality of local synthesis servers are not required to be arranged, and the problem of resource waste is solved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flow diagram illustrating a method of distribution of a spoken video composition task, according to an example embodiment.
Fig. 2 is a flowchart illustrating step S102 according to an exemplary embodiment.
Fig. 3 is another flow diagram illustrating a method of distribution of a spoken video composition task, according to an example embodiment.
Fig. 4 is another flow diagram illustrating a method of distribution of a spoken video composition task, according to an example embodiment.
FIG. 5 is a block diagram illustrating a spoken language video composition system in accordance with an exemplary embodiment.
FIG. 6 is another block diagram illustrating a spoken language video composition system in accordance with an exemplary embodiment.
Fig. 7 is another block diagram illustrating a spoken video composition system in accordance with an example embodiment.
Detailed Description
The following detailed description of the embodiments of the disclosure refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
In order to enable those skilled in the art to more rapidly understand the improvement of the technical solutions provided by the embodiments of the present disclosure, the following first describes the related arts.
During peak periods for submitting spoken language video synthesis tasks, the limited processing capacity of a single machine means that a surge of submissions sharply increases the load on the local server, so that videos are not synthesized for a long time, users cannot submit normally, and so on, leading to many customer complaints and a poor front-end user experience. The related art therefore scales out by deploying multiple local servers for video synthesis. Although this approach can handle the large number of spoken language video synthesis tasks submitted during peak periods, some of these local servers sit idle after the peak, wasting resources.
In addition, when the local synthesis server executes a spoken language video synthesis task, it needs to obtain the recording of the user reading along with the original video as well as the original video itself, and then synthesize the two to produce the spoken language video containing the user's reading. The user's recording is stored on a local storage server, such as an NFS (Network File System), and the synthesized spoken language video is likewise stored on the NFS. This storage scheme depends too heavily on the NFS, whose read/write performance is limited by the network and hardware. When many machines read and write in bulk at the same time, the interface response time far exceeds its normal level and the NFS may crash. Because the NFS is mounted on multiple local synthesis servers, an NFS failure means the local synthesis servers cannot obtain the recordings and original videos, so none of the spoken language video synthesis tasks can be completed, which affects the stability and timeliness of the whole system.
In view of this, the present disclosure provides a method and a system for distributing a spoken language video composition task, so as to solve the problems in the related art, and reduce the waste of resources while ensuring that each spoken language video composition task can be processed in time.
Fig. 1 is a flowchart illustrating a distribution method of a spoken language video composition task according to an exemplary embodiment, and as shown in fig. 1, the distribution method of the spoken language video composition task is applied to a server, and includes the following steps:
in step S101, a target spoken language video composition task is generated in response to receiving a spoken language video composition request sent by a client.
In step S102, the target spoken language video synthesis task is added to a target task queue based on a preset distribution configuration rule, where the target task queue is a local task queue or a cloud task queue, spoken language video synthesis tasks in the local task queue are processed by a local synthesis server, spoken language video synthesis tasks in the cloud task queue are processed by a cloud synthesis server, and the computing function with which the cloud synthesis server synthesizes spoken language videos is an elastic computing function.
Based on the elastic computing function, the cloud synthesis server can use computing resources dynamically according to the number of pending spoken language video synthesis tasks, so that the computing resources match the task volume and the tasks are executed efficiently.
With this technical scheme, the server side can route a spoken language video synthesis task to the cloud task queue or the local task queue based on the preset distribution configuration rule, so that the task is processed by the local synthesis server or the cloud synthesis server. Because the computing function with which the cloud synthesis server synthesizes spoken language videos is elastic and uses computing resources dynamically according to its number of pending tasks, the server side can send tasks to the cloud synthesis server when the local synthesis server is heavily loaded, and the cloud synthesis server automatically acquires more computing resources as the task volume grows, so a large number of submitted spoken language video synthesis requests are processed in time without deploying multiple local synthesis servers. Moreover, the cloud synthesis server automatically releases computing resources as the task volume falls, so resource waste is avoided while task processing efficiency is preserved.
In one embodiment, the distribution configuration rule in step S102 includes one or both of the following rules:
a first type of distribution rule set for the user identifier in the spoken language video synthesis request, and a second type of distribution rule set for the load of the local synthesis server.
Specifically, for the first type of distribution rule, the server may set a whitelist to specify that spoken language video synthesis tasks carrying a certain kind of user identifier all flow to the same task queue. The whitelist may, for example, specify that the tasks of students in the same class are all routed to the local synthesis server for processing; in that case the user identifier may be the student's class number. For instance, if a whitelist containing the class numbers of class A is set for the local synthesis server, then upon receiving a spoken language video synthesis request the server checks whether the user identifier carried in the request indicates class A, and if so, adds the corresponding target spoken language video synthesis task to the local task queue.
The foregoing is only an example; other distribution rules may also be set for the user identifier. For instance, the user identifier may be a student number, and the first type of distribution rule may add even student numbers to the local task queue and odd student numbers to the cloud task queue. Alternatively, priorities may be assigned to students by combining their identity, permission, and course information, with tasks initiated by high-priority students routed to the local synthesis server and tasks initiated by low-priority students routed to the cloud synthesis server.
With the first type of distribution rule, spoken language video synthesis tasks with the same user attribute flow to the same task queue, so students sharing that attribute get a consistent experience of the spoken language evaluation exercise, and differentiated service for different groups of students can be realized through rules set for the user identifier.
For the second type of distribution rule, the load of the local synthesis server reflects its current processing capacity; specifically, the load can be computed from the maximum processing capacity of the local synthesis server and the number of spoken language video synthesis tasks it currently has pending. The second type of distribution rule may be a load threshold: the target spoken language video synthesis task is added to the cloud task queue when the load of the local synthesis server is greater than or equal to the threshold, and to the local task queue when the load is below the threshold. This approach takes full account of the limited processing capacity of the local synthesis server, keeps its processing efficiency at an acceptable level, and avoids turning it into a performance bottleneck.
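As an illustration only, a minimal Java sketch of these two rule types follows; the class names, whitelist contents, and threshold value are assumptions made for the example, not elements specified by the disclosure.

    // Minimal sketch of the two example rule types; names are illustrative assumptions.
    import java.util.Set;

    enum TargetQueue { LOCAL, CLOUD }

    // First type: whitelist on the user identifier carried in the synthesis request.
    class UserIdRule {
        private final Set<String> localWhitelist;   // e.g. the class numbers of class A

        UserIdRule(Set<String> localWhitelist) {
            this.localWhitelist = localWhitelist;
        }

        TargetQueue route(String userId) {
            return localWhitelist.contains(userId) ? TargetQueue.LOCAL : TargetQueue.CLOUD;
        }
    }

    // Second type: threshold on the load of the local synthesis server.
    class LoadRule {
        private final double loadThreshold;         // assumed tuning parameter

        LoadRule(double loadThreshold) {
            this.loadThreshold = loadThreshold;
        }

        TargetQueue route(int pendingTasks, int maxCapacity) {
            double load = (double) pendingTasks / maxCapacity;
            return load >= loadThreshold ? TargetQueue.CLOUD : TargetQueue.LOCAL;
        }
    }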
It should be noted that the distribution configuration rule may include both the first and the second type of distribution rule. In that case, to avoid conflicts between them (for example, the target spoken language video synthesis task should go to the local synthesis server under the first type of rule but to the cloud synthesis server under the second type), embodiments of the present disclosure may compute, from how the task matches the first-type and second-type rules, a probability value that the task is routed to the local synthesis server, and then decide based on that value whether the task goes to the local or the cloud synthesis server. For example, the task may be routed to the local synthesis server when the probability value is at least 50% and to the cloud synthesis server otherwise; in other examples the cut-off may be 65% or 45%. The values 45%, 50%, and 65% in these examples may be set according to the maximum processing capacity of the local synthesis server, which this embodiment does not limit; it will be appreciated that a correspondingly larger value is chosen when the local synthesis server has more processing capacity.
In another implementation, fig. 2 is a flowchart illustrating step S102 according to an exemplary embodiment, as shown in fig. 2, including the steps of:
in step S201, a first probability of adding the target spoken language video synthesis task to the local task queue and a second probability of adding the target spoken language video synthesis task to the cloud task queue are determined according to the split configuration rule.
For example, the distribution configuration rule may include several rules pointing to the local task queue and several pointing to the cloud task queue, each rule having a corresponding probability value, with the probability values of all rules summing to 100%. The first probability is then obtained by summing the probability values of the rules pointing to the local task queue that the target spoken language video synthesis task satisfies, and the second probability by summing the probability values of the satisfied rules pointing to the cloud task queue.
As a further example, first determine the preset processing capacity V1 of each consuming thread of the local synthesis server, the number N1 of spoken language video synthesis tasks the local synthesis server is currently processing, and the number N2 of spoken language video synthesis tasks backlogged in the current local task queue. The preset processing capacity is the number of spoken language video synthesis tasks that can be processed per second and can be obtained by testing. The number of tasks currently being processed and the backlog of the local task queue can be obtained by monitoring through message broker software.
Then determine the current processing capacity V2 of the local synthesis server from the preset processing capacity V1 and the number N1 of spoken language video synthesis tasks currently being processed.
Next, determine from the current processing capacity V2 and the backlog N2 of the current local task queue how long the local synthesis server needs to work through that backlog, that is, N2/V2.
If the local synthesis server can process the backlog N2 of the current local task queue within a set first preset duration t1, that is, N2/V2 < t1, the backlog can be cleared in time, and the first probability of adding the target spoken language video synthesis task to the local task queue is determined to be 100%.
If the local synthesis server can only finish the backlog N2 of the current local task queue beyond a set second preset duration t2, that is, N2/V2 > t2, the backlog already exceeds the load capacity of the local synthesis server. To keep the local synthesis server from being overwhelmed and to ensure the target task is processed in time, the first probability of adding the target spoken language video synthesis task to the local task queue is determined to be 0, so the task is added to the cloud task queue.
If the time the local synthesis server needs to clear the backlog N2 of the current local task queue lies between the first preset duration t1 and the second preset duration t2, that is, t1 < N2/V2 < t2, the second probability of adding the target spoken language video synthesis task to the cloud task queue is determined as: (N2/V2 - t1)/(t2 - t1) × 100%.
It should be noted that the first preset duration is shorter than the second preset duration, and the first preset duration and the second preset duration may be set according to actual conditions. This embodiment is not limited to this.
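A compact sketch of the piecewise probability described above; how the current processing capacity V2 is derived from V1 and N1 is left exactly as in the disclosure, so V2 is simply passed in here, and the names are illustrative assumptions.

    // Piecewise first/second probability from the example above; V2 is passed in
    // because the disclosure does not fix a formula for deriving it from V1 and N1.
    final class ProbabilityCalculator {

        /** Returns the first probability (local task queue) in percent, 0..100. */
        static double firstProbabilityPercent(double v2, int n2, double t1, double t2) {
            double drainTime = n2 / v2;                // seconds needed to clear the backlog
            if (drainTime < t1) {
                return 100.0;                           // backlog clears within t1: stay local
            }
            if (drainTime > t2) {
                return 0.0;                             // backlog exceeds capacity: go to the cloud
            }
            // t1 <= drainTime <= t2: the second (cloud) probability grows linearly.
            double secondProbabilityPercent = (drainTime - t1) / (t2 - t1) * 100.0;
            return 100.0 - secondProbabilityPercent;
        }
    }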
In step S202, in the case that the difference between the first probability and the second probability is smaller than the preset difference threshold, a random number is generated within a preset magnitude, and the first probability is converted into the preset magnitude to obtain a first value.
It should be noted that the preset difference threshold may be set to a smaller value less than 1, such as 0, or 0.1, or any value between 0 and 0.2, for characterizing the case that the first probability and the second probability are equal or similar. In a possible implementation manner, if the size of the difference between the first probability and the second probability is equal to or larger than a preset difference threshold, the target spoken language video synthesis task may be directly added to the task queue pointed to by the larger value of the first probability and the second probability. The method steps shown in fig. 2 mainly embody how to split the target spoken language video synthesis task for the case of equal or similar probability.
Specifically, the preset order of magnitude may be 2 or 3; this embodiment does not limit it. When the preset order of magnitude is 2, the generated random number is any number in [0, 100]; when it is 3, the random number is any number in [0, 1000].
Converting the first probability into the preset order of magnitude yields the first value: the first probability is multiplied by the number corresponding to the preset order of magnitude (for example, 10^2 = 100 when the preset order of magnitude is 2), and the result of the multiplication is the first value. Illustratively, if the preset order of magnitude is 2 and the first probability is 48%, converting the first probability to the preset order of magnitude gives a first value of 48.
In step S203, if the random number is less than or equal to the first value, the target spoken language video synthesis task is added to the local task queue, and if the random number is greater than the first value, the target spoken language video synthesis task is added to the cloud task queue.
Illustratively, continuing the example above, the preset order of magnitude is 2, the first probability is 48%, and the first value is 48. A random number is generated within [0, 100]; if it falls in [0, 48], the target spoken language video synthesis task is added to the local task queue, and if it falls in (48, 100], the task is added to the cloud task queue.
Alternatively, step S202 may be: when the difference between the first probability and the second probability is smaller than the preset difference threshold, generate a random number within the preset order of magnitude and convert the second probability into the preset order of magnitude to obtain a second value. Step S203 may then be: if the random number is less than or equal to the second value, add the target spoken language video synthesis task to the cloud task queue; if the random number is greater than the second value, add the task to the local task queue.
With this technical scheme, the target task queue is determined from the probabilities of the target spoken language video synthesis task being added to the local and cloud task queues, combined with the generated random number. This makes the load balance between the two queues more reasonable and avoids problems such as slow synthesis or crashes of the local synthesis server caused by too many tasks in its queue.
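A sketch of steps S201 to S203, under the assumption that probabilities are handled as fractions in [0, 1]; the difference threshold handling follows the note above, and the two-valued queue enum repeats the one from the earlier rule sketch.

    // Sketch of steps S201-S203; threshold values and type names are illustrative.
    import java.util.concurrent.ThreadLocalRandom;

    enum TargetQueue { LOCAL, CLOUD }   // same two-valued enum as in the earlier sketch

    final class QueueSelector {
        private final double differenceThreshold;  // e.g. 0.1, per the example above
        private final int scale;                    // 100 when the preset order of magnitude is 2

        QueueSelector(double differenceThreshold, int presetOrderOfMagnitude) {
            this.differenceThreshold = differenceThreshold;
            this.scale = (int) Math.pow(10, presetOrderOfMagnitude);
        }

        /** firstProbability and secondProbability are fractions in [0, 1]. */
        TargetQueue select(double firstProbability, double secondProbability) {
            if (Math.abs(firstProbability - secondProbability) >= differenceThreshold) {
                // Probabilities differ clearly: follow the larger one directly.
                return firstProbability >= secondProbability ? TargetQueue.LOCAL : TargetQueue.CLOUD;
            }
            int firstValue = (int) Math.round(firstProbability * scale);   // e.g. 48% -> 48
            int random = ThreadLocalRandom.current().nextInt(scale + 1);   // any number in [0, scale]
            return random <= firstValue ? TargetQueue.LOCAL : TargetQueue.CLOUD;
        }
    }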
In one implementation, fig. 3 is another flowchart illustrating a distribution method of a spoken language video composition task according to an exemplary embodiment, as shown in fig. 3, including the following steps:
in step S301, in response to an operation of uploading a spoken language recording file by a client, the spoken language recording file is uploaded to a cloud storage server, where the spoken language recording file is obtained by the client collecting a voice of a user reading an original video.
In step S302, in a case where it is determined that uploading of the spoken language recording file to the cloud storage server fails, the spoken language recording file is uploaded to the local storage server.
In step S303, the file storage address of the spoken language recording file is returned to the client.
In the disclosed embodiment, the spoken language video synthesis request includes the file storage address, and the spoken language recording file is used to synthesize a spoken language video with the original video. When the client receives the file storage address, it carries the address in the spoken language video synthesis request, from which the target spoken language video synthesis task is generated, so the local or cloud synthesis server can retrieve the corresponding spoken language recording file through the address in the task and then complete the synthesis of the spoken language video.
With this technical scheme, uploading the spoken language recording file to the cloud storage server avoids dependence on the network file system, solving the related-art problem that a network file system crash prevents all spoken language video synthesis tasks from completing, and improving the stability of the whole system. In addition, the local storage server serves as backup storage for the cloud storage server: when it is determined that uploading the spoken language recording file to the cloud storage server has failed, the file is uploaded to the local storage server instead, so the relevant file can still be retrieved locally, the target spoken language video synthesis task can still be executed successfully, and the reliability and stability of the whole system are further improved.
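A minimal sketch of the cloud-first upload with local fallback; the BlobStore interface and key scheme are hypothetical placeholders, since the disclosure does not name concrete storage APIs.

    // Cloud-first upload with local fallback; the storage interfaces are hypothetical.
    interface BlobStore {
        /** Stores the bytes under the given key and returns the file storage address. */
        String put(String key, byte[] data) throws Exception;
    }

    final class RecordingUploader {
        private final BlobStore cloudStore;   // e.g. an object-storage bucket
        private final BlobStore localStore;   // backup storage on the local storage server

        RecordingUploader(BlobStore cloudStore, BlobStore localStore) {
            this.cloudStore = cloudStore;
            this.localStore = localStore;
        }

        /** Returns the file storage address later carried in the synthesis request. */
        String upload(String key, byte[] spokenRecording) throws Exception {
            try {
                return cloudStore.put(key, spokenRecording);
            } catch (Exception cloudFailure) {
                // Uploading to the cloud storage server failed: fall back to local storage.
                return localStore.put(key, spokenRecording);
            }
        }
    }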
In one embodiment, to keep the upload of a spoken language recording file from hanging without a response for a long time, the interface that uploads the file is protected with circuit breaking and degradation ("fuse degradation"). Here, not receiving a feedback result from the server within a certain time is treated as no response.
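As an illustration, a hand-rolled timeout-and-trip guard for the upload interface could look as follows; this is not a reference to any particular circuit-breaker library, and the timeout and trip count are assumed parameters.

    // Illustrative timeout-and-trip guard ("fuse degradation") for the upload interface.
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;

    final class UploadFuse {
        private final ExecutorService pool = Executors.newCachedThreadPool();
        private final long timeoutMillis;                 // the "no response" window
        private final int tripAfterFailures;              // consecutive failures before degrading
        private final AtomicInteger consecutiveFailures = new AtomicInteger(0);

        UploadFuse(long timeoutMillis, int tripAfterFailures) {
            this.timeoutMillis = timeoutMillis;
            this.tripAfterFailures = tripAfterFailures;
        }

        /** Runs the upload; returns null (degraded) if the fuse is open or the call times out. */
        String guardedUpload(Callable<String> uploadCall) {
            if (consecutiveFailures.get() >= tripAfterFailures) {
                return null;                               // fuse open: degrade instead of waiting
            }
            Future<String> future = pool.submit(uploadCall);
            try {
                String address = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                consecutiveFailures.set(0);
                return address;
            } catch (Exception noResponse) {
                future.cancel(true);
                consecutiveFailures.incrementAndGet();
                return null;
            }
        }
    }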
In one embodiment, the related art uses a single interface both for synthesizing the spoken language video and for uploading the spoken language recording file, so a fault in that interface means neither service can be handled. To avoid this, the present disclosure makes the interface for synthesizing the spoken language video and the interface for uploading the spoken language recording file independent of each other, ensuring that a fault in one interface does not affect requests to, or processing by, the other.
In one embodiment, consider that a user may not submit the spoken language video synthesis request in time, while files stored on the cloud storage server have a limited validity period and are cleared automatically once it expires. To ensure that the target spoken language video synthesis task can still be executed, the spoken language recording file stored on the cloud storage server may be synchronized to the local storage server before it expires, so the local or cloud synthesis server can still obtain the recording corresponding to the synthesis request and complete the spoken language video, further improving the reliability and stability of the whole system.
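A sketch of such a pre-expiry synchronization job; the metadata record, expiry margin, and check interval are assumptions, since the disclosure only states that the file is synchronized before it expires.

    // Pre-expiry synchronization of recordings from cloud storage to local storage.
    import java.time.Duration;
    import java.time.Instant;
    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    record StoredRecording(String key, Instant expiresAt) {}

    interface BlobCopier {
        void copyCloudToLocal(String key);
    }

    final class ExpirySyncJob {
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        void start(List<StoredRecording> cloudIndex, BlobCopier copier) {
            scheduler.scheduleAtFixedRate(() -> {
                Instant soon = Instant.now().plus(Duration.ofHours(6)); // assumed safety margin
                for (StoredRecording r : cloudIndex) {
                    if (r.expiresAt().isBefore(soon)) {
                        copier.copyCloudToLocal(r.key()); // mirror before the cloud copy expires
                    }
                }
            }, 0, 1, TimeUnit.HOURS);                                    // assumed check interval
        }
    }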
In one embodiment, cloud storage can be classified by its underlying mode into block storage, file storage, and object storage, and by deployment type into public, private, and hybrid cloud storage. Illustratively, the storage mode of the cloud storage server is object storage. The basic unit of object storage is the object; like a file, an object is a stream of data, but unlike a file it has no directory hierarchy. All objects live in the same flat namespace at the same level, with no subordination between them. Object storage also offers effectively unlimited capacity, low cost, strong scalability, convenient reads and writes, and fast sharing, which makes it well suited to storing massive amounts of unstructured shared data.
In one embodiment, considering that the computing function with which the cloud synthesis server synthesizes spoken language videos may fail to process a target spoken language video synthesis task successfully, when the target task queue is the cloud task queue, the target spoken language video synthesis task is added to the local task queue in response to receiving a synthesis failure message sent by the cloud synthesis server. This ensures that the local synthesis server handles the tasks the cloud synthesis server could not synthesize, further improving the reliability and stability of the whole system.
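A sketch of this failover; the task, queue, and handler types are illustrative assumptions rather than interfaces defined by the disclosure.

    // Failover from the cloud synthesis server back to the local task queue.
    record SynthesisTask(String taskId, String recordingAddress, String originalVideoId) {}

    interface TaskQueue {
        void enqueue(SynthesisTask task);
    }

    final class SynthesisFailureHandler {
        private final TaskQueue localTaskQueue;

        SynthesisFailureHandler(TaskQueue localTaskQueue) {
            this.localTaskQueue = localTaskQueue;
        }

        /** Called when the cloud synthesis server reports a synthesis failure for a task. */
        void onCloudSynthesisFailure(SynthesisTask failedTask) {
            // Re-queue locally so the local synthesis server retries the task.
            localTaskQueue.enqueue(failedTask);
        }
    }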
In one embodiment, the spoken language video obtained when the target spoken language video synthesis task completes is stored on the cloud storage server. Each target spoken language video synthesis task is processed by the local synthesis server or the cloud synthesis server to produce a processing result that includes the video storage address of the spoken language video on the cloud storage server. When the server side obtains the processing result, it updates the relational database management system accordingly.
By adopting the technical scheme, the client can acquire the video storage address from the relational database management system and acquire the corresponding spoken language video from the cloud storage server according to the video storage address.
In addition, when the spoken language video produced by the target spoken language video synthesis task cannot be stored on the cloud storage server, it is stored on the local storage server instead, so the spoken language video is not lost and the client can still query it.
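A sketch of writing the processing result into the relational database management system; the table and column names are assumptions made for the example.

    // Persisting the processing result; table and column names are assumptions.
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    final class ResultWriter {
        private final Connection db;   // connection to the relational database management system

        ResultWriter(Connection db) {
            this.db = db;
        }

        /** Stores the video storage address so the client can later resolve and fetch the video. */
        void saveResult(String taskId, String videoStorageAddress) throws SQLException {
            String sql = "UPDATE spoken_video_task SET video_address = ?, status = 'DONE'"
                    + " WHERE task_id = ?";
            try (PreparedStatement stmt = db.prepareStatement(sql)) {
                stmt.setString(1, videoStorageAddress);
                stmt.setString(2, taskId);
                stmt.executeUpdate();
            }
        }
    }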
In one embodiment, fig. 4 is another flowchart illustrating a distribution method of a spoken language video composition task according to an exemplary embodiment, as shown in fig. 4, the method further includes the following steps.
In step S401, a video obtaining request sent by the client is obtained, where the video obtaining request includes a video storage address.
In step S402, the video acquisition request is sent to the content delivery network (CDN) corresponding to the cloud storage server; the CDN stores the spoken language video when the client first acquires the spoken language video from the cloud storage server.
In step S403, the spoken language video corresponding to the video acquisition request is received from the CDN and sent to the client.
It should be noted that a content delivery network (CDN) is a mechanism that avoids, as far as possible, the bottlenecks and links on the Internet that can affect data transmission speed and stability, so that content is delivered faster and more reliably.
With this technical scheme, the CDN caches the spoken language video in its own storage when the client first obtains the video from the cloud storage server. When the client requests the same video again, it is served directly from the CDN without the CDN having to go back to the cloud storage server or the local storage server, which improves both the efficiency of obtaining the spoken language video and the stability of its delivery.
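A sketch of the retrieval path of steps S401 to S403; the CdnClient interface is a hypothetical placeholder, not an API defined by the disclosure or by any particular CDN vendor.

    // Serving a spoken language video through the CDN.
    interface CdnClient {
        /** Fetches the bytes behind a video storage address, caching them at the CDN edge. */
        byte[] fetch(String videoStorageAddress);
    }

    final class VideoGateway {
        private final CdnClient cdn;

        VideoGateway(CdnClient cdn) {
            this.cdn = cdn;
        }

        /** Resolves a client's video acquisition request into the video bytes to return. */
        byte[] handleVideoAcquisition(String videoStorageAddress) {
            // The first access populates the CDN cache from cloud storage; later accesses
            // for the same address are answered by the CDN directly.
            return cdn.fetch(videoStorageAddress);
        }
    }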
In one implementation, consider the related art where, after submitting a spoken language recording file on the display interface, the user must wait synchronously on that interface for the processing result. If the server does not return a result for a long time, the client times out; after waiting a long while only to learn that the submission failed, the user resubmits, so earlier tasks are never finished while new ones keep being generated, the backlog keeps growing, and the whole pipeline falls into an endless loop. To avoid this, in the present disclosure the local message queue and the cloud message queue are both asynchronous message queues: the user does not need to wait on the submission interface, and leaving the current interface does not affect the synthesis of the spoken language video. In addition, if the user does wait on the submission interface and no processing result arrives within a preset duration, the client shows a prompt indicating that processing is still in progress, so the client does not resubmit the request.
Illustratively, the local message queue may be a RabbitMQ queue and the cloud message queue may be a Kafka queue.
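For illustration, publishing a task to the two queue technologies named above might look like the following; the broker addresses, queue and topic names, and JSON payload are assumptions made for the example.

    // Publishing a synthesis task asynchronously; addresses and names are assumed.
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import java.nio.charset.StandardCharsets;
    import java.util.Properties;

    final class TaskPublisher {
        // Local task queue: RabbitMQ.
        static void publishLocal(String taskJson) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");                              // assumed broker address
            try (Connection conn = factory.newConnection();
                 Channel channel = conn.createChannel()) {
                channel.queueDeclare("spoken_video_tasks", true, false, false, null);
                channel.basicPublish("", "spoken_video_tasks", null,
                        taskJson.getBytes(StandardCharsets.UTF_8));
            }
        }

        // Cloud task queue: Kafka.
        static void publishCloud(String taskId, String taskJson) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka.example.com:9092");  // assumed address
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("spoken-video-tasks", taskId, taskJson));
            }
        }
    }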
In one embodiment, the target spoken language video composition task includes a processing progress tracking parameter, and the method further includes: and responding to a progress query request of the client, and feeding back the processing progress of the target spoken language video synthesis task to the client according to the value of the processing progress tracking parameter.
With this technical scheme, the value of the processing-progress tracking parameter characterizes which processing stage the target spoken language video synthesis task is in, so the task's stage can be tracked through the parameter and any stage that has not completed for a long time can be located quickly.
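A sketch of the processing-progress tracking parameter; the stage names are assumptions, since the disclosure does not enumerate the processing stages.

    // Tracking the processing stage of a task; stage names are illustrative.
    enum ProcessingStage {
        QUEUED, RECORDING_FETCHED, SYNTHESIZING, UPLOADING_RESULT, DONE, FAILED
    }

    final class ProgressTracker {
        private final java.util.concurrent.ConcurrentHashMap<String, ProcessingStage> progress =
                new java.util.concurrent.ConcurrentHashMap<>();

        void update(String taskId, ProcessingStage stage) {
            progress.put(taskId, stage);
        }

        /** Answers a client's progress query for a given task. */
        ProcessingStage query(String taskId) {
            return progress.getOrDefault(taskId, ProcessingStage.QUEUED);
        }
    }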
Based on the same inventive concept, the present disclosure also provides a spoken language video synthesis system. Fig. 5 is a block diagram of a spoken language video synthesis system according to an exemplary embodiment. As shown in fig. 5, the system 500 includes a PHP server 501, a local synthesis server 504, a distribution controller 502, and a first communication interface 505 for communicating with a cloud synthesis server 506.
The PHP server 501 is configured to generate a target spoken language video synthesis task in response to receiving a spoken language video synthesis request sent by the client 508, and to send the target spoken language video synthesis task to the distribution controller 502.
The distribution controller 502 adds the target spoken language video synthesis task to the target task queue based on the preset distribution configuration rule.
It should be noted that the target task queue is the local task queue 503 or the cloud task queue 507, and the preset distribution configuration rule can route target spoken language video synthesis tasks that the local synthesis server 504 cannot process in time to the cloud task queue 507.
With this technical scheme, spoken language video synthesis tasks in the local task queue 503 are processed by the local synthesis server 504, and tasks in the cloud task queue 507 are processed by the cloud synthesis server 506. The distribution controller 502 balances the load between the local task queue 503 and the cloud task queue 507. The computing function with which the cloud synthesis server 506 synthesizes spoken language videos is an elastic computing function that uses computing resources dynamically according to the number of tasks pending at the cloud synthesis server 506, so sudden surges of synthesis tasks are handled, a large number of submitted synthesis requests are processed in time, multiple local synthesis servers 504 do not need to be deployed, and the problem of resource waste is solved. Moreover, because the computing function is elastic, it is billed only by usage, which lowers operating cost compared with deploying multiple local synthesis servers.
In one embodiment, the distribution controller 502 adds the target spoken language video synthesis task to the target task queue based on the preset distribution configuration rule as follows:
First, the distribution controller 502 determines, according to the distribution configuration rule, a first probability of adding the target spoken language video synthesis task to the local task queue 503 and a second probability of adding it to the cloud task queue 507.
Next, when the difference between the first probability and the second probability is smaller than the preset difference threshold, the distribution controller 502 generates a random number within the preset order of magnitude and converts the first probability into that order of magnitude to obtain a first value.
Then, if the random number is less than or equal to the first value, the distribution controller 502 adds the target spoken language video synthesis task to the local task queue 503; if the random number is greater than the first value, it adds the task to the cloud task queue 507.
In one embodiment, the distribution configuration rule includes one or both of the following rules:
a first type of distribution rule set for the user identifier in the spoken language video synthesis request, and a second type of distribution rule set for the load of the local synthesis server 504.
In one implementation, fig. 6 is another block diagram of a spoken language video composition system, shown according to an example embodiment, the system 500 further comprising a local storage server 510 and a second communication interface 509 for communicating with a cloud storage server 511. The PHP server 501 is further configured to, in response to an operation of uploading a spoken language recording file by the client 508, upload the spoken language recording file to the cloud storage server 511; uploading the spoken language recording file to the local storage server 510 upon determining that the uploading of the spoken language recording file to the cloud storage server 511 failed; the file storage address of the spoken sound recording file is returned to the client 508.
In fig. 6, in response to the operation of uploading the spoken voice recording file by the client 508, the PHP server 501 uploads the spoken voice recording file to the cloud storage server 511 through the second communication interface 509, and when the PHP server 501 cannot upload the spoken voice recording file to the cloud storage server 511, the PHP server 501 uploads the spoken voice recording file to the local storage server 510.
It should be noted that the spoken language recording file is obtained by the client 508 collecting the voice of the user reading the original video; the spoken video composition request includes a file storage address, and the spoken sound recording file is used to compose a spoken video with the original video.
With this technical scheme, uploading the spoken language recording file to the cloud storage server 511 avoids dependence on the network file system, solving the related-art problem that a network file system crash prevents all spoken language video synthesis tasks from completing, and improving the stability of the whole system. In addition, the local storage server 510 serves as backup storage for the cloud storage server 511: when it is determined that uploading the spoken language recording file to the cloud storage server 511 has failed, the file is uploaded to the local storage server 510 instead, so the relevant file can still be retrieved from the local storage server 510, the target spoken language video synthesis task can still be executed successfully, and the reliability and stability of the whole system are further improved.
In one embodiment, to keep the upload of a spoken language recording file from hanging without a response for a long time, the interface that uploads the file is protected with circuit breaking and degradation. Here, not receiving a feedback result from the server within a certain time is treated as no response.
In one embodiment, the related art sets the interface for synthesizing the spoken language video and the interface for uploading the spoken language recording file as the same interface, so that a fault in that interface means neither service can be processed successfully. To solve this problem, the present disclosure sets the interface for synthesizing the spoken language video and the interface for uploading the spoken language recording file as mutually independent interfaces, ensuring that when one interface fails, requests to and processing by the other interface are not affected.
In one embodiment, the PHP server 501 is further configured to synchronously save the spoken language recording file stored on the cloud storage server 511 to the local storage server 510 before that file expires on the cloud storage server 511.
By adopting this technical scheme, the spoken language recording file stored on the cloud storage server 511 is synchronously saved to the local storage server 510 before it expires, so that the local synthesis server 504 or the cloud synthesis server 506 can still acquire the spoken language recording file corresponding to the spoken language video synthesis request and complete the synthesis of the spoken language video, further improving the reliability and stability of the whole system.
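One possible shape of this synchronization is a periodic job that copies soon-to-expire recordings from the cloud storage to the local storage. The two store interfaces, the scan interval, and the expiry window are assumptions for illustration:

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Assumed abstractions over the cloud storage server 511 and local storage server 510.
interface CloudRecordingStore {
    List<String> keysExpiringWithin(Duration window);  // recordings about to expire
    byte[] download(String key);
}

interface LocalRecordingStore {
    void save(String key, byte[] content);
}

public class RecordingSyncJob {

    private final CloudRecordingStore cloud;
    private final LocalRecordingStore local;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public RecordingSyncJob(CloudRecordingStore cloud, LocalRecordingStore local) {
        this.cloud = cloud;
        this.local = local;
    }

    /** Scan once per hour (assumed interval) and copy recordings expiring within 24 hours. */
    public void start() {
        scheduler.scheduleAtFixedRate(this::syncExpiring, 0, 1, TimeUnit.HOURS);
    }

    private void syncExpiring() {
        for (String key : cloud.keysExpiringWithin(Duration.ofHours(24))) {
            local.save(key, cloud.download(key));
        }
    }
}
```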
In one embodiment, the storage manner of the cloud storage server 511 is object storage.
In an embodiment, in a case where the target task queue is the cloud task queue 507, the PHP server 501 is further configured to add the target spoken language video synthesis task to the local task queue 503 in response to receiving a synthesis failure message sent by the cloud synthesis server 506.
By adopting this technical scheme, the local synthesis server 504 can process spoken language video synthesis tasks that the cloud synthesis server 506 failed to synthesize, further improving the reliability and stability of the whole system.
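A sketch of this fallback handler is shown below. The message and task types, and the TaskQueue interface, are assumptions introduced only to make the flow concrete:

```java
// Re-queues a failed cloud task onto the local task queue 503 so the local synthesis server 504 retries it.
public class SynthesisFailureHandler {

    private final TaskQueue localTaskQueue;  // local task queue 503

    public SynthesisFailureHandler(TaskQueue localTaskQueue) {
        this.localTaskQueue = localTaskQueue;
    }

    /** Called when a synthesis failure message is received from the cloud synthesis server 506. */
    public void onCloudSynthesisFailure(SynthesisFailureMessage message) {
        localTaskQueue.add(message.task());
    }

    public interface TaskQueue { void add(SynthesisTask task); }
    public record SynthesisTask(String taskId, String recordingAddress, String originalVideoId) {}
    public record SynthesisFailureMessage(SynthesisTask task, String reason) {}
}
```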
In one implementation, fig. 7 is another block diagram of a spoken language video synthesis system according to an exemplary embodiment; the disclosure is further explained below in conjunction with fig. 7. As shown in fig. 7, the system further includes a Java server 512 and a relational database management system 513.
The client 508 sends a spoken language video synthesis request to the PHP server 501, the PHP server 501 generates a target spoken language video synthesis task, the distribution controller 502 adds the target spoken language video synthesis task to a target task queue (the cloud task queue 507 or the local task queue 503) according to the distribution configuration rule, and the cloud synthesis server 506 and the local synthesis server 504 each process the tasks in their respective task queues to obtain synthesized spoken language videos. The spoken language video obtained after the target spoken language video synthesis task is completed is stored in the cloud storage server 511, and the Java server 512 is further configured to:
acquire a processing result for the target spoken language video synthesis task, where the processing result includes the video storage address of the spoken language video in the cloud storage server 511;
and update the relational database management system 513 according to the processing result, so that the client 508 acquires the video storage address from the relational database management system 513 and acquires the corresponding spoken language video from the cloud storage server 511 according to that address. Specifically, the client 508 obtains the video storage address from the relational database management system 513 through the PHP server 501.
By adopting the above technical solution, the client 508 may obtain the video storage address from the relational database management system 513 and obtain the corresponding spoken language video from the cloud storage server 511 according to the video storage address.
In addition, when the spoken language video obtained after the target spoken language video synthesis task is completed cannot be stored in the cloud storage server 511, it is stored in the local storage server 510 instead; similarly, the video storage address in the local storage server 510 is also updated in the relational database management system 513. This ensures that the spoken language video is not lost and avoids the problem of the client 508 being unable to query the spoken language video.
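The update step performed by the Java server 512 can be sketched with plain JDBC. The table and column names are assumptions made for illustration; only the idea of persisting the video storage address for later lookup comes from the disclosure:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

// Persists a processing result into the relational database management system 513.
public class ProcessingResultWriter {

    private final DataSource dataSource;

    public ProcessingResultWriter(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Stores the video storage address so the client can later look it up via the PHP server. */
    public void saveResult(String taskId, String videoStorageAddress) throws SQLException {
        String sql = "UPDATE spoken_video_task SET video_address = ?, status = 'DONE' WHERE task_id = ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, videoStorageAddress);
            stmt.setString(2, taskId);
            stmt.executeUpdate();
        }
    }
}
```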
In one embodiment, the local task queue 503 and the cloud task queue 507 are both asynchronous message queues, or the local task queue 503 is a RabbitMQ queue and the cloud task queue 507 is a Kafka queue.
By adopting this technical scheme, the user does not need to wait on the submission interface, and the synthesis of the spoken language video is not affected even if the client exits the current interface.
In addition, when the user waits for the processing result on the submission interface and no processing result is received within a preset time length, the client generates a prompt message indicating that the request is still being processed, so as to prevent the client from submitting the request repeatedly.
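Publishing a serialized task to the two asynchronous queues could look roughly as follows, using the standard RabbitMQ and Kafka Java clients. The broker hosts, queue and topic names, and the JSON payload format are assumptions; in a real deployment connections and producers would be reused rather than created per call:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class TaskPublisher {

    /** Local task queue 503 as a RabbitMQ queue. */
    public void publishToLocalQueue(String taskJson) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                              // assumed broker host
        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {
            channel.queueDeclare("local_synthesis_tasks", true, false, false, null);
            channel.basicPublish("", "local_synthesis_tasks", null,
                    taskJson.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Cloud task queue 507 as a Kafka topic. */
    public void publishToCloudQueue(String taskId, String taskJson) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka.example.com:9092");  // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("cloud_synthesis_tasks", taskId, taskJson));
        }
    }
}
```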
In one embodiment, the PHP server 501 is further configured to: acquire a video acquisition request sent by the client 508, where the video acquisition request includes a video storage address; send the video acquisition request to a content delivery network (CDN) server corresponding to the cloud storage server 511, where the CDN server stores the spoken language video when the client 508 initially acquires the spoken language video from the cloud storage server 511; and receive the spoken language video corresponding to the video acquisition request from the CDN server and send it to the client 508.
By adopting this technical scheme, the CDN server stores the spoken language video when the client 508 first acquires it from the cloud storage server 511, so that when the client 508 acquires the spoken language video again it can be served directly from the CDN server without accessing the cloud storage server 511 or the local storage server 510, improving the efficiency of acquiring the spoken language video and the stability of its transmission.
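Fetching the video through the CDN in front of the cloud storage could be sketched with the JDK HTTP client. The CDN domain and the way the video storage address maps onto a CDN path are assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class VideoFetcher {

    private static final String CDN_BASE = "https://cdn.example.com";  // assumed CDN domain

    private final HttpClient http = HttpClient.newHttpClient();

    /** videoStorageAddress is the object key/path returned in the processing result. */
    public byte[] fetchSpokenVideo(String videoStorageAddress) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(CDN_BASE + "/" + videoStorageAddress))
                .GET()
                .build();
        // On the first request the CDN pulls the video from the cloud storage server
        // and caches it; later requests are served from the CDN edge.
        HttpResponse<byte[]> response = http.send(request, HttpResponse.BodyHandlers.ofByteArray());
        return response.body();
    }
}
```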
In an embodiment, the target spoken language video synthesis task includes a processing progress tracking parameter whose value characterizes the processing stage of the target spoken language video synthesis task, and the PHP server 501 is further configured to: in response to a progress query request from the client 508, feed back the processing progress of the target spoken language video synthesis task to the client 508 according to the value of the processing progress tracking parameter.
By adopting this technical scheme, since the value of the processing progress tracking parameter characterizes the processing stage of the target spoken language video synthesis task, the processing stage can be tracked according to that parameter, which makes it convenient to quickly locate a stage that has not been processed successfully for a long time.
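The progress tracking parameter can be pictured as an enumerated stage value updated by the pipeline and read back on a progress query. The concrete stage names and the in-memory map are assumptions for illustration only:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ProgressTracker {

    /** Possible values of the processing progress tracking parameter (illustrative names). */
    public enum Stage {
        RECEIVED, QUEUED, RECORDING_FETCHED, SYNTHESIZING, UPLOADING_RESULT, DONE, FAILED
    }

    private final ConcurrentMap<String, Stage> progressByTaskId = new ConcurrentHashMap<>();

    /** Updated by the synthesis pipeline as the task moves through its stages. */
    public void update(String taskId, Stage stage) {
        progressByTaskId.put(taskId, stage);
    }

    /** Answers a progress query from the client for a given task. */
    public Stage query(String taskId) {
        return progressByTaskId.getOrDefault(taskId, Stage.RECEIVED);
    }
}
```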
With respect to the various components of the system embodiments described above, the specific manner in which each component performs the operations has been described in detail in relation to the method embodiments and will not be elaborated upon here.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. A distribution method of a spoken language video composition task, the method comprising:
responding to a received spoken language video synthesis request sent by a client, and generating a target spoken language video synthesis task;
adding the target spoken language video synthesis task into a target task queue based on a preset distribution configuration rule;
wherein the target task queue is a local task queue or a cloud task queue, spoken language video synthesis tasks in the local task queue are processed by a local synthesis server, spoken language video synthesis tasks in the cloud task queue are processed by a cloud synthesis server, and a computing function of the cloud synthesis server for synthesizing spoken language videos is an elastic computing function so as to dynamically use computing resources according to the number of spoken language video synthesis tasks to be processed by the cloud synthesis server;
wherein adding the target spoken language video synthesis task into the target task queue based on the preset distribution configuration rule comprises: determining, according to the distribution configuration rule, a first probability of adding the target spoken language video synthesis task to the local task queue and a second probability of adding the target spoken language video synthesis task to the cloud task queue; generating a random number within a preset order of magnitude under the condition that the difference between the first probability and the second probability is smaller than a preset difference threshold, and converting the first probability into the preset order of magnitude to obtain a first value; and if the random number is less than or equal to the first value, adding the target spoken language video synthesis task into the local task queue, and if the random number is greater than the first value, adding the target spoken language video synthesis task into the cloud task queue.
2. The method of claim 1, wherein the distribution configuration rule is at least one of:
a first type of distribution rule set for the user identification in the spoken language video synthesis request;
a second type of distribution rule set for a load of the local synthesis server.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
responding to the operation of uploading a spoken language recording file by the client, and uploading the spoken language recording file to a cloud storage server, wherein the spoken language recording file is obtained by the client collecting the voice of a user reading an original video;
uploading the spoken language recording file to a local storage server under the condition that the fact that the spoken language recording file is uploaded to the cloud storage server fails is determined;
returning the file storage address of the spoken language recording file to the client;
the spoken language video synthesis request comprises the file storage address, and the spoken language recording file is used for synthesizing a spoken language video with the original video.
4. The method of claim 3, further comprising:
and before the spoken language recording file stored by the cloud storage server is invalid, synchronously storing the spoken language recording file stored by the cloud storage server to the local storage server.
5. The method according to claim 3, wherein the storage manner of the cloud storage server is object storage.
6. The method according to claim 1 or 2, wherein in case that the target task queue is the cloud task queue, the method further comprises:
and in response to receiving a synthesis failure message sent by the cloud synthesis server, adding the target spoken language video synthesis task into the local task queue.
7. The method according to claim 1 or 2, wherein the spoken language video obtained after the target spoken language video synthesis task is completed is stored in a cloud storage server, and the method further comprises:
acquiring a processing result aiming at the target spoken language video synthesis task, wherein the processing result comprises a video storage address of the spoken language video in the cloud storage server;
and updating the relational database management system according to the processing result so that the client acquires the video storage address from the relational database management system and acquires the corresponding spoken language video from the cloud storage server according to the video storage address.
8. The method of claim 7, further comprising:
acquiring a video acquisition request sent by the client, wherein the video acquisition request comprises a video storage address;
sending the video acquisition request to a content delivery network (CDN) server corresponding to the cloud storage server, wherein the CDN server stores the spoken language video when the client initially acquires the spoken language video from the cloud storage server;
and receiving the spoken language video corresponding to the video acquisition request from the CDN server, and sending the spoken language video to the client.
9. The method according to claim 1 or 2, wherein the local task queue and the cloud task queue are both asynchronous message queues, or the local task queue is a RabbitMQ queue and the cloud task queue is a Kafka queue.
10. The method according to claim 1 or 2, wherein the target spoken language video composition task includes a processing progress tracking parameter, a value of the processing progress tracking parameter is used for characterizing a processing stage of the target spoken language video composition task, and the method further comprises:
and responding to a progress query request of the client, and feeding back the processing progress of the target spoken language video synthesis task to the client according to the value of the processing progress tracking parameter.
11. A distribution system for a spoken language video composition task, comprising:
the system comprises a PHP server, a local synthesis server, a distribution controller and a first communication interface for communicating with a cloud synthesis server;
the PHP server is used for responding to a received spoken language video synthesis request sent by a client, generating a target spoken language video synthesis task and sending the target spoken language video synthesis task to the distribution controller;
the distribution controller adds the target spoken language video synthesis task to a target task queue based on a preset distribution configuration rule;
the target task queue is a local task queue or a cloud task queue, spoken language video synthesis tasks in the local task queue are processed by a local synthesis server, the spoken language video synthesis tasks in the cloud task queue are processed by a cloud synthesis server, and a computing function of the cloud synthesis server for synthesizing spoken language videos is an elastic computing function so as to dynamically use computing resources according to the number of the spoken language video synthesis tasks to be processed by the cloud synthesis server;
the distribution controller adds the target spoken language video synthesis task to the target task queue based on the preset distribution configuration rule in the following way:
the distribution controller determines, according to the distribution configuration rule, a first probability of adding the target spoken language video synthesis task into the local task queue and a second probability of adding the target spoken language video synthesis task into the cloud task queue;
the distribution controller generates a random number within a preset order of magnitude under the condition that the difference between the first probability and the second probability is smaller than a preset difference threshold, and converts the first probability into the preset order of magnitude to obtain a first value;
if the random number is smaller than or equal to the first value, the distribution controller adds the target spoken language video synthesis task to the local task queue; and if the random number is larger than the first value, the distribution controller adds the target spoken language video synthesis task into the cloud task queue.
12. The system of claim 11, further comprising a local storage server and a second communication interface for communicating with a cloud storage server;
the PHP server is also used for responding to the operation of uploading a spoken language recording file by the client and uploading the spoken language recording file to the cloud storage server, wherein the spoken language recording file is obtained by the client collecting the voice of the user reading the original video;
uploading the spoken language recording file to a local storage server under the condition that the fact that the spoken language recording file is uploaded to the cloud storage server fails is determined;
returning the file storage address of the spoken language recording file to the client;
the spoken language video synthesis request comprises the file storage address, and the spoken language recording file is used for synthesizing a spoken language video with the original video.
13. The system according to claim 12, wherein the spoken language video obtained after the target spoken language video synthesis task is completed is stored in the cloud storage server, the system further comprising a Java server, and the Java server is further configured to:
acquiring a processing result aiming at the target spoken language video synthesis task, wherein the processing result comprises a video storage address of the spoken language video in the cloud storage server;
and updating the relational database management system of the local storage server according to the processing result so that the client acquires the video storage address from the relational database management system and acquires the corresponding spoken language video from the cloud storage server according to the video storage address.
14. The system of claim 13, wherein the PHP server is further configured to:
acquiring a video acquisition request sent by the client, wherein the video acquisition request comprises a video storage address;
sending the video acquisition request to a content delivery network (CDN) server corresponding to the cloud storage server, wherein the CDN server stores the spoken language video when the client initially acquires the spoken language video from the cloud storage server;
and receiving the spoken language video corresponding to the video acquisition request from the CDN server, and sending the spoken language video to the client.