WO2010045736A1 - Reduced-latency rendering for a text-to-movie system - Google Patents

Reduced-latency rendering for a text-to-movie system

Info

Publication number
WO2010045736A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
initial
modified
sub
video
Prior art date
Application number
PCT/CA2009/001521
Other languages
French (fr)
Inventor
Herve Lange
Original Assignee
Xtranormal Technology Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xtranormal Technology Inc.
Publication of WO2010045736A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00Animation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/131Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/194Calculation of difference between files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/034Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects


Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

There is described a method and system for creating a video. The method comprises: receiving an initial text divided into at least two initial parts; rendering initial sub-videos, each one of the initial sub-videos being a visual representation of each one of the at least two initial parts of the initial text; combining the initial sub-videos together to generate the video; receiving a modified text of the initial text, the modified text comprising a modification; comparing the modified text with the at least two initial parts of the initial text to determine a modified part of the modified text corresponding to one of the at least two initial parts of the initial text, the modified part comprising the modification; rendering a modified sub-video for the modified part; combining the modified sub-video with at least one of the initial sub-videos corresponding to an unmodified part of the initial text, to generate a modified version of the video.

Description

REDUCED-LATENCY RENDERING FOR A TEXT-TO-MOVIE SYSTEM
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority from US provisional patent application 61/107,568 filed October 22, 2008 and entitled "REDUCED-LATENCY RENDERING FOR A TEXT-TO-MOVIE SYSTEM".
TECHNICAL FIELD
[0002] The present disclosure relates to the field of digital animation techniques. More specifically, it relates to the field of text-to-movie/video rendering techniques.
BACKGROUND
[0003] A Text-To-Movie (TTM) or Text-To-Animation (TTA) system converts a text inputted by a user into a movie or an animation. The last step of the process for creating the movie is the rendering of the movie or animation from the input text, which consists of generating the images constituting the movie. The rendering process is computationally intensive. To increase the speed of rendering, a project is often broken up into jobs and the jobs are sent to corresponding nodes of a rendering farm in order to be processed in parallel. A rendering farm refers to a computer cluster for rendering computer-generated images. Each computer or node of the rendering farm renders a video from a corresponding received job and the different rendered videos are then combined together in order to obtain a single video representative of the input text.
[0004] The existing techniques of splitting a project and distributing it on a rendering farm provide great speed and efficiency when a project is rendered a single time. However, simply distributing jobs on a rendering farm is insufficient for the purposes of a TTM or TTA system, where a project needs to be rendered repeatedly as the user modifies the text and re-renders the movie in an iterative manner. The process of continually re-rendering the text can become very tedious for the user since the rendering process is usually quite slow.
[0005] There is therefore a need to provide a method for reducing the rendering latency in the context of TTM or TTA systems.
SUMMARY
[0006] According to an embodiment, there is provided a method for creating a video. The method comprises: receiving an initial text divided into at least two initial parts; rendering initial sub-videos, each one of the initial sub-videos being a visual representation of each one of the at least two initial parts of the initial text; combining the initial sub-videos together to generate the video; receiving a modified text of the initial text, the modified text comprising a modification; comparing the modified text with the at least two initial parts of the initial text to determine a modified part of the modified text corresponding to one of the at least two initial parts of the initial text, the modified part comprising the modification; rendering a modified sub-video for the modified part; combining the modified sub-video with at least one of the initial sub-videos corresponding to an unmodified part of the initial text, to generate a modified version of the video; and displaying the modified version of the video to present a visual representation of the modification in the modified text.
[0007] In accordance with an embodiment, there is also provided a system for creating a video. The system comprises a text analyzer for receiving an initial text divided into at least two initial parts, and sending rendering jobs each comprising one of the at least two initial parts; a rendering farm in operative communication with the text analyzer for receiving and distributing the rendering jobs, and for rendering initial sub-videos, each one of the initial sub-videos visually representing each one of the at least two initial parts; and a sub-video combiner in operative communication with the rendering farm, for combining the initial sub-videos together to generate the video, the video visually representing the initial text; wherein the text analyzer is adapted to: receive a modified text of the initial text, the modified text comprising a modification; compare the modified text to the initial text to identify a modified part comprising the modification, and at least one unmodified part corresponding to one of the at least two initial parts; and send a new rendering job comprising the modified part, to the rendering farm; wherein the rendering farm is adapted to output a modified sub-video for the modified part; and wherein the sub-video combiner is adapted to combine the modified sub-video with at least one of the initial sub-videos corresponding to the at least one unmodified part, to generate a modified version of the video.
[0008] In accordance with another embodiment, there is provided a method for modifying a video initially rendered from a text. The method comprises receiving a modification to the text, the text comprising at least two parts, each of the at least two parts being associated to at least two respective sub-videos forming the video; determining a modified part and an unmodified part amongst the at least two parts, the modified part comprising the modification; rendering a modified sub-video for the modified part; combining the modified sub-video with one of the at least two respective sub-videos corresponding to the unmodified part, to generate a modified final video; and displaying the modified final video to visually represent the modification made to the text.
[0009] In accordance with yet another embodiment, there is provided a system for modifying a video initially rendered from a text, the system comprising: a processor; and a memory in operative communication with the processor, the memory comprising instructions for implementing the processor to: receive a modification to the text, the text comprising at least two parts, each of the at least two parts being associated to at least two respective sub-videos forming the video; determine a modified part and an unmodified part amongst the at least two parts, the modified part comprising the modification; render a modified sub-video for the modified part; combine the modified sub-video with one of the at least two respective sub-videos corresponding to the unmodified part, to generate a modified final video; and output the modified final video which visually represents the modification made to the text.
[0010] The term "video" should be understood as any form of motion pictures. A video can be a film, a 2D animation, a 3D animation, an animated cartoon, and the like. A video can have an audio frame or be silent.
[0011] The term "modified part of text" should be understood as a part of a text that has been directly and/or indirectly modified by a user or by any text modification tool, and that the TTM system considers as being modified.
[0012] The term "unmodified part of text" is a part of text that has not been directly and/or indirectly modified, and that the TTM system considers as being unmodified. For example, an unmodified part of text can be a part of text which has been directly modified by the user, but which is considered or tagged as unmodified by the TTM system. [0013] The term "sub-video" is used to describe a video which corresponds to a part of a text. A sub-video is a visual representation of its corresponding part of text. A sub-video may have an audio track or be silent. Sub-videos corresponding to different parts of the text are combined together in order to obtain a final video. The final video is a visual representation of the whole text.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Further features and advantages of the present disclosure will become apparent from the following detailed description, taken in combination with the appended drawings, in which:
[0015] Fig. 1 is a flow chart illustrating a method for creating a video from a written text, in accordance with an embodiment;
[0016] Fig. 2A illustrates an input of an initial text divided into three parts, in a TTM system, in accordance with an embodiment;
[0017] Fig. 2B illustrates an input of a modified text of which only one part has been modified, in a TTM system, in accordance with an embodiment;
[0018] Fig. 2C illustrates an input of only one part of text that has been modified, in a TTM system, in accordance with an embodiment;
[0019] Fig. 3 is a block diagram illustrating a system for creating a video from an input text, in accordance with an embodiment;
[0020] Fig. 4 is a block diagram illustrating a text analyzer, in accordance with an embodiment.
[0021] It will be noted that throughout the appended drawings, like features are identified by like reference numerals.
DETAILED DESCRIPTION
[0022] Figure 1 illustrates an embodiment of a method 300 for creating a video according to an input text. The rendering of the video occurs in the context of a TTM or TTA system. The first step 302 of the method is the reception of an initial text inputted by a user. This initial text is the basis for the creation of the video. The video to be rendered is a visual representation of the text and can comprise an audio frame. The initial text can be a written text or an oral text subsequently converted into a written text by the system. As such, audio data can be received and then converted into text data representative of the initial text.
[0023] The second step 304 of the method consists of dividing the initial input text into at least two parts. Each part of the text is considered to be a job (also referred to herein as a rendering job). Numerous strategies may be used to divide the initial text into parts. For example, each sentence of the initial text may be considered as a job. In another example, the system analyses the initial text and identifies actions. Each action is then considered as being a job. For example, the sentence "John walks to his bed and lies down" comprises two actions, namely the action "walking" and the action "lying down". The sentence is then divided into two parts, a first part corresponding to the action "walking" and the second part corresponding to the action "lying down". Each of the two parts is considered by the system as being a job. Any strategy or method for dividing a text into parts known by a person skilled in the art may be used.
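The disclosure does not prescribe a particular splitting algorithm. As an illustration only, the following Python sketch divides an initial text into one rendering job per sentence; the Job structure and function names are assumptions, and an action-based splitter could replace the sentence splitter.

```python
import re
from dataclasses import dataclass

@dataclass
class Job:
    """One rendering job, i.e. one part of the initial text (illustrative structure)."""
    index: int
    text: str

def split_into_sentences(initial_text: str) -> list[str]:
    """Naive sentence-based division; an action-based division would plug in here instead."""
    sentences = re.split(r"(?<=[.!?])\s+", initial_text.strip())
    return [s for s in sentences if s]

def make_jobs(initial_text: str) -> list[Job]:
    """Create one rendering job per part of the initial text."""
    return [Job(i, part) for i, part in enumerate(split_into_sentences(initial_text))]

jobs = make_jobs("John walks to his bed. He lies down.")
# -> [Job(index=0, text='John walks to his bed.'), Job(index=1, text='He lies down.')]
```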
[0024] In one embodiment, the second step 304 of the method, namely dividing the initial text into parts of text, is performed prior to the reception of the text. In one case, the user of the method 300 is in charge of the division of the text into parts; step 304 is optional. The user may insert separator markers within the text in order to divide the text and thus define a position where each text part starts and ends. For example, markers such as "/", "•", and the like, can be inserted within the text by the user to create the different parts of text. In another case, a user interface used by a user to input the text offers a natural division of the text, whereby individual data entry fields are presented to the user for entering text. In this way, delimitations as per the data entry fields provide for the separator markers. For example, different text boxes or blocks in which the text is written are provided to the user on the interface. A text box may be associated with a chapter or a scene of the story, for example. The user describes a first scene by writing text in a first text box and describes a second scene by writing text in a second text box. Each text box of the interface is dedicated to a corresponding scene or chapter of the story to be animated. Alternatively, each text box may have a limited space in which the text is entered. A limited number of characters may be entered in each text box, for example. In this case, the user starts writing a story in the first text box and if the limit of characters of the first box is reached, the user continues writing the story in the second box, etc. It should be understood that any method for dividing a text before being sent to the TTM or TTA system may be used.
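When the user delimits the parts manually, the division reduces to cutting the text on the chosen marker. A minimal sketch, assuming "/" is used as the separator marker:

```python
def split_on_marker(initial_text: str, marker: str = "/") -> list[str]:
    """Divide user-delimited text into parts; each part later becomes one rendering job."""
    return [part.strip() for part in initial_text.split(marker) if part.strip()]

parts = split_on_marker("Peter is sleeping on his bed. / Peter wakes up and reads a book.")
# -> ['Peter is sleeping on his bed.', 'Peter wakes up and reads a book.']
```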
[0025] In step 306, an initial sub-video is rendered for each part of the initial text. Each job corresponding to a part of the initial text is then sent to a corresponding node on a rendering farm and an initial sub-video is rendered for each job. Each initial sub-video is a visual representation of a corresponding part of the initial text. The initial sub-videos are stored in a memory and then combined together in order to create a final video at step 308. The final video is a visual representation of the whole initial text, i.e. an animation or movie. The final video is then made available to the user. After watching the final video, the user may decide to modify the story by modifying the initial text.
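A sketch of the parallel rendering and caching step described above, using a local thread pool as a stand-in for the rendering farm; render_part is a placeholder for the call that actually sends a job to a farm node, which the patent does not specify.

```python
from concurrent.futures import ThreadPoolExecutor

def render_part(part: str) -> bytes:
    """Stand-in for dispatching one job to a rendering-farm node and receiving a sub-video."""
    return f"<sub-video for {part!r}>".encode()   # placeholder payload

def render_initial(parts: list[str], cache: dict[int, bytes]) -> list[bytes]:
    """Render one sub-video per part in parallel and store each result by part index."""
    with ThreadPoolExecutor() as pool:
        sub_videos = list(pool.map(render_part, parts))
    for i, video in enumerate(sub_videos):
        cache[i] = video          # kept so unmodified parts can be reused later
    return sub_videos
```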
[0026] In one embodiment, the generated initial sub-videos are silent videos. An audio frame is generated separately from the sub-videos. The audio frame is then combined with the sub-videos during the creation of the final video. Alternatively, each initial sub-video may comprise an audio frame. An audio frame of a sub-video is referred to herein as a sub-audio frame.
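The patent leaves the combiner implementation open. As one possibility, assuming the sub-videos and the separately generated audio frame exist as files and an ffmpeg binary is available, the final assembly could look like the following sketch (file names are illustrative):

```python
import os
import subprocess
import tempfile

def combine(sub_video_paths: list[str], audio_path: str, output_path: str) -> None:
    """Concatenate the silent sub-videos in order, then mux in the separate audio frame."""
    # Write the file list expected by ffmpeg's concat demuxer.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        for path in sub_video_paths:
            f.write(f"file '{os.path.abspath(path)}'\n")
        list_path = f.name
    subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_path,
                    "-c", "copy", "silent.mp4"], check=True)
    subprocess.run(["ffmpeg", "-y", "-i", "silent.mp4", "-i", audio_path,
                    "-c:v", "copy", "-c:a", "aac", "-shortest", output_path], check=True)
```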
[0027] The next step 310 of the method illustrated in Figure 1 is the reception of a modified text from the user. In an embodiment in which an undivided text is received, the modified text corresponds to the initial text plus the modifications directly made by the user. In one embodiment, the system divides the modified text into several parts at step 312, in accordance with the parts of the initial text, and compares each part of the modified text to its corresponding part of the initial text in order to identify the parts of the modified text which contain modifications directly made by the user.
[0028] In an embodiment in which a divided text is received, the reception of a modified text may include the reception of the whole text or the reception of only the parts of text which have been directly modified by the user. As illustrated in Figure 2A, the whole initial text 10 is sent to the TTM or TTA system 12. For example, the whole initial text 10 is divided into three initial parts 14, 16, and 18 before being sent to the TTM system 12. In one embodiment, the whole modified text 20 is sent to the TTM system 12, as illustrated in Figure 2B. The modified text 20 is also divided into three parts 14, 18, and 22 according to the parts of the initial text 10. For example, only the second part 22 of the modified text 20 has been modified by the user. In this embodiment, the whole modified text 20 comprising the parts of text 14, 18, and 22, is sent to the system 12. In another embodiment, only the modified part of text 22 is sent to the system 12, as illustrated in Figure 2C.
[0029] Once the modifications to the text have been received, the next step is the determination of the parts of the text that have been directly modified by the user. If the whole modified text is received, this is done by comparing it to the previously received initial text stored in memory. If only the modified parts of text are received, then all of the received parts of text are considered as being directly modified by the user and the missing parts of text are retrieved from the initial text stored in memory.
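A sketch of the comparison step covering both reception modes (whole modified text, or only the edited parts); the data structures are illustrative assumptions.

```python
def find_directly_modified(initial_parts: list[str],
                           received_parts: dict[int, str]) -> tuple[list[str], set[int]]:
    """Merge received parts with the stored initial text and report which indices changed.

    received_parts maps a part index to its new text: it holds every index when the whole
    modified text is sent, or only the edited indices when partial text is sent, in which
    case the missing parts are retrieved from the stored initial text.
    """
    merged, directly_modified = [], set()
    for i, initial in enumerate(initial_parts):
        new = received_parts.get(i, initial)
        merged.append(new)
        if new != initial:
            directly_modified.add(i)
    return merged, directly_modified

_, changed = find_directly_modified(
    ["Peter is sleeping on his bed.", "Peter wakes up and reads a book."],
    {0: "Peter is sleeping on his chair."})
# changed == {0}
```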
[0030] In one embodiment, each part of the modified text comprising modifications directly made by the user is tagged as being a modified part of text. Similarly, each part of the modified text which comprises no modification made by the user is tagged as an unmodified part of text.
[0031] In one embodiment, the method 300 comprises the step of analyzing the importance of the modifications made to the initial text in order to determine if they will have impact on the video. The method also comprises the step of determining if the modifications have an impact on parts of the text that have not been directly modified by the user. For example, if the user has changed the name of a character in his story from "John" to "Peter" in a part of the text, it may have no impact on the visual representation of the entity in the final video. In another example, if the system is configured in accordance with a given pre-set representation parameter to represent any male character with black hair, naming the character "John" or "Peter" will not impact the visual representation of the character. Therefore, in this case, even if part of the text comprises a modification in comparison to its corresponding initial part of text, namely "Peter" instead of "John", this part of text is considered or tagged as being unmodified. In another example, the system only comprises a single animated representation for any model of sports car. Therefore, replacing the word "Mustang" by the word "Ferrari" in a part of the text would not modify the final video. As a result, the part of text in which the word "Mustang" has been replaced by the word "Ferrari" is tagged as being an unmodified part of text.
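One way to realise this check, sketched below under the assumption that the system can derive the visual attributes it actually uses from a part of text, is to compare derived attributes rather than raw strings; when they match, the edited part can be tagged as unmodified.

```python
from typing import Callable

def visually_equivalent(old_part: str, new_part: str,
                        derive_attributes: Callable[[str], dict]) -> bool:
    """True when an edit leaves the rendered look unchanged, so the part can stay unmodified."""
    return derive_attributes(old_part) == derive_attributes(new_part)

def derive(part: str) -> dict:
    # Hypothetical extractor: any male character is rendered with black hair,
    # so the character's name is not a visual attribute.
    return {"character": "male, black hair"}

assert visually_equivalent("John walks.", "Peter walks.", derive)
```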
[0032] In one embodiment, a part of text in which the user has entered no modification may be tagged as a modified part of text if it is indirectly modified by a change made by the user in another part of text. For example, in a first part of the initial text, there is the sentence "Peter is sleeping on his bed" and in a second part of the initial text, there is the sentence "Peter wakes up and reads a book". Watching the final video resulting from this text, the user sees Peter sleeping, waking up and then reading a book while lying on his bed. If the user consequently replaces the first sentence by the sentence "Peter is sleeping on his chair" in the first part of the text but does not modify the second sentence, namely "Peter wakes up and reads a book", the system considers both the first and the second parts of the text as being modified even if the user has not directly modified the second sentence. The system determines and understands that the action "Peter wakes up and reads a book" occurs while Peter is sitting on a chair and not while he is lying on his bed. In this case, the second part of the text while being unmodified by the user is tagged as a modified part of text.
[0033] In one embodiment, a graphic structure of the inputted text is generated in order to determine if a part of the text directly modified by the user should be tagged as unmodified and if parts of the text that have not been directly modified by the user should be tagged as modified. A graphic structure represents the interconnections between the actions of the entities in the story and the state of the animation world. If a modification directly made by the user in the text does not affect the graphic structure, then the part of the text comprising the modification is tagged as being unmodified. If a modification directly made by the user implies a modification to the graphic structure, the system analyses the whole graphic structure in order to determine if this modification will give rise to additional modifications to the remainder of the graphic structure. If it is the case, the parts of text associated with the additional modifications to the graphic structure are identified and tagged as being modified parts of text. In order to determine if a modification to the graphic structure will generate other additional modifications to the graphic structure, the method may take into account synchronization rules. The synchronization rules express the temporal relations that can exist between two or more actions. Here are some examples of synchronization rules: action 1 and action 2 always occur simultaneously, action 1 initiates action 2, action 1 terminates action 2, action 1 and action 2 are mutually exclusive, action 1 is a sub-action of action 2, action 1 always precedes action 2, action 1 always follows action 2, action 1 clips action 2, action 1 unclips action 2, action 1 and action 2 have to be fully synchronized, and the like. By considering the synchronization rules, it is possible to determine if modifying an action associated with a part of text generates modifications to other actions associated with the same or another part of text.
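The disclosure leaves the form of the graphic structure open. One way to realise the propagation, sketched below, is to treat it as a dependency graph between parts: whenever a synchronization rule links an action of a directly modified part to an action of another part, that other part is tagged as (indirectly) modified as well. The edge representation is an assumption for illustration.

```python
from collections import deque

def propagate_modifications(directly_modified: set[int],
                            sync_edges: dict[int, set[int]]) -> set[int]:
    """Tag every part whose actions are linked, through synchronization rules, to a modified part.

    sync_edges[i] holds the parts whose actions depend on part i (e.g. because
    "action 1 always precedes action 2" ties the two parts together).
    """
    tagged = set(directly_modified)
    queue = deque(directly_modified)
    while queue:
        part = queue.popleft()
        for dependent in sync_edges.get(part, set()):
            if dependent not in tagged:       # indirectly modified part
                tagged.add(dependent)
                queue.append(dependent)
    return tagged

# Part 0 ("Peter is sleeping on his chair") was edited; part 1 ("Peter wakes up and
# reads a book") depends on Peter's location, so both are tagged for re-rendering.
assert propagate_modifications({0}, {0: {1}}) == {0, 1}
```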
[0034] While the present description refers to the use of a graphic structure and synchronization rules in order to determine if a modification to a part of the text would generate other modifications, it should be understood that any method known by a person skilled in the art can be used.
[0035] Referring back to Figure 1, at step 312, a modified sub-video is rendered for each one of the parts of text tagged as being modified. The modified sub-videos are then stored in memory. There is no rendering of a sub-video for the parts of text that are tagged as being unmodified since they have already been rendered during the creation of the final video associated with the initial text. The last step 314 of the method 300 illustrated in Figure 1 is the generation of a modified final video by combining the modified sub-videos corresponding to modified parts of text with the previously stored initial sub-videos corresponding to unmodified parts of text. By only re-rendering the sub-videos from the modified parts of text, the period of time required to create the final video is shortened. The modified final video is made available to the user, who can modify the text again. The same process occurs for each modification to the text entered by the user in an iterative manner. For each iteration, only the parts of text considered as modified with respect to the text of the previous iteration lead to a rendering process to obtain modified sub-videos. These sub-videos replace the sub-videos which correspond to the same parts of text and which are stored in memory.
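Tying the pieces together, each iteration re-renders only the tagged parts and reuses the cached sub-videos for everything else; a condensed sketch under the same illustrative naming assumptions as the earlier fragments:

```python
from typing import Callable

def rerender_iteration(parts: list[str], tagged_modified: set[int],
                       cache: dict[int, bytes],
                       render_part: Callable[[str], bytes]) -> list[bytes]:
    """Render new sub-videos only for the tagged parts, overwrite them in the cache,
    and return the ordered sub-videos to hand to the combiner."""
    for i in tagged_modified:
        cache[i] = render_part(parts[i])      # only modified parts hit the rendering farm
    return [cache[i] for i in range(len(parts))]
```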
[0036] It should be understood that the method illustrated in Figure 1 may be executed by a machine provided with a processor and a memory. The processor is then configured to execute all of the steps of the methods. The machine may be a personal computer. Alternatively, the user may be provided with an interface to input the initial text and the modified text, and communication means to send the initial and modified text to a server via the Internet, for example. The server is provided with communication means, a memory, and a processor adapted to perform the steps of the method illustrated in Figure 1, and sends the final video to the user.
[0037] Figure 3 illustrates one embodiment of a system 50 for generating a video from a text. The system comprises a text analyzer 52, a memory 54 (aka memory device), a rendering farm 56, and a sub-video combiner 58. The system 50 is connected to a user interface 60 comprising a display unit 62.
[0038] The user interface 60 is used by a user of the system 50 to input an initial text and any subsequent modifications to the initial text. The user interface 60 is in operative communication with the text analyzer 52.
[0039] In one embodiment, the text analyzer 52 comprises a text divider 70, a text modifications locator 72, a tag generator 74, and a job generator 76, as illustrated in Figure 4. The text divider 70 receives a single piece of text from the user interface 60 and is adapted to divide the initial text received as a single piece of text into multiple parts.
[0040] In another embodiment, the text analyzer does not contain any text divider 70 and the text is divided into parts before being sent to the text analyzer 52. The user interface 60 is adapted to send the whole initial text inputted by the user and divided into several parts to the text analyzer 52. After the user has made modifications to the initial text, the whole text divided into parts is sent again to the text analyzer 52. Alternatively, the user interface 60 first sends the whole initial text but subsequently it only sends the parts of text that have been modified by the user. As described above with respect to Figure 1, separator markers, text boxes or other means may be used to divide the text.
[0041] Referring back to Figure 3, the text analyzer 52 saves the parts of the initial text in the memory 54 and also creates a job per part of initial text. The text analyzer 52 sends the jobs to the rendering farm 56 which associates each job to a respective node (aka rendering node) (not shown). Hence there may be multiple (at least two) rendering nodes. Each job corresponding to a part of the initial text is sent to its respective node in the rendering farm 56. The rendering farm 56 outputs a rendered initial sub-video for each part of the initial text. Each initial sub-video is saved in the memory 54 and also sent to the sub-video combiner 58. The sub-video combiner 58 is adapted to combine the initial sub-videos together to generate a final video being a visual representation of the initial text.
[0042] The text analyzer 52 is further adapted to receive modifications to the initial text from the user interface 60.
[0043] The modifications should be understood as being the whole undivided initial text comprising modifications, the initial text divided into parts of which a certain number have been modified, or only the parts of the initial text that have been modified by the user. The text modifications locator 72 compares the modified text to the initial text saved into memory 54 in order to identify the modifications to the text directly made by the user.
[0044] In one embodiment in which the text analyzer 52 receives undivided text, the text divider 70 is adapted to divide the modified text according to the divisions previously made to the initial text. This results in directly modified and unmodified parts of text.
[0045] The text analyzer 52 further comprises the tag generator 74 which is adapted to tag each received part of text as being modified or unmodified. The parts of text tagged as modified are saved in the memory 54 in which they can be substituted for their corresponding parts of initial text. The tag generator 74 can be adapted to perform the steps above described with respect to Figure 1 in order to tag the modified parts of text. For example, the tag generator 74 can be adapted to generate graphic structures and to use synchronization rules in order to tag the parts of text. In one embodiment, the text analyzer 52 does not save the received text in memory 54 but it only stores the graphic structures corresponding to the received text.
[0046] The job generator 76 of the text analyzer 52 creates a job for each one of the parts of text tagged as being modified and then sends them to the rendering farm 56. The rendering farm 56 assigns each job to a respective node where a modified sub-video is generated for each part of text tagged as modified. These modified sub-videos are saved in memory 54 and sent to the sub-video combiner 58. The sub-video combiner 58 retrieves the initial sub-videos corresponding to the parts of text tagged as being unmodified from the memory 54 and combines them with the modified sub-videos in order to create a modified final video. The sub-video combiner 58 may also combine the sub-videos with an audio frame.
[0047] The modified final video is sent to the display unit 62 of the user. Alternatively, it can be sent to and stored in the memory of the user's computer.
[0048] While in the present description a job is associated with a single part of text, it should be understood that a job can be associated with more than one part of text. For example, two parts of text can be regrouped to constitute a job.
[0049] It should be noted that the present disclosure can be carried out as a method, can be embodied in a system, a computer readable medium or an electrical or electro-magnetic signal.
[0050] While preferred embodiments have been described above and illustrated in the accompanying drawings, it will be evident to those skilled in the art that modifications may be made therein without departing from the essence of this disclosure. Such modifications are considered as possible variants comprised in the scope of the disclosure.

Claims

CLAIMS:
1. A method for creating a video, comprising: receiving an initial text divided into at least two initial parts; rendering initial sub-videos, each one of the initial sub-videos being a visual representation of each one of the at least two initial parts of the initial text; combining the initial sub-videos together to generate the video; receiving a modified text of the initial text, the modified text comprising a modification; comparing the modified text with the at least two initial parts of the initial text to determine a modified part of the modified text corresponding to one of the at least two initial parts of the initial text, the modified part comprising the modification; rendering a modified sub-video for the modified part; combining the modified sub-video with at least one of the initial sub-videos corresponding to an unmodified part of the initial text, to generate a modified version of the video; and displaying the modified version of the video to present a visual representation of the modification in the modified text.
2. The method of claim 1, wherein the modified text comprises the initial text and the modification, the method comprising dividing the modified text into at least two parts, the at least two parts corresponding to the at least two initial parts, and wherein the comparing comprises comparing each one of the at least two parts with a corresponding one of the at least two initial parts to identify the modified part of the modified text which contains the modification.
3. The method of claim 1 , wherein the initial text comprises undivided text, and wherein the method comprises dividing the initial text into the at least two initial parts.
4. The method of claim 1 , wherein the receiving of the initial text comprises receiving audio data, and converting the audio data into text data which corresponds to the initial text.
5. The method of claim 1, wherein the dividing the initial text into the at least two initial parts comprises identifying actions in the initial text, the at least two initial parts conveying a respective one of the actions.
6. The method of claim 1, wherein the dividing the initial text into the at least two initial parts comprises identifying separator markers in the initial text, the separator markers identifying a start and end position for each one of the at least two initial parts in the initial text.
7. The method of claim 1, wherein the rendering the initial sub-videos comprises sending at least two rendering jobs to a rendering farm, the at least two rendering jobs each comprising one of the at least two initial parts, the rendering farm comprising at least two nodes each for rendering the initial sub-videos from the at least two initial parts.
8. The method of claim 1, comprising storing the initial sub-videos in a memory device prior to the combining of the initial sub-videos.
9. The method of claim 1, wherein the initial sub-videos are silent and wherein the combining the initial sub-videos together further comprises combining initial sub-audio frames together, each of the initial sub-audio frames being respectively associated to each one of the initial sub-videos.
10. The method of claim 1, wherein the modified text is the modified part, the method comprising retrieving the unmodified part from the initial text.
11. The method of claim 1, comprising analyzing an importance of the modification on the visual representation.
12. The method of claim 11, comprising determining an impact of the modification on the unmodified part of the initial text which does not contain the modification, based on a pre-set representation parameter.
13. The method of claim 12, comprising tagging the unmodified part as a second modified part based on one of the impact and the importance, the second modified part being indirectly modified via the modification in the modified part.
14. The method of claim 1, comprising generating a graphic structure of the initial text; and detecting an indirect modification in the unmodified part based on the graphic structure.
15. The method of claim 14, wherein the detecting the indirect modification comprises analyzing a temporal relation between at least two actions defined by the initial text.
16. A system for creating a video, comprising:
a text analyzer for receiving an initial text divided into at least two initial parts, and sending rendering jobs each comprising one of the at least two initial parts;
a rendering farm in operative communication with the text analyzer for receiving and distributing the rendering jobs, and for rendering initial sub-videos, each one of the initial sub-videos visually representing each one of the at least two initial parts; and
a sub-video combiner in operative communication with the rendering farm, for combining the initial sub-videos together to generate the video, the video visually representing the initial text;
wherein the text analyzer is adapted to: receive a modified text of the initial text, the modified text comprising a modification; compare the modified text to the initial text to identify a modified part comprising the modification, and at least one unmodified part corresponding to one of the at least two initial parts; and send a new rendering job comprising the modified part, to the rendering farm;
wherein the rendering farm is adapted to output a modified sub-video for the modified part; and
wherein the sub-video combiner is adapted to combine the modified sub-video with at least one of the initial sub-videos corresponding to the at least one unmodified part, to generate a modified version of the video.
17. The system of claim 16, comprising a user interface in operative communication with the text analyzer, for inputting the initial text and the modified text.
18. The system of claim 17, wherein the user interface comprises a display unit for displaying at least one of the video and the modified version of the video.
19. The system of claim 16, wherein the user interface comprises a text divider for dividing the initial text into the at least two initial parts, and for dividing the modified text into at least two parts corresponding to the at least two initial parts.
20. The system of claim 16, wherein the rendering farm comprises multiple rendering nodes, each of the rendering nodes for executing one of the rendering jobs and the new rendering job.
21. The system of claim 16, wherein the sub-video combiner is adapted to retrieve the initial sub-videos corresponding to the at least one unmodified part.
22. The system of claim 16, wherein the text analyzer comprises a text divider for dividing the initial text into the at least two initial parts, and for dividing the modified text into at least two parts corresponding to the at least two initial parts.
23. A method for modifying a video initially rendered from a text, the method comprising:
receiving a modification to the text, the text comprising at least two parts, each of the at least two parts being associated to at least two respective sub-videos forming the video;
determining a modified part and an unmodified part amongst the at least two parts, the modified part comprising the modification;
rendering a modified sub-video for the modified part;
combining the modified sub-video with one of the at least two respective sub-videos corresponding to the unmodified part, to generate a modified final video; and
displaying the modified final video to visually represent the modification made to the text.
24. A system for modifying a video initially rendered from a text, the system comprising:
a processor; and
a memory in operative communication with the processor, the memory comprising instructions for causing the processor to:
receive a modification to the text, the text comprising at least two parts, each of the at least two parts being associated to at least two respective sub-videos forming the video;
determine a modified part and an unmodified part amongst the at least two parts, the modified part comprising the modification;
render a modified sub-video for the modified part;
combine the modified sub-video with one of the at least two respective sub-videos corresponding to the unmodified part, to generate a modified final video; and
output the modified final video which visually represents the modification made to the text.
PCT/CA2009/001521 2008-10-22 2009-10-22 Reduced-latency rendering for a text-to-movie system WO2010045736A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10756808P 2008-10-22 2008-10-22
US61/107,568 2008-10-22

Publications (1)

Publication Number Publication Date
WO2010045736A1 true WO2010045736A1 (en) 2010-04-29

Family

ID=42118889

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2009/001521 WO2010045736A1 (en) 2008-10-22 2009-10-22 Reduced-latency rendering for a text-to-movie system

Country Status (1)

Country Link
WO (1) WO2010045736A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080107398A1 (en) * 2003-10-04 2008-05-08 Samsung Electronics Co., Ltd. Information storage medium storing text-based subtitle, and apparatus and method for processing text-based subtitle
US20060227142A1 (en) * 2005-04-06 2006-10-12 Microsoft Corporation Exposing various levels of text granularity for animation and other effects

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8937620B1 (en) * 2011-04-07 2015-01-20 Google Inc. System and methods for generation and control of story animation
US10402637B2 (en) 2012-01-20 2019-09-03 Elwha Llc Autogenerating video from text
US9036950B2 (en) 2012-01-20 2015-05-19 Elwha Llc Autogenerating video from text
US9189698B2 (en) 2012-01-20 2015-11-17 Elwha Llc Autogenerating video from text
US9552515B2 (en) 2012-01-20 2017-01-24 Elwha Llc Autogenerating video from text
US8731339B2 (en) * 2012-01-20 2014-05-20 Elwha Llc Autogenerating video from text
US9237322B2 (en) 2013-02-07 2016-01-12 Cyberlink Corp. Systems and methods for performing selective video rendering
GB2596414A (en) * 2020-06-01 2021-12-29 Nvidia Corp Content animation using one or more neural networks
GB2596414B (en) * 2020-06-01 2024-01-03 Nvidia Corp Content animation using one or more neural networks
CN115336247A (en) * 2020-06-10 2022-11-11 Jvc建伍株式会社 Image processing device and image processing system
CN115336247B (en) * 2020-06-10 2024-03-08 Jvc建伍株式会社 Image processing device and image processing system
CN111885416A (en) * 2020-07-17 2020-11-03 北京来也网络科技有限公司 Audio and video correction method, device, medium and computing equipment
CN111885313A (en) * 2020-07-17 2020-11-03 北京来也网络科技有限公司 Audio and video correction method, device, medium and computing equipment
CN111885416B (en) * 2020-07-17 2022-04-12 北京来也网络科技有限公司 Audio and video correction method, device, medium and computing equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09821486

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09821486

Country of ref document: EP

Kind code of ref document: A1