CN1409213A - Book making system and method - Google Patents

Book making system and method

Info

Publication number
CN1409213A
Authority
CN
China
Prior art keywords
books
illustration
video data
data
making
Prior art date
Legal status
Granted
Application number
CN 01141820
Other languages
Chinese (zh)
Other versions
CN1202471C (en)
Inventor
吴昌隆
Current Assignee
LIXIN INTERNATIONAL SCIENCE AND TECHNOLOGY Co Ltd
Original Assignee
LIXIN INTERNATIONAL SCIENCE AND TECHNOLOGY Co Ltd
Priority date
Filing date
Publication date
Application filed by LIXIN INTERNATIONAL SCIENCE AND TECHNOLOGY Co Ltd filed Critical LIXIN INTERNATIONAL SCIENCE AND TECHNOLOGY Co Ltd
Priority to CN 01141820
Publication of CN1409213A
Application granted
Publication of CN1202471C
Anticipated expiration
Legal status: Expired - Lifetime (Current)

Landscapes

  • Television Signal Processing For Recording (AREA)

Abstract

The book making system produces books comprising a text part and an illustration part. It consists of a video reception module that receives original video data; a decoding module that decodes the original video data into video data; a text capturing module that extracts the text part from the video data according to a making policy; an illustration capturing module that extracts at least one key frame from the video data as the illustration part according to the making policy; and a book generating module that combines the extracted text part and illustration part to produce the book. A book making method corresponding to the above system is also disclosed.

Description

Book making system and method
Technical field
The present invention relates to a book making system and method, and in particular to a book making system and method that uses computer software to analyze a video source (Video) and automatically generate book files such as picture books, photo albums, comics, and e-books.
Background art
With current technology, when picture books, photo albums, comics, e-books and the like are produced, their content is usually still drawn by hand, or individual images are edited and arranged one by one on a computer and then assembled into a book.
However, with the growing popularity of digital cameras, TV tuner cards, set-top boxes, and electronic products such as DVD and VCD, users can obtain digital video easily. Using a computer to process a video source and produce book files has therefore become an important application and demand in the field of computer multimedia.
As mentioned above, when the available image data is not individual still images but a video source of continuous images, the user must first decompose the video source into many individual images and then edit and arrange these images on a computer before they can be bound into book form. However, general video content (Video Content) is played back at 29.97 frames per second under the NTSC standard and 25 frames per second under the PAL standard, so a video as short as one minute already contains about 1500 to 1800 frames. Editing every frame one by one is extremely time-consuming and inefficient.
How to use video content to efficiently produce book files such as picture books, photo albums, comics, and e-books is therefore an important current problem.
Summary of the invention
To overcome the deficiencies of the prior art, the purpose of the present invention is to provide a book making system and method that can automatically analyze a video source to generate book files such as picture books, photo albums, comics, and e-books.
To achieve the above object, the book making system of the present invention is used to produce books that comprise a text part and an illustration part, and comprises a video reception module, a decoding module, a text capturing module, an illustration capturing module, and a book generating module. In the present invention, the video reception module receives original video data, the decoding module decodes the original video data to obtain video data (the original video data may be in any video format), the text capturing module extracts the text part from the video data according to a making policy, the illustration capturing module captures at least one key frame from the video data as the illustration part according to the making policy, and the book generating module then produces the book according to the extracted text part and illustration part.
In addition, the book making system according to the present invention further comprises an editing module, a book template selection module, and a making policy selection module. In the present invention, the making policy selection module accepts a user's selection of the desired making policy, the editing module receives the user's operations to edit the content of the book, the book template selection module receives the user's selection of at least one desired book template, and the book generating module applies the selected book template to lay out the text part and the illustration part to produce the book.
As mentioned above, the making policies selectable by the making policy selection module include an audio analysis algorithm, a caption analysis algorithm, a scene/shot transition analysis algorithm, and an image analysis algorithm. The audio analysis algorithm analyzes the audio data of the video data; the caption analysis algorithm analyzes the caption data of the video data; the scene/shot transition analysis algorithm analyzes the scene/shot transition data of the video data; and the image analysis algorithm analyzes the image data of the video data, and may compare the image data with image template data provided in advance, compare the image data with object data provided in advance, or analyze caption image data within the image data.
Therefore, the text capturing module and the illustration capturing module can, according to the above audio analysis algorithm, caption analysis algorithm, scene/shot transition analysis algorithm, or image analysis algorithm, obtain the text part and illustration part needed to make the book; the book generating module then inserts the text part and illustration part into the book template, so that book files such as picture books, photo albums, comics, and e-books are produced automatically.
The present invention also provides a book making method, which comprises a video reception step, a decoding step, a text capturing step, an illustration capturing step, and a book generating step. In the present invention, the video reception step first receives the original video data, the decoding step then decodes the original video data to obtain video data, the text capturing step and the illustration capturing step then extract from the video data the text part and the illustration part needed for the book, and finally the book generating step produces the book according to the text part and the illustration part.
In addition, the book making method according to the present invention further comprises an editing step for editing the content of the book after it is produced, a book template selection step for the user to choose the desired book template so that the book generating step applies this template when producing the book, and a making policy selection step for the user to choose the desired making policy.
The advantage of the present invention is that the book making system and method can automatically analyze a video source, accommodate various video formats, and integrate technologies such as video content analysis, text recognition, and speech recognition to produce book files such as picture books, photo albums, comics, and e-books, so book files can be generated efficiently from video content.
Description of drawings
The present invention is described in detail below in conjunction with the drawings and embodiments:
Fig. 1 is a schematic diagram showing the structure of the book making system according to the preferred embodiment of the present invention;
Fig. 2 is a flowchart showing the flow of the book making method according to the preferred embodiment of the present invention;
Fig. 3 is a schematic diagram showing how key frames are captured in the book making method of the preferred embodiment of the present invention.
Reference numerals in the figures:
1 book making system
101 video reception module
102 decoding module
103 making policy selection module
104 text capturing module
105 illustration capturing module
106 book template selection module
107 book generating module
108 editing module
2 book making method
201~209 steps of the book making method according to the preferred embodiment of the present invention
301 individual frames
302 key frames
40 original video data
41 video data
411 audio data
412 caption data
413 image data
50 making policy
501 audio analysis algorithm
502 caption analysis algorithm
503 image analysis algorithm
5031 image template data
5032 object data
504 scene/shot transition analysis algorithm
60 computer equipment
601 signal source interface
602 memory
603 central processing unit (CPU)
604 input device
605 storage device
70 book template
80 book
801 text part
802 illustration part
Embodiment
The book making system and method of the preferred embodiment of the present invention are described below with reference to the accompanying drawings, in which identical components are denoted by identical reference numerals.
Referring to Fig. 1, the book making system 1 of the preferred embodiment of the present invention is used to produce a book 80 comprising a text part 801 and an illustration part 802, and comprises a video reception module 101, a decoding module 102, a making policy selection module 103, a text capturing module 104, an illustration capturing module 105, a book template selection module 106, a book generating module 107, and an editing module 108.
In the present embodiment, the book making system 1 can be applied in a computer equipment 60, which can be an existing computer device comprising a signal source interface 601, a memory 602, a central processing unit (CPU) 603, an input device 604, and a storage device 605. The signal source interface 601 connects to a signal source output device or a signal source recording device; it may be, for example, a CD-ROM drive, a FireWire (IEEE 1394) interface, or a universal serial bus (USB) interface, the signal source output device may be, for example, a digital camera, and the signal source recording device may be, for example, a VCD or DVD. The memory 602 can be one or more scratch-pad memories in the computer device, such as DRAM or EEPROM. The central processing unit 603 can adopt any existing CPU architecture, for example comprising an ALU, registers, and a controller, to perform data processing and computation and to control the operation of each component in the computer equipment 60. The input device 604 can be a mouse, keyboard, or other device through which the user inputs information or operates each software module. The storage device 605 can be one or more computer-readable data storage devices such as a hard disk drive or floppy drive.
Each module in the present embodiment is a software module stored in the storage device 605 or in a program recording medium. After the central processing unit 603 reads a module, it realizes the function of that module via the components of the computer equipment 60. It should be noted, however, that those skilled in the art can also implement the software modules disclosed in the present embodiment as hardware, such as an application-specific integrated circuit (ASIC) chip, without departing from the spirit and scope of the present invention.
The function of each module in the present embodiment is described in detail below.
In the present embodiment, the video reception module 101 receives original video data 40; the decoding module 102 decodes the original video data 40 to obtain video data 41; the making policy selection module 103 accepts a user operation to choose a desired making policy 50; the text capturing module 104 extracts the text part 801 from the video data 41 according to the making policy 50; the illustration capturing module 105 captures at least one key frame from the video data 41 as the illustration part 802 according to the making policy 50; the book template selection module 106 receives the user's selection to provide at least one book template 70; the book generating module 107 applies the book template 70 and produces the book 80 according to the extracted text part 801 and illustration part 802; finally, after the book 80 is produced, the editing module 108 accepts the user's operations to edit the content of the book 80.
As mentioned above, the video reception module 101 cooperates with the signal source interface 601; for example, the video reception module 101 can obtain the original video data 40 stored in a digital camera through a FireWire (IEEE 1394) interface, or obtain the original video data 40 recorded on a VCD or DVD through a CD-ROM drive. The original video data 40 is a video source stored, transmitted, broadcast, or received by various video capture or receiving devices such as digital video cameras, TV tuner cards, and set-top boxes, or by various video storage media such as DVD and VCD, and it can be stored, transmitted, broadcast, or received in various video data formats (such as MPEG-1, MPEG-2, MPEG-4, AVI, ASF, MOV, etc.).
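The patent describes the video reception module only functionally; purely as an illustration, a minimal Python sketch of a reception step that opens a video source and reports its format properties might look like the following. The use of OpenCV and the file name are assumptions, not part of the original disclosure.

```python
import cv2

def receive_video(source="camera_dump.avi"):
    """Open an original video source (file path or device index) and
    report basic properties; returns the opened capture object."""
    cap = cv2.VideoCapture(source)
    if not cap.isOpened():
        raise IOError(f"Cannot open video source: {source}")
    fps = cap.get(cv2.CAP_PROP_FPS)             # e.g. 25 (PAL) or 29.97 (NTSC)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fourcc = int(cap.get(cv2.CAP_PROP_FOURCC))  # codec tag of the container
    codec = fourcc.to_bytes(4, "little").decode("ascii", errors="replace")
    print(f"fps={fps:.2f}, frames={n_frames}, codec={codec}")
    return cap
```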
The decoding module 102 decodes the incoming original video data 40 according to its video format, coding scheme, or compression scheme, restoring it to the pre-encoding data or to data similar to the pre-encoding data; for example, if the coding scheme uses lossy compression, only data similar to the pre-encoding data can be obtained after decoding. The result is video data 41. In the present embodiment, the video data 41 comprises audio data 411, caption data 412, and image data 413. The audio data 411 is the sound played in the video data 41; the caption data 412 is the caption stream that appears on the screen in synchronization with the image data 413; the image data 413 is all the individual frames displayed by the video data 41, one second of which usually consists of 25 or 29.97 individual frames played continuously.
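As an illustration only (the patent treats decoding abstractly), separating a decoded video into an audio stream, a subtitle stream, and individual frames could be sketched by driving the ffmpeg command-line tool from Python; the file names are hypothetical and the source is assumed to carry a subtitle stream.

```python
import pathlib
import subprocess

def decode_streams(src="original_video.mpg"):
    """Demultiplex and decode a source into audio, captions, and frames.
    Assumes the ffmpeg executable is installed and on the PATH."""
    pathlib.Path("frames").mkdir(exist_ok=True)
    # Audio data (411): decode to uncompressed PCM
    subprocess.run(["ffmpeg", "-y", "-i", src, "-vn", "audio.wav"], check=True)
    # Caption data (412): extract the first subtitle stream, if any
    subprocess.run(["ffmpeg", "-y", "-i", src, "-map", "0:s:0", "captions.srt"],
                   check=False)  # not every source has a caption stream
    # Image data (413): dump every frame as a numbered image
    subprocess.run(["ffmpeg", "-y", "-i", src, "frames/%06d.png"], check=True)
```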
The making policy selection module 103 cooperates with the input device 604 so that the user can use the input device 604 to select the policy to apply when making the book 80. The making policies 50 provided in the present embodiment comprise an audio analysis algorithm 501, a caption analysis algorithm 502, an image analysis algorithm 503, and a scene/shot transition analysis algorithm 504.
As described above, the audio analysis algorithm 501 analyzes the audio data 411 of the video data 41 using feature extraction and feature matching. The features of the audio data 411 include spectral features, volume, zero crossing rate, pitch, and so on. To extract spectral features, the audio data 411 passes through noise reduction and segmentation, is transformed to the frequency domain with a fast Fourier transform, and is then passed through a bank of frequency filters to extract feature values, which together form a spectral feature vector. Volume is a feature that is easy to measure; its feature value can be represented by the root mean square (RMS) value, and volume analysis can assist segmentation, that is, silence detection helps decide the paragraph boundaries of the audio data 411. The zero crossing rate is the number of times the sound waveform of each clip crosses the zero axis. The pitch is the fundamental frequency of the sound waveform. The feature vector formed from these audio features and their feature values can therefore be compared with the features of audio templates to locate the desired audio data 411, the text part 801 can be obtained from it via speech recognition, and the image data 413 corresponding to the located audio data 411 can be captured from the video data 41 as the illustration part 802. In the present embodiment, the audio analysis algorithm 501 can provide audio template categories in advance, such as music, speech, animal sound, male speech, and female speech, so that the user can select the audio category to search for. During feature matching, within an allowed distance range, the audio template category with the shortest Euclidean distance to the feature vector of the audio data 411 is found; if this closest audio template category is the same as the category selected by the user, the audio data 411 matches the search condition. In addition, the inverse of the shortest Euclidean distance can be used to represent the confidence of the selected audio data 411. Corresponding video clips are located from the audio data 411 that matches the audio template category, and frames that meet the capture requirements are picked from the shots of these clips as the illustration part 802. Furthermore, if the video data 41 includes a caption stream, the caption stream corresponding to the selected audio data 411 is parsed and used as the text part 801 of the book 80; if the video data 41 does not include a caption stream, the selected audio data 411 is parsed and speech analysis is used to convert voice to text, which serves as the text part 801 of the book 80. The computational complexity of the audio analysis algorithm 501 is lower than that of image or visual analysis, so it can serve as a guide and auxiliary information for image or visual analysis.
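Purely as an illustration of the audio features named above (RMS volume, zero crossing rate, a filter-bank spectral feature vector, and shortest-Euclidean-distance template matching with 1/distance as confidence), a minimal NumPy sketch under assumed parameters could be:

```python
import numpy as np

def audio_features(clip, n_bands=16):
    """Per-clip features: RMS volume, zero crossing rate, and a coarse
    spectral feature vector from a bank of frequency bands."""
    clip = np.asarray(clip, dtype=np.float64)
    rms = np.sqrt(np.mean(clip ** 2))                              # volume
    zcr = np.count_nonzero(np.diff(np.signbit(clip).astype(np.int8)))
    spectrum = np.abs(np.fft.rfft(clip))                           # FFT magnitude
    bands = np.array_split(spectrum, n_bands)                      # crude filter bank
    spectral_vec = np.array([band.mean() for band in bands])
    return np.concatenate(([rms, zcr], spectral_vec))

def match_template(feature_vec, templates):
    """Return the template label with the shortest Euclidean distance,
    plus 1/distance as a confidence score, as suggested in the text."""
    label, dist = min(((name, np.linalg.norm(feature_vec - vec))
                       for name, vec in templates.items()),
                      key=lambda pair: pair[1])
    return label, 1.0 / (dist + 1e-9)
```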
In addition, the caption analysis algorithm 502 analyzes the caption data 412 in the video data 41 and screens out the video frames that carry captions. In other words, if the video data 41 includes a caption stream, the caption stream is parsed as the text part 801, and for each caption the first time-synchronized video frame corresponding to it is found and used as the illustration part 802. If the video data 41 does not include a caption stream but captions are contained in the video images, text recognition technology is used to extract the captions from the video images as the text part 801, and image processing is applied to the selected video images to remove the captions (this can be computed from the data of the preceding and following frames), so that caption-free images are obtained as the illustration part 802. As mentioned above, the text recognition technology mainly uses optical character recognition (OCR).
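The patent names OCR but no particular tool; as an illustration only, extracting a burned-in caption with OpenCV and the pytesseract wrapper might be sketched as below, where the bottom-band caption region and the language code are assumptions.

```python
import cv2
import pytesseract

def read_caption(frame, band=0.2, lang="chi_sim"):
    """OCR the bottom `band` fraction of a frame, where captions usually sit.
    The band ratio and language code are illustrative assumptions."""
    h = frame.shape[0]
    roi = frame[int(h * (1 - band)):, :]                   # caption region
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary, lang=lang).strip()
```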
The image analysis algorithm 503 analyzes the image data 413 in the video data 41, using basic visual features such as color, texture, shape, motion, and position as the basis for analysis and judgment. In the present embodiment, when captions are contained in a video image, text recognition technology is used to extract the captions from the video image as the text part 801. In addition, the video data 41 can be compared with image template data 5031 to find frames whose visual features are highly similar, or frames whose visual features differ greatly, as the illustration part 802; or the video data 41 can be compared with object data 5032, for example using face detection technology to find the video frames in the video data 41 that contain human faces as the illustration part 802. In the present embodiment, when frames that are highly similar to the image template data 5031 or object data 5032, or frames whose visual features differ greatly, are selected to meet the frame screening criterion of the video data 41, the system can be set so that only one frame per shot is screened as the illustration part 802.
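A minimal illustration of the face-detection variant of this policy, with the one-frame-per-shot rule from the text; the use of an OpenCV Haar cascade is an assumed implementation choice, not part of the original disclosure.

```python
import cv2

_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def pick_face_frames(shots):
    """For each shot (a list of BGR frames), keep at most one frame that
    contains a detected face, as suggested above (one frame per shot)."""
    picked = []
    for frames in shots:
        for frame in frames:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = _cascade.detectMultiScale(gray, scaleFactor=1.1,
                                              minNeighbors=5)
            if len(faces) > 0:
                picked.append(frame)
                break                      # only one illustration per shot
    return picked
```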
The scene/shot transition analysis algorithm 504 analyzes the scene/shot transitions of the image data 413 in the video data 41 and screens the first qualified frame after each scene/shot transition, using it both as the illustration part 802 of the book 80 and as a division point between paragraphs of the video data 41. That is, if the video data 41 includes a caption stream, the caption data 412 in each paragraph of the video data 41 is parsed as the text part 801 of the book 80; if the video data 41 does not include a caption stream, the audio data 411 in each paragraph of the video data 41 is parsed and speech analysis is used to convert voice to text as the text part 801 of the book 80. Generally speaking, the video data 41 is a video sequence that usually consists of many scenes, and each scene consists of several shots. In film, the smallest unit is a shot, and a film is built up from many shots; in drama, the smallest unit is a scene, also called an act, which represents a paragraph of a story or subject; each scene has a clear starting point and a clear end point of an event, and such a time span is called a scene, or one act of a play. Usually, a shot consists of frames whose visual characteristics (such as color, texture, shape, and motion) are consistent, and it changes as the camera direction and view angle change; for example, shooting the same scene from different view angles produces different shots, and shooting different regions from the same view angle also produces different shots. Because shots can be distinguished by some basic visual characteristics, dividing the video data 41 into a sequence of shots is relatively easy; the technique mainly analyzes statistics such as histograms of basic visual characteristics, so when the difference between the visual characteristics of a frame and the previous frame exceeds a certain degree, a cut can be made between this frame and the previous one. This shot segmentation technique is also widely used in video editing software. As mentioned above, gathering consecutive related shots into a scene is the goal of scene change analysis; strictly speaking, this requires understanding the semantics and content of the video data 41, but combining the analysis of audio and visual characteristics can achieve reasonably accurate scene change analysis. A scene change usually changes acoustic characteristics (such as music, speech, noise, silence) and visual characteristics (such as color and motion) at the same time, whereas shot segmentation only analyzes visual characteristics; scene change analysis must therefore rely on the analysis of both acoustic and visual characteristics.
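As an illustration only of the histogram-based shot segmentation described above (the bin count and cut threshold are assumed values, and OpenCV is an assumed tool choice):

```python
import cv2

def shot_boundaries(frames, threshold=0.4, bins=32):
    """Return indices of frames that start a new shot, by comparing color
    histograms of consecutive frames; a large difference marks a cut."""
    cuts = [0]
    prev_hist = None
    for i, frame in enumerate(frames):
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [bins, bins],
                            [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Bhattacharyya distance: 0 = identical, 1 = very different
            diff = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if diff > threshold:
                cuts.append(i)
        prev_hist = hist
    return cuts
```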
The text capturing module 104 and the illustration capturing module 105 can be software modules stored in the storage device 605; through the computation of the central processing unit 603, they capture the text part 801 and illustration part 802 required by the making policy 50 as the content for making the book 80.
The book templates 70 provided by the book template selection module 106 include picture books, photo albums, e-books, comics, and so on, and different filters, such as artistic filters, sketch filters, and edge filters, can be applied to the captured illustration part 802 to obtain the image processing effects the user wants; the book templates 70 and the various filters are stored in the storage device 605.
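The filter types are only named in the text; as an illustration, an edge filter and a sketch filter for an illustration image could be sketched with OpenCV as follows, with parameter values chosen as assumptions.

```python
import cv2

def edge_filter(image):
    """Edge filter: keep only strong contours of the illustration."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    return cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR)

def sketch_filter(image):
    """Sketch filter: pencil-drawing effect for a picture-book look."""
    gray_sketch, _color_sketch = cv2.pencilSketch(
        image, sigma_s=60, sigma_r=0.07, shade_factor=0.05)
    return cv2.cvtColor(gray_sketch, cv2.COLOR_GRAY2BGR)
```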
The book generating module 107 can be a software module stored in the storage device 605; through the computation of the central processing unit 603, it applies the book template 70 and uses image processing functions such as rescaling, image composing, and picture-frame making to process the captured text part 801 and illustration part 802, so that the book 80 is produced with the book template 70, font, and size selected by the user.
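The layout operation itself is not specified in the patent; as an illustration only, composing one page from an illustration and its caption text with the Pillow library might look like the sketch below, where the page size and font path are assumptions.

```python
from PIL import Image, ImageDraw, ImageFont

def compose_page(illustration_path, caption, page_size=(800, 1000),
                 font_path="NotoSansCJK-Regular.ttc", font_size=28):
    """Paste a rescaled illustration onto a blank page and draw the
    caption text beneath it, producing one page of the book."""
    page = Image.new("RGB", page_size, "white")
    illustration = Image.open(illustration_path)
    illustration.thumbnail((page_size[0] - 80, page_size[1] - 300))  # rescale
    x = (page_size[0] - illustration.width) // 2
    page.paste(illustration, (x, 60))
    draw = ImageDraw.Draw(page)
    font = ImageFont.truetype(font_path, font_size)
    draw.text((40, 100 + illustration.height), caption,
              fill="black", font=font)
    return page
```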
Finally, the editing module 108 can cooperate with the input device 604, so that after the book 80 is produced, the user can use the input device 604 to further edit the content of the book 80.
To make the content of the present invention easier to understand, an example is given below to explain the flow of the book making method according to the preferred embodiment of the present invention.
Referring to Fig. 2, in the book making method 2 according to the preferred embodiment of the present invention, step 201 receives the original video data 40; for example, data recorded in a digital camera can be delivered to the signal source interface 601 via a transmission cable to provide the pictures and content for making the book 80.
In step 202, the decoding module 102 identifies the format of the original video data 40 and decodes the original video data 40 to produce the video data 41. For example, if the original video data 40 is in interlaced MPEG-2 format, that is, a frame is made up of two fields, then in this step MPEG-2 decoding is performed first, and de-interlacing by interpolation is then used to obtain the video data 41.
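As a bare illustration of de-interlacing by interpolation (the MPEG-2 decoding itself would be handled by a codec library; only the field interpolation is sketched here, and the line-averaging scheme is an assumption):

```python
import numpy as np

def deinterlace(frame):
    """Keep the even (top-field) lines and rebuild the odd lines by
    averaging the neighbouring kept lines (simple linear interpolation)."""
    out = frame.astype(np.float32).copy()
    for y in range(1, frame.shape[0] - 1, 2):      # odd = bottom-field lines
        out[y] = (out[y - 1] + out[y + 1]) / 2.0
    return out.astype(frame.dtype)
```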
In step 203, the text capturing module 104 and the illustration capturing module 105 analyze the video data 41 according to the making policy 50 to obtain the text part 801 and the illustration part 802. According to the audio analysis algorithm 501, the caption analysis algorithm 502, the image analysis algorithm 503, and the scene/shot transition analysis algorithm 504, each video frame and its content (including audio content) in the video data 41 is analyzed, searched, and screened to obtain the text part 801 and illustration part 802 that satisfy the making policy 50. For example, if the video data 41 includes a caption stream, the caption stream of the video data 41 is parsed as the text part 801; if the video data 41 does not include a caption stream, the audio of the video data 41 is parsed and speech analysis is used to convert voice to text as the text part 801, and key frames are captured from the images corresponding to the caption stream or audio as the illustration part 802. Note that the present embodiment can capture many key frames to serve as the illustration part 802. As shown in Fig. 3, the original video data 40 is decoded to obtain the video data 41, which comprises many individual frames 301 (25 or 29.97 per second); after analysis and searching according to the making policy 50, key frames 302 are captured from these individual frames as the illustration part 802.
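As an illustration only of how step 203 might pair text with key frames, a sketch under assumed data structures (captions as (start_seconds, text) pairs and a fixed frame rate) could be:

```python
def pair_captions_with_keyframes(captions, frames, fps=25.0):
    """For each caption (start_seconds, text), take the first frame at or
    after its start time as the key frame, yielding (text, key_frame) pairs
    that become the book's text part and illustration part."""
    pairs = []
    for start, text in captions:
        index = min(int(round(start * fps)), len(frames) - 1)
        pairs.append((text, frames[index]))
    return pairs
```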
Step 204 judges whether the analysis and comparison of all the content in the video data 41 has been completed; if not, step 203 is repeated; if it has been completed, step 205 is carried out.
Step 205 judges whether a book template 70 needs to be applied to the book 80; if the book 80 needs a book template 70, step 206 is carried out; if not, step 207 is carried out.
In step 206, the book template selection module 106 lets the user select the desired book template 70; the book templates 70 include various book models with pictures, images, photographs, or drawings, for example comics, picture books, photo albums, and e-books, as well as various layouts.
In step 207, the book generating module 107 takes the text part 801 and illustration part 802 obtained in step 203 and, if step 206 was carried out, applies the book template 70 provided in step 206. It uses different filters, such as artistic filters, sketch filters, and edge filters, to process the illustration part 802 to obtain the required image processing effects, then uses image processing functions such as rescaling, image composing, and picture-frame making to obtain image frames that conform to the book template 70, and finally converts the text part 801 together with the illustration part 802 according to the book template 70, font, and size to produce the book 80.
Step 208 judges whether the user wants to manually edit the book 80; if the user wants to manually edit the book 80, step 209 is carried out.
In step 209, the user uses the editing module 108 to preview, refine, and modify the content of the book 80. For example, the user can underline the text part of important content of the book 80, put text in bold, and so on, or additionally insert patterns and the like.
In summary, because the book making system and method of the preferred embodiment of the present invention can analyze the video data 41, working on its audio data 411, caption data 412, and image data 413, and can integrate technologies such as video content analysis, text recognition, and speech recognition, book files can be produced efficiently from video data.
The above is illustrative only and not restrictive. Any equivalent modification or change made without departing from the spirit and scope of the present invention shall be included in the protection scope of this patent.

Claims (30)

1. A book making system for producing a book comprising a text part and an illustration part, the book making system comprising:
a video reception module, which receives original video data;
a decoding module, which decodes the original video data to obtain video data;
a text capturing module, which extracts the text part from the video data according to a making policy;
an illustration capturing module, which captures a key frame from the video data as the illustration part according to the making policy; and
a book generating module, which produces the book according to the extracted text part and illustration part.
2. The book making system according to claim 1, further comprising:
an editing module, which, after the book is produced, receives a user's operations to edit the content of the book.
3. The book making system according to claim 1, further comprising:
a book template selection module, which receives a user's selection to provide at least one book template, the book generating module applying the book template to produce the book.
4. The book making system according to claim 1, further comprising:
a making policy selection module, which accepts a user's selection to provide the making policy.
5. The book making system according to claim 1, wherein the making policy comprises an audio analysis algorithm that analyzes audio data in the video data, the text capturing module captures the audio data according to the audio analysis algorithm to obtain the text part, and the illustration capturing module captures the image data corresponding to the audio data as the illustration part.
6. The book making system according to claim 1, wherein the making policy comprises a caption analysis algorithm that analyzes caption data in the video data, the text capturing module captures the caption data according to the caption analysis algorithm to obtain the text part, and the illustration capturing module captures the image data corresponding to the caption data as the illustration part.
7. The book making system according to claim 1, wherein the making policy comprises an image analysis algorithm that analyzes image data in the video data according to an image template, the illustration capturing module captures the image data according to the image analysis algorithm to obtain the illustration part, and the text capturing module obtains the text part from the video data corresponding to the image data.
8. The book making system according to claim 1, wherein the making policy comprises an image analysis algorithm that analyzes image data in the video data according to an object, the illustration capturing module captures the image data according to the image analysis algorithm to obtain the illustration part, and the text capturing module obtains the text part from the video data corresponding to the image data.
9. The book making system according to claim 1, wherein the making policy comprises an image analysis algorithm that analyzes image data in the video data, the text capturing module captures captions in the image data as the text part, and the illustration capturing module captures the image data as the illustration part.
10. The book making system according to claim 1, wherein the making policy comprises a scene/shot transition analysis algorithm that analyzes scene/shot transitions of image data in the video data, and the text capturing module and the illustration capturing module use the scene/shot transition analysis algorithm as the basis for selecting and segmenting the text part and the illustration part.
11. A book making method for producing a book comprising a text part and an illustration part, the book making method comprising:
a video reception step, which receives original video data;
a decoding step, which decodes the original video data to obtain video data;
a text capturing step, which extracts the text part from the video data according to a making policy;
an illustration capturing step, which captures a key frame from the video data as the illustration part according to the making policy; and
a book generating step, which produces the book according to the extracted text part and illustration part.
12. The book making method according to claim 11, further comprising:
an editing step, which, after the book is produced, receives a user's operations to edit the content of the book.
13. The book making method according to claim 11, further comprising:
a book template selection step, which receives a user's selection to provide at least one book template, the book generating step applying the book template to produce the book.
14. The book making method according to claim 11, further comprising:
a making policy selection step, which accepts a user's selection to provide the making policy.
15. The book making method according to claim 11, wherein the making policy comprises an audio analysis algorithm that analyzes audio data in the video data, the text capturing step captures the audio data according to the audio analysis algorithm to obtain the text part, and the illustration capturing step captures the image data corresponding to the audio data as the illustration part.
16. The book making method according to claim 11, wherein the making policy comprises a caption analysis algorithm that analyzes caption data in the video data, the text capturing step captures the caption data according to the caption analysis algorithm to obtain the text part, and the illustration capturing step captures the image data corresponding to the caption data as the illustration part.
17. The book making method according to claim 11, wherein the making policy is an image analysis algorithm that analyzes image data in the video data according to an image template, the illustration capturing step captures the image data according to the image analysis algorithm to obtain the illustration part, and the text capturing step obtains the text part from the video data corresponding to the image data.
18. The book making method according to claim 11, wherein the making policy comprises an image analysis algorithm that analyzes image data in the video data according to an object, the illustration capturing step captures the image data according to the image analysis algorithm to obtain the illustration part, and the text capturing step obtains the text part from the video data corresponding to the image data.
19. The book making method according to claim 11, wherein the making policy comprises an image analysis algorithm that analyzes image data in the video data, the text capturing step captures captions in the image data as the text part, and the illustration capturing step captures the image data as the illustration part.
20. The book making method according to claim 11, wherein the making policy comprises a scene/shot transition analysis algorithm that analyzes scene/shot transitions of image data in the video data, and the text capturing step and the illustration capturing step use the scene/shot transition analysis algorithm as the basis for selecting and segmenting the text part and the illustration part.
21. A recording medium recording a program for causing a computer to carry out a book making method, the book making method being used for producing a book comprising a text part and an illustration part, the book making method comprising:
a video reception step, which receives original video data;
a decoding step, which decodes the original video data to obtain video data;
a text capturing step, which extracts the text part from the video data according to a making policy;
an illustration capturing step, which captures a key frame from the video data as the illustration part according to the making policy; and
a book generating step, which produces the book according to the extracted text part and illustration part.
22. The recording medium according to claim 21, wherein the book making method further comprises:
an editing step, which, after the book is produced, receives a user's operations to edit the content of the book.
23. The recording medium according to claim 21, wherein the book making method further comprises:
a book template selection step, which receives a user's selection to provide at least one book template, the book generating step applying the book template to produce the book.
24. The recording medium according to claim 21, wherein the book making method further comprises:
a making policy selection step, which accepts a user's selection to provide the making policy.
25. The recording medium according to claim 21, wherein the making policy comprises an audio analysis algorithm that analyzes audio data in the video data, the text capturing step captures the audio data according to the audio analysis algorithm to obtain the text part, and the illustration capturing step captures the image data corresponding to the audio data as the illustration part.
26. The recording medium according to claim 21, wherein the making policy comprises a caption analysis algorithm that analyzes caption data in the video data, the text capturing step captures the caption data according to the caption analysis algorithm to obtain the text part, and the illustration capturing step captures the image data corresponding to the caption data as the illustration part.
27. The recording medium according to claim 21, wherein the making policy is an image analysis algorithm that analyzes image data in the video data according to an image template, the illustration capturing step captures the image data according to the image analysis algorithm to obtain the illustration part, and the text capturing step obtains the text part from the video data corresponding to the image data.
28. The recording medium according to claim 21, wherein the making policy comprises an image analysis algorithm that analyzes image data in the video data according to an object, the illustration capturing step captures the image data according to the image analysis algorithm to obtain the illustration part, and the text capturing step obtains the text part from the video data corresponding to the image data.
29. The recording medium according to claim 21, wherein the making policy comprises an image analysis algorithm that analyzes image data in the video data, the text capturing step captures captions in the image data as the text part, and the illustration capturing step captures the image data as the illustration part.
30. The recording medium according to claim 21, wherein the making policy comprises a scene/shot transition analysis algorithm that analyzes scene/shot transitions of image data in the video data, and the text capturing step and the illustration capturing step use the scene/shot transition analysis algorithm as the basis for selecting and segmenting the text part and the illustration part.
CN 01141820 2001-09-19 2001-09-19 Book making system and method Expired - Lifetime CN1202471C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 01141820 CN1202471C (en) 2001-09-19 2001-09-19 Book making system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 01141820 CN1202471C (en) 2001-09-19 2001-09-19 Book making system and method

Publications (2)

Publication Number Publication Date
CN1409213A (en) 2003-04-09
CN1202471C CN1202471C (en) 2005-05-18

Family

ID=4676429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 01141820 Expired - Lifetime CN1202471C (en) 2001-09-19 2001-09-19 Book makign system and method

Country Status (1)

Country Link
CN (1) CN1202471C (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101176091B (en) * 2005-03-10 2011-09-07 新加坡国立大学 An authoring tool and method for creating an electronic document
CN103177032A (en) * 2011-12-23 2013-06-26 李云峰 Method for converting existing multi-media tutorials into e-book multi-media files
CN102642423A (en) * 2012-04-26 2012-08-22 莫浩驹 Literary works with drawing of simulated figures
CN103761893A (en) * 2013-01-25 2014-04-30 陈旭 Book reader
CN103399898A (en) * 2013-07-24 2013-11-20 北京凤凰常青树数码科技有限公司 System and method for customizing and publishing picture books in online manner
CN104484420A (en) * 2014-12-17 2015-04-01 天脉聚源(北京)教育科技有限公司 Method and device for making intelligent teaching system courseware
CN111611436A (en) * 2020-06-24 2020-09-01 腾讯科技(深圳)有限公司 Label data processing method and device and computer readable storage medium
CN111611436B (en) * 2020-06-24 2023-07-11 深圳市雅阅科技有限公司 Label data processing method and device and computer readable storage medium

Also Published As

Publication number Publication date
CN1202471C (en) 2005-05-18

Similar Documents

Publication Publication Date Title
US20030068087A1 (en) System and method for generating a character thumbnail sequence
Duan et al. Overview of the MPEG-CDVS standard
CN1284106C (en) Automatic content analysis and representation of multimedia presentations
US8009861B2 (en) Method and system for fingerprinting digital video object based on multiresolution, multirate spatial and temporal signatures
US7336890B2 (en) Automatic detection and segmentation of music videos in an audio/video stream
US7904815B2 (en) Content-based dynamic photo-to-video methods and apparatuses
US8169497B2 (en) Method of segmenting videos into a hierarchy of segments
CN1510501A (en) System and method for synthesizing filmslide
CN104463139B (en) A kind of sports video excellent Event Distillation method under audio emotion driving
CN1520561A (en) Streaming video bookmarks
JP2000311180A (en) Method for feature set selection, method for generating video image class stastic model, method for classifying and segmenting video frame, method for determining similarity of video frame, computer-readable medium, and computer system
CN1658226A (en) Method and apparatus for detecting anchorperson shot
JP6095381B2 (en) Data processing apparatus, data processing method, and program
CN1479910A (en) Signal processing method and equipment
Sonnleitner et al. Quad-Based Audio Fingerprinting Robust to Time and Frequency Scaling.
CN1202471C (en) Book making system and method
CN111309957A (en) Method for automatically generating travel photo album MV
CN115795096A (en) Video metadata labeling method for movie and television materials
CN1334677A (en) Dynamic extraction of feature from compressed digital video signals by video reproducing system
US20050254782A1 (en) Method and device of editing video data
TWI244005B (en) Book producing system and method and computer readable recording medium thereof
EP2345978A1 (en) Detection of flash illuminated scenes in video clips and related ranking of video clips
Hoashi et al. Shot Boundary Determination on MPEG Compressed Domain and Story Segmentation Experiments for TRECVID 2004.
Duong et al. Movie synchronization by audio landmark matching
Chen et al. Multi-criteria video segmentation for TV news

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term
CX01 Expiry of patent term

Granted publication date: 20050518