CN110300274A

CN110300274A - Video file recording method, device, and storage medium

Info

Publication number: CN110300274A
Application number: CN201810235113.8A
Authority: CN
Inventors: 冯驰伟; 赵亮; 肖鹏; 王文涛
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2018-03-21
Filing date: 2018-03-21
Publication date: 2019-10-01
Anticipated expiration: 2038-03-21
Also published as: CN110300274B

Abstract

The invention discloses a video file recording method, device, and storage medium, belonging to the field of Internet technology. The method includes: during video recording, displaying a video recording interface that includes a subtitle addition option; when a subtitle addition instruction is received, obtaining the text data converted from the audio data collected during video recording; displaying the text data on the video recording interface; and, when a video recording end instruction is received, generating a video file that includes the text data. During recording of the video file, the invention converts the collected audio data into text data and displays the converted text data on the video recording interface. The user does not need to make a subtitle file in advance, which saves substantial resources and simplifies the production process.

Description

Video file recording method, device, and storage medium

Technical field

The present invention relates to the field of Internet technology, and in particular to a video file recording method, device, and storage medium.

Background art

With the development of Internet technology, social applications are widely used in users' daily lives and have become the main tool for communication between users. To meet users' needs, social applications provide a video recording function. Based on this function, many users record videos and add text to the recorded videos to make interaction between users more entertaining.

Currently, a video file is recorded as follows: a first video file is recorded using the video recording function of a social application; a subtitle file made by the user is obtained; and the subtitle file is merged into the first video file to obtain a second video file.

However, the related art requires the user to make a subtitle file in advance and, after the first video file has been recorded, to merge the subtitle file with the recorded first video file. This consumes considerable resources and makes the production process cumbersome.

Summary of the invention

To solve the problems in the prior art, embodiments of the present invention provide a video file recording method, device, and storage medium. The technical solution is as follows:

In one aspect, a video file recording method is provided, the method comprising:

during video recording, displaying a video recording interface, the video recording interface including a subtitle addition option;

when a subtitle addition instruction is received, obtaining, according to audio data collected during video recording, text data converted from the audio data;

displaying the text data on the video recording interface;

when a video recording end instruction is received, generating a video file that includes the text data.

In another aspect, a video file recording device is provided, the device comprising:

a display module, configured to display a video recording interface during video recording, the video recording interface including a subtitle addition option;

an obtaining module, configured to obtain, when a subtitle addition instruction is received and according to audio data collected during video recording, text data converted from the audio data;

the display module, further configured to display the text data on the video recording interface;

a generation module, configured to generate, when a video recording end instruction is received, a video file that includes the text data.

In another aspect, a terminal is provided. The terminal includes a processor and a memory; the memory stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the video file recording method.

In another aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the video file recording method.

The technical solution provided by the embodiments of the present invention has the following beneficial effects:

During recording of the video file, the collected audio data is converted into text data, and the converted text data is displayed on the video recording interface. The user does not need to make a subtitle file in advance, which saves substantial resources and simplifies the production process.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 shows an implementation environment involved in a video file recording method provided by an embodiment of the present invention;

Fig. 2 is a flowchart of a video file recording method provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of a video recording interface provided by an embodiment of the present invention;

Fig. 4 is a schematic diagram of an audio data collection and upload process provided by an embodiment of the present invention;

Fig. 5 is a schematic diagram of a video recording interface provided by an embodiment of the present invention;

Fig. 6 is a schematic diagram of a video recording interface provided by an embodiment of the present invention;

Fig. 7 is a schematic diagram of a video recording interface provided by an embodiment of the present invention;

Fig. 8 is a schematic diagram of a video recording interface provided by an embodiment of the present invention;

Fig. 9 is a schematic diagram of a video recording interface provided by an embodiment of the present invention;

Fig. 10 is a schematic diagram of a video recording interface provided by an embodiment of the present invention;

Fig. 11 is a schematic diagram of a video recording interface provided by an embodiment of the present invention;

Fig. 12 is a schematic diagram of a video recording interface provided by an embodiment of the present invention;

Fig. 13 is a schematic diagram of a process of aligning text data with a video file provided by an embodiment of the present invention;

Fig. 14 is a schematic diagram of a video preview interface provided by an embodiment of the present invention;

Fig. 15 is a schematic structural diagram of a video file recording device provided by an embodiment of the present invention;

Fig. 16 is a schematic structural diagram of a terminal provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.

Fig. 1 shows the implementation environment involved in the video file recording method provided by an embodiment of the present invention. Referring to Fig. 1, the implementation environment includes terminal 101, server 102, and terminal 103.

Terminal 101 may be a smartphone, tablet computer, laptop, or the like; the embodiment of the present invention does not specifically limit the product type of terminal 101. To meet users' needs, terminal 101 is installed with at least one social application, which can call the camera of terminal 101 to collect image data of the user and can call the microphone of terminal 101 to collect audio data of the user, thereby implementing the video recording function.

Server 102 is a social application server. Based on the social application installed on terminal 101, server 102 can provide terminal 101 with communication services and other services, for example a service that converts audio data into text data.

The product type of terminal 103 is the same as that of terminal 101 and is likewise not specifically limited by the embodiment. Terminal 103 may be installed with the same social application as terminal 101 and, based on that application, receive the video file recorded by terminal 101 and forwarded by the server. Terminal 103 may also install a social application different from that of terminal 101, or install no social application at all; in that case terminal 103 can receive the video file recorded by terminal 101 via Bluetooth, NFC (Near Field Communication), or other means.

An embodiment of the present invention provides a video file recording method. Referring to Fig. 2, the method flow provided by the embodiment includes:

201. During video recording, the terminal displays a video recording interface.

When the user wants to record a video, the user can select any social application with a video recording function on the terminal. On detecting the user's selection, the terminal runs the social application, and on detecting a selection of the application's video recording option, it displays the video recording interface.

The video recording interface displays the user's current video image in real time during video recording. It includes various options related to video recording, such as a stop-recording option, a publish option, a save option, and video picture optimization options (for example contrast and brightness), and it also includes a subtitle addition option. Based on the subtitle addition option, the terminal can add text data during video recording. Besides these function options, the video recording interface also displays times related to the recording, such as the recorded duration and the remaining recording duration.

202. When a subtitle type selection instruction is received, the terminal displays at least one subtitle type option.

During video recording, on detecting a selection of the subtitle addition option, the terminal receives a subtitle type selection instruction. Triggered by that instruction, the terminal displays at least one subtitle type option on the video recording interface, each subtitle type option corresponding to one subtitle display form. The at least one subtitle type option includes a first type option, a second type option, a third type option, and so on. The display form of the first type option may be a bullet-screen form, that of the second type option may be a bullet-screen-plus-text form, and that of the third type option may be a form in which text moves outward from a center.

Fig. 3 shows a video recording interface. Referring to Fig. 3, a subtitle addition option is displayed on the video recording interface. When a selection of the subtitle addition option is detected, the terminal displays three type options on the video recording interface: the first type option, the second type option, and the third type option.

203. When a subtitle addition instruction is received, the terminal obtains, according to the audio data collected during video recording, the text data converted from the audio data.

During video recording, when a selection of any subtitle type option is detected, the terminal collects an audio signal through the microphone, obtains audio data from the audio signal, and then, according to the audio data collected during video recording, obtains the text data converted from it. After the selection of a subtitle type option is detected and before the audio data is converted into text data, the terminal also displays subtitle addition prompt information on the video recording interface to tell the user that subtitles can be added to the recorded video. The prompt may read, for example, "Speak while recording to add subtitles".

The terminal obtains the text data converted from the audio data collected during video recording as follows:

2031. During video recording, the terminal resamples the collected audio signal at a preset sample rate to obtain at least one frame of audio data.

The preset sample rate is an audio sample rate supported by the server and can be negotiated between the terminal and the server; in the embodiment of the present invention, 16000 is preferred.

By resampling the collected audio signal at the preset sample rate, the terminal converts the signal into at least one frame of audio data; the duration of each frame of audio data can be determined from the preset sample rate. For example, if the preset sample rate is 16000, the duration of each frame of audio data = 1000 × 1/16000 = 0.0625 milliseconds.

2032. When the at least one frame of audio data meets a preset condition, the terminal converts the at least one frame of audio data to obtain text data.

To do so, the terminal may use steps 20321 to 20323:

20321. The terminal calculates the volume value of each frame of audio data.

Taking any frame of audio data as an example, the terminal obtains the amplitude value of each sampling point in the frame, squares each amplitude value, sums the squares over all sampling points in the frame, and takes the logarithm of that sum of squares to obtain the volume value of the frame.
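The volume calculation in step 20321 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `frame_volume`, the base-10 logarithm, the scale factor of 10, and the handling of an all-zero frame are assumptions, since the text only says that the logarithm of the sum of squared amplitudes is taken.

```python
import math


def frame_volume(samples):
    """Return a volume value: a scaled logarithm of the sum of squared amplitudes."""
    # Square each sampling point's amplitude and add the squares up.
    energy = sum(s * s for s in samples)
    if energy == 0:
        # Assumption: an all-zero frame has no defined logarithm; treat it as silent.
        return float("-inf")
    # Assumption: base-10 logarithm scaled by 10 (a decibel-like value).
    return 10.0 * math.log10(energy)
```

With this choice of scale, a frame holding a single sample of amplitude 100 has a volume value of 40, which would fall just below the threshold of 45 mentioned in step 20322.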

20322. When the volume value of any frame of audio data is greater than a specified threshold, the terminal stores that audio data.

The specified threshold is determined from the volume of human speech and is generally 45. By comparing the volume value of each frame of audio data with the specified threshold, the terminal screens the at least one frame of audio data and filters out the non-speech audio data.

When the volume value of audio data is greater than the specified threshold, the terminal preprocesses that audio data to obtain a segment of clear, valid speech data. The preprocessing is: the terminal applies noise reduction to the audio data and then doubles the amplitude value of each sampling point of the noise-reduced audio data, obtaining the preprocessed audio data.
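The screening and preprocessing of step 20322 might look like the sketch below. The names `SPEECH_THRESHOLD` and `screen_and_preprocess` are hypothetical, and the noise reduction step is passed in as an optional callable because the text does not describe the algorithm at this point; when none is supplied, the samples are used unchanged.

```python
SPEECH_THRESHOLD = 45  # the "specified threshold" value given in the text


def screen_and_preprocess(samples, volume, denoise=None):
    """Drop non-speech frames; denoise and double the amplitude of speech frames."""
    if volume <= SPEECH_THRESHOLD:
        # Not louder than the threshold: treated as non-speech and filtered out.
        return None
    # Assumption: noise reduction is supplied by the caller; identity if absent.
    cleaned = denoise(samples) if denoise is not None else list(samples)
    # Double the amplitude value of every sampling point.
    return [2 * s for s in cleaned]
```

The sketch does not clip the doubled values to a sample format's range; a real implementation working on fixed-width PCM samples would probably need to.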

20323. When the total duration of the stored audio data reaches a first preset duration, the terminal converts the audio data whose total duration is the first preset duration to obtain text data.

The first preset duration may be 100 milliseconds, 200 milliseconds, and so on. The conversion can be done in two ways: in one, the terminal converts the audio data locally to obtain the text data; in the other, the terminal sends the audio data over the network to the server, and the server performs the conversion.

Because the terminal collects a large amount of audio data over the whole recording, the terminal may, to distinguish the audio data, assign a data index to each piece of audio data whose total duration is the first preset duration. The data index can be assigned in the order in which the audio data is collected: for example, the first piece of audio data that reaches the first preset duration gets data index 1, the second gets data index 2, and so on. The data index is mainly used to align the text data with the speech in the video when the text data is merged with the video file. When the terminal sends a piece of audio data of the first preset duration to the server, it also sends the corresponding data index. The server converts the audio data into text data, uses the data index of the audio data as the data index of the text data, and sends the text data and its data index back to the terminal.
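The buffering and indexing described in step 20323 can be illustrated with a small sketch. The class name `ChunkBuffer` and its attributes are hypothetical; the sketch assumes a 16000 sample rate and a 100 millisecond first preset duration, and it cuts the buffer into chunks of exactly that length, which is a simplification of "reaches the first preset duration".

```python
class ChunkBuffer:
    """Accumulate samples and emit chunks of the first preset duration with data indices."""

    def __init__(self, chunk_ms=100, sample_rate=16000):
        # Number of samples that make up one chunk of the first preset duration.
        self.chunk_samples = sample_rate * chunk_ms // 1000
        self.pending = []
        self.next_index = 1  # data indices start at 1, in collection order

    def add(self, samples):
        """Append samples; return a list of (data_index, chunk) ready for conversion."""
        self.pending.extend(samples)
        ready = []
        while len(self.pending) >= self.chunk_samples:
            chunk = self.pending[:self.chunk_samples]
            del self.pending[:self.chunk_samples]
            ready.append((self.next_index, chunk))
            self.next_index += 1
        return ready
```

Each returned pair would then be converted locally or sent to the server together with its data index.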

In another embodiment of the present invention, if the volume value of any frame of audio data is less than the specified threshold, the terminal counts how long the volume value stays below the specified threshold. When the volume value of the audio data stays below the specified threshold throughout a second preset duration, the terminal sends heartbeat data to the server to keep the heartbeat with the server alive and avoid an interruption of communication with the server. The second preset duration can be negotiated between the terminal and the server and may be 1 second, 2 seconds, 3 seconds, and so on; in application, the embodiment of the present invention uses 3 seconds.
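The silence timer behind the heartbeat can be sketched as below. The class `SilenceMonitor` is hypothetical, and resetting the timer after each heartbeat is an assumption; the text only states that a heartbeat is sent once the volume has stayed below the threshold for the second preset duration.

```python
class SilenceMonitor:
    """Track continuous silence and signal when a heartbeat should be sent."""

    def __init__(self, threshold=45, silence_ms=3000):
        self.threshold = threshold
        self.silence_ms = silence_ms  # the second preset duration
        self.silent_for_ms = 0.0

    def observe(self, volume, frame_ms):
        """Feed one frame's volume and duration; return True when a heartbeat is due."""
        if volume < self.threshold:
            self.silent_for_ms += frame_ms
            if self.silent_for_ms >= self.silence_ms:
                # Assumption: restart the count after each heartbeat.
                self.silent_for_ms = 0.0
                return True
        else:
            # Any frame at or above the threshold ends the silent period.
            self.silent_for_ms = 0.0
        return False
```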

When text data is obtained, the terminal stores the correspondence between the data index and the text data, so that in subsequent operations the corresponding text data can be looked up by data index.

The same audio data may convert to different text data in different contexts. For example, the server receives audio data pronounced "ni"; based on the pronunciation, the server obtains the text "mud" and sends it to the terminal. After the first preset duration, the server receives audio data pronounced "haoma"; taking the surrounding context into account, the server determines that the user actually said "ni hao ma" ("how are you") rather than "mud" followed by "is it OK", so the text data already sent to the terminal needs to be corrected.

Specifically, text data is corrected as follows: the server sends the text data and its data index to the terminal. On receiving them, the terminal checks whether the data index is already stored locally. If it is, the terminal updates the stored text data corresponding to that data index with the received text data; if it is not, the terminal directly stores the received text data and its data index.
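The correction rule amounts to an update-or-insert keyed by data index. A minimal sketch, assuming the terminal keeps the correspondence in a dictionary (the function name `apply_server_result` is hypothetical):

```python
def apply_server_result(store, data_index, text):
    """Store text under its data index; return True if an earlier result was corrected."""
    corrected = data_index in store
    # Either overwrite the earlier text for this index or store a new entry.
    store[data_index] = text
    return corrected
```

For example, a first result `(1, "mud")` is stored as new, and a later result `(1, "how are you")` replaces it.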

Fig. 4 shows the audio data collection and upload process provided by an embodiment of the present invention. Referring to Fig. 4, during video recording the terminal collects the user's voice signal in real time through the microphone and resamples it at a sample rate of 16000 to obtain at least one frame of audio data. The terminal then calculates the volume value of each frame and checks whether it is less than 45. If the volume value stays below 45 for 3 seconds, the terminal sends heartbeat data to the server. If the volume value of any frame is greater than 45, the terminal applies Nsx noise reduction to that frame, amplifies the volume of the noise-reduced audio data by a factor of 2, and caches the frame. The terminal checks whether the data length (that is, the total duration) of the cached audio data is greater than or equal to 100 milliseconds: if so, it sends the cached audio data to the server; if not, it continues caching audio data.

204. The terminal displays the text data on the video recording interface in the display form corresponding to the selected subtitle type option.

In the embodiment of the present invention, each subtitle type option corresponds to one display form. Based on the obtained text data, the terminal displays the text data on the video recording interface in the display form corresponding to the selected subtitle type.

Depending on the selected subtitle type option, the form in which text data is displayed on the video recording interface differs. The cases are as follows:

The first case: the subtitle type option includes the first type option.

The first type option is the bullet-screen form option. Before text data is displayed on the video recording interface using the first type option, the following settings are needed:

1. Set at least one track; each track is used to display the text of the text data;

2. Set the text entry mode. Entry modes include entering the screen from the left and exiting on the right, entering from the right and exiting on the left, entering from the top and exiting at the bottom, entering from the bottom and exiting at the top, entering from the upper left and exiting at the lower right, and so on; the user can choose according to personal preference;

3. Set the text movement speed. The movement speed may be random; text on the same track moves at the same speed, and text on different tracks moves at different speeds;

4. Set the text color. Text on the same track may have the same or different colors, and text on different tracks may have the same or different colors;

5. Set the text size. The text size is also random; text on the same track may have the same or different sizes, and text on different tracks may have the same or different sizes.

Based on the above settings, the terminal obtains the text data and its display parameters, including font, color, size, movement speed, and display position, and then draws the text of the text data on the video recording interface according to those display parameters, so that text with different bullet-screen effects is shown on the video recording interface.

Because the user records sentence by sentence, the terminal also displays the text data on the tracks sentence by sentence, so a display mechanism for at least one sentence is needed. The display mechanism for at least one sentence on at least one track may follow these rules:

First, avoid as far as possible crowding sentences onto one track while the remaining tracks sit idle;

Second, when all tracks are occupied, select the track that has been in use the longest and replace the sentence currently on that track;

Third, if no new sentence is received, scroll the old sentences in a loop.
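The first two rules above can be sketched as a simple track allocator. This is an illustration under stated assumptions rather than the described implementation: the function `assign_track` is hypothetical, each track holds either `None` or a `(sentence, start_time)` pair, and each sentence takes a single track, whereas the figures discussed in this section show one sentence repeated on several tracks. Looping old sentences (the third rule) is left out.

```python
def assign_track(tracks, sentence, now):
    """Place a sentence on an idle track, or on the track in use the longest."""
    # Rule 1: prefer an idle track so sentences do not crowd onto one track.
    for i, slot in enumerate(tracks):
        if slot is None:
            tracks[i] = (sentence, now)
            return i
    # Rule 2: all tracks occupied, so replace the sentence that started earliest.
    oldest = min(range(len(tracks)), key=lambda i: tracks[i][1])
    tracks[oldest] = (sentence, now)
    return oldest
```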

With the number of tracks set to 4 and two sentences to display, the display forms of the sentences on the video recording interface are shown in Fig. 5, Fig. 6, and Fig. 7.

Referring to the left part of Fig. 5, the terminal displays the first sentence on 4 tracks. Referring to the right part of Fig. 5, when the second sentence is received, the terminal replaces the first sentence on the 4 tracks with the second sentence and displays the second sentence on the 4 tracks.

Referring to the left part of Fig. 6, the terminal displays the first sentence on 1 track and the second sentence on 3 tracks. Referring to the right part of Fig. 6, when the third sentence is received, the terminal displays the third sentence on the track that showed the first sentence and on 2 tracks selected from the 3 tracks showing the second sentence; at this point 3 tracks display the third sentence and 1 track displays the second sentence.

Referring to the left part of Fig. 7, the terminal displays the first sentence on 3 tracks. Referring to the right part of Fig. 7, when the second sentence is received, the terminal displays the second sentence on the remaining 1 track and on 1 track selected from the 3 occupied tracks; at this point 2 tracks display the first sentence and 2 tracks display the second sentence.

Fig. 5, Fig. 6, and Fig. 7 are schematic diagrams of the display form; for the actual display form, see the actual interface shown in Fig. 8.

The second case: the subtitle type option includes the second type option.

The second type option is the bullet-screen form option combining text with a designated picture. The designated picture may be a picture the user downloaded from the network, a picture provided in a picture database, an emoticon that comes with the terminal, or the like. The text and the designated picture may be adjacent or may each be located anywhere on the screen. The text size, text color, text movement speed, display mechanism, and so on are set in the same way as for the bullet-screen form above and are not repeated here.

Before text data is displayed on the video recording interface using the second type option, the way the text and the designated picture move must first be set. Specifically, the text and the designated picture may always move together at the same speed, or they may always move at different speeds. Alternatively, when the text and the designated picture are adjacent and the text is behind the designated picture, the two first move at the same speed; when the designated picture has moved a specified distance it slows down, and because the text keeps moving at the original speed, the designated picture appears to hide the text. The specified distance may be set by the user according to personal interest or by developers. To make this more fun, when the designated picture is detected to have moved the specified distance, it can hide the text with an animation while slowing down. If the text is completely hidden before the designated picture reaches the other side of the screen, the designated picture disappears together with the text.

Based on the above settings, the terminal obtains the text data and the designated picture, obtains the display parameters of the text data (font, size, movement speed, color, display position, and so on) and the display parameters of the designated picture (picture position, movement speed, and so on), and then draws the text of the text data and the designated picture on the video recording interface according to those parameters, so that text and pictures with different bullet-screen effects are shown on the video recording interface.

Referring to Fig. 9, the designated picture is set to a bean-man picture, and each character is displayed as a red bean. As the characters move along with the bean-man picture, the bean-man picture slows down once it has moved half the screen length. Because the characters keep moving at the same speed, they are gradually hidden by the bean-man picture. To make this more vivid, the bean-man in the picture keeps changing, opening and closing its mouth to create the effect of the bean-man eating the beans. When the text has been eaten completely, the bean-man picture disappears as well.

Fig. 9 is a schematic diagram of the display form; for the actual display form, see the actual interface shown in Fig. 10.

The third case: the subtitle type option includes the third type option.

The third type option is the form option in which the text of the text data moves outward from a preset position as the center. Before text data is displayed on the video recording interface using the third type option, the preset position and the movement mode must first be set.

The preset position may be anywhere on the screen. If the user's face image is detected on the screen, the position of the user's mouth may be used as the preset position; if no face image is detected, any position may be used. The movement mode may be moving to both sides from the preset position, moving in multiple directions from the preset position, and so on. To make the effect more noticeable, each character of a sentence may be separated when displayed and shown in a randomly chosen color; the characters of one sentence may use the same color or different colors. While the text moves outward from the preset position, each character grows larger as its moving distance increases and finally disappears; this process generally lasts 400 milliseconds.
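The motion of a single character in this display form could be modeled as below. Everything except the 400 millisecond lifetime is an assumption made for illustration: the function name `glyph_state`, the constant speed, the linear growth, and the start and end scales are not given in the text.

```python
def glyph_state(origin, direction, elapsed_ms, speed=0.5,
                lifetime_ms=400, start_scale=1.0, end_scale=2.0):
    """Return (x, y, scale) of a character moving away from origin, or None once it has disappeared."""
    if elapsed_ms >= lifetime_ms:
        # The character disappears at the end of its lifetime.
        return None
    # Assumption: constant speed in pixels per millisecond along the direction.
    distance = speed * elapsed_ms
    progress = elapsed_ms / lifetime_ms
    x = origin[0] + direction[0] * distance
    y = origin[1] + direction[1] * distance
    # Assumption: the scale grows linearly with progress, so it grows with distance.
    scale = start_scale + (end_scale - start_scale) * progress
    return (x, y, scale)
```

Here `origin` would be the preset position, for example the detected mouth position, and `direction` a unit vector pointing to one side.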

Based on the above settings, the terminal obtains the text data and its display parameters, including font, size, movement speed, color, display position, and movement mode, and then draws the text of the text data on the video recording interface according to those display parameters, so that text with different dynamic effects is shown on the video recording interface.

Referring to Fig. 11, the preset position is set to the position of the user's mouth. While the text grows from small to large, the terminal detects the position of the user's mouth and keeps moving the text from the mouth toward the corners of the mouth on both sides.

Fig. 11 is a schematic diagram of the display form; for the actual display form, see the actual interface shown in Fig. 12.

205, when receiving video record END instruction, terminal generates the video file including text data.

During video record, terminal acquires the voice signal of user in real time, and using the above method in video record Display text data on interface.Since voice signal is converted to written form and needs certain time length from collecting, thus recording Text data shown on video record interface and the voice of user are that have certain time difference in the process, lead to video in this way It shown text data and currently collected voice signal and is mismatched on recording interface.For this purpose, when receiving video record When END instruction processed, terminal generates the video including text data by carrying out registration process to lteral data and video image File.

The terminal generates the video file including the text data through the following steps:

2051. The terminal obtains the recording time of the audio data corresponding to the text data.

The terminal may obtain the recording time of the audio data corresponding to the text data in ways including, but not limited to, the following two:

In the first approach, the text data has a data index. When obtaining the recording time of the audio data corresponding to the text data, the terminal may take the product of the data index of the text data and the first preset duration as that recording time. For example, if the data index of the text data is 2 and the first preset duration is 100 milliseconds, the terminal multiplies the two and obtains a recording time of 200 milliseconds for the audio data corresponding to the text data.

Note that the recording time described in this step is relative to the subtitle addition moment. To determine the recording time of the audio data corresponding to the text data, the time of the subtitle addition moment must be added.
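The first approach amounts to one multiplication plus an offset. The sketch below reproduces the worked example (data index 2, first preset duration 100 milliseconds); the function names and the `subtitle_start_ms` parameter are hypothetical, introduced only to make the relative-versus-absolute distinction concrete.

```python
FIRST_PRESET_MS = 100  # first preset duration used in the example


def relative_recording_time_ms(data_index, first_preset_ms=FIRST_PRESET_MS):
    """Recording time relative to the subtitle addition moment."""
    return data_index * first_preset_ms


def absolute_recording_time_ms(data_index, subtitle_start_ms,
                               first_preset_ms=FIRST_PRESET_MS):
    """Recording time on the video timeline.

    subtitle_start_ms is the (assumed) offset of the subtitle addition
    moment from the start of the recording.
    """
    return subtitle_start_ms + relative_recording_time_ms(
        data_index, first_preset_ms)
```

For instance, if subtitles were switched on 5000 milliseconds into the recording, text data with index 2 would be placed at 5200 milliseconds on the video timeline.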

In the second approach, when obtaining the recording time of the audio data corresponding to the text data, the terminal may obtain the receiving time of the text data and take the difference between the receiving time of the text data and the third preset duration as the recording time of the text data.

The third preset duration is the time taken for the audio data to be sent from the terminal to the server and for the converted text data to be returned to the terminal. It is determined by the processing time of the server and the network state, and may be 500 milliseconds, 600 milliseconds, 800 milliseconds, and so on. For example, if the receiving time of the text data is 10:01:00 and the third preset duration is 800 milliseconds, the recording time of the text data is 10:00:200 (the two values are consistent when read as minutes:seconds:milliseconds).
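The second approach is a subtraction on the receiving timestamp. The sketch below works in milliseconds and assumes the timestamps in the example are in minutes:seconds:milliseconds form, which is an interpretation rather than something the text states; the function names are hypothetical.

```python
def recording_time_from_receipt_ms(receive_time_ms, third_preset_ms=800):
    """Estimate when the audio was recorded from when its text arrived.

    third_preset_ms stands for the assumed round-trip time
    (upload + server conversion + download).
    """
    return receive_time_ms - third_preset_ms


def format_min_sec_ms(t_ms):
    """Format milliseconds as minutes:seconds:milliseconds."""
    minutes, rest = divmod(t_ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes:02d}:{seconds:02d}:{millis:03d}"


# Receiving time of 10 min 1 s, expressed in milliseconds.
received = 10 * 60_000 + 1 * 1000
recorded = recording_time_from_receipt_ms(received, 800)
```

Because the third preset duration is a fixed estimate, this approach is only as accurate as the assumed round-trip time; the index-based approach does not depend on network conditions.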

2052. The terminal merges the text data and the video images according to the recording time of the audio data and the recording time of each frame of video image, obtaining the video file.

When merging the text data with the video images, the terminal obtains one frame of video image at a time in recording-time order. According to the recording time of that frame and the recording times of the text data not yet merged, it merges into that frame the unmerged text data whose recording time is earlier than the recording time of the frame. Because the text data obtained during video recording carries a particular display effect, the display data corresponding to the selected subtitle type option must be packaged together with the text data and the video images. As a result, when this terminal or another terminal plays the video file, it not only shows text synchronized with the user's voice but also renders that text with the display effect of the subtitle type option the user selected, which makes the interaction more engaging.

During the merging of the text data and the video images, if no further video image can be obtained, the merging of the text data and the video images is complete. If no unmerged text data can be found whose recording time is earlier than that of the current frame, either the user was not speaking when the frame was recorded or the user had turned off the subtitle addition function; in this case the frame can be added to the video file directly.
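The merging step can be sketched as a single pass over frames and text items, both sorted by recording time. The data classes and the function name are hypothetical, and a real implementation would draw the subtitle onto the image rather than attach a string. The description states the comparison both as "earlier than" and as "not later than"; this sketch uses the latter.

```python
from dataclasses import dataclass, field


@dataclass
class TextItem:
    recording_ms: int
    text: str


@dataclass
class Frame:
    recording_ms: int
    subtitles: list = field(default_factory=list)


def merge_text_into_frames(frames, text_items):
    """Attach each unmerged text item to the first frame recorded at or
    after it. Frames with no matching text pass through unchanged.
    Frame objects are annotated in place and returned in time order.
    """
    frames = sorted(frames, key=lambda f: f.recording_ms)
    pending = sorted(text_items, key=lambda t: t.recording_ms)
    merged = []
    i = 0  # index of the first text item not yet merged
    for frame in frames:
        while i < len(pending) and pending[i].recording_ms <= frame.recording_ms:
            frame.subtitles.append(pending[i].text)
            i += 1
        merged.append(frame)  # added even when no text matched
    return merged
```

The loop ends when the frames run out, which corresponds to the "no further video image can be obtained" condition; any text item recorded after the last frame is left unmerged.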

Figure 13 is a schematic diagram of the process of aligning the text data with the video file. Referring to Figure 13, when text data sent by the server is received, the terminal uses the data index of the text data to determine whether that text data is already stored locally. If the data index is stored locally, the terminal updates the locally stored text data with the received text data. If the data index is not stored locally, the terminal determines the recording time of the audio data corresponding to the text data from the data index and the first preset duration of 100 milliseconds, and then saves the text data locally. The terminal then draws the text data onto the video recording interface according to the subtitle type option. When the video recording end instruction is received, the terminal generates a temporary video file A and extracts video images from video file A one frame at a time in chronological order. If no video image can be extracted, the terminal may send video file A. If a video image is extracted successfully, the terminal obtains, from the locally cached text data, the text data whose recording time is not later than that of the frame, and merges the obtained text data with the frame to obtain a new video file B. If there is no text data whose recording time is not later than that of the frame, the terminal adds the frame to video file B.
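The local store-or-update decision keyed on the data index can be sketched with a dictionary. The cache layout and the function name `store_or_update` are assumptions made for illustration; only the rule itself (update if the index exists, otherwise compute the recording time and store) comes from the description.

```python
def store_or_update(cache, data_index, text, first_preset_ms=100):
    """Apply one server response to the local text cache.

    cache maps data_index -> {"text": ..., "recording_ms": ...}.
    Returns "updated" or "stored" so the caller can see which branch ran.
    """
    if data_index in cache:
        # Index already known: replace the text (for example, a refined
        # recognition result) and keep the recording time computed earlier.
        cache[data_index]["text"] = text
        return "updated"
    # New index: derive the recording time from the index and store both.
    cache[data_index] = {
        "text": text,
        "recording_ms": data_index * first_preset_ms,
    }
    return "stored"
```

Keeping the recording time fixed on update means a later, corrected transcription for the same index replaces the displayed text without shifting where it lands in the video.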

After generating the video file including the text data, the terminal may send the video file directly to other terminals, or may first preview the recorded video file through a preview function before sending it. While previewing the recorded video file, the terminal may add special effect elements to it. Specifically, when a preview instruction for the video file is received, the terminal displays a preview interface that includes at least one special effect element addition option, each option corresponding to one kind of special effect element. The special effect elements include at least one of graffiti, mosaic, emoticons, and the like. When a selection operation on any special effect element addition option is detected, the terminal receives a special effect element addition instruction, adds the special effect element corresponding to the selected option to the video file, and displays the video image with the added special effect element on the preview interface. Figure 14 is a schematic diagram of the preview interface of the terminal; referring to Figure 14, three special effect elements (graffiti, mosaic, and emoticon) are shown on the video image played in the preview interface. When the special effect element addition instruction is received, the terminal sends the video file with the added special effect elements to other terminals. The other terminals receive the video file and, by playing the video file including the text data and the special effect elements, interact with the recording user.

In the method provided by the embodiment of the present invention, the collected audio data is converted into text data while the video file is being recorded, and the converted text data is displayed on the video recording interface. The user does not need to prepare a subtitle file in advance, which saves considerable resources and simplifies production. Subtitles can be added while the video file is being recorded, and the text data can be aligned with the video images, so the recording process is more real-time. In addition, colorful text can be rendered according to the display form selected by the user, which enriches the video content and makes the video more engaging.

Referring to Figure 15, an embodiment of the present invention provides a device for recording a video file, the device including:

A display module 1501, configured to display a video recording interface during video recording, the video recording interface including a subtitle addition option;

An obtaining module 1502, configured to, when a subtitle addition instruction is received, obtain the text data converted from the audio data collected during video recording;

The display module 1501, further configured to display the text data on the video recording interface;

A generation module 1503, configured to generate a video file including the text data when a video recording end instruction is received.

In another embodiment of the present invention, the generation module 1503 is configured to obtain the recording time of the audio data corresponding to the text data, and to merge the text data and the video images according to the recording time of the audio data and the recording time of each frame of video image, obtaining the video file.

In another embodiment of the present invention, the text data has a data index, and the obtaining module 1502 is configured to take the product of the data index of the text data and the first preset duration as the recording time of the audio data corresponding to the text data.

In another embodiment of the present invention, the obtaining module 1502 is configured to obtain the receiving time of the text data, and to take the difference between the receiving time of the text data and the third preset duration as the recording time of the text data, where the third preset duration is determined according to the sending time of the audio data and the receiving time of the text data.

In another embodiment of the present invention, the generation module 1503 is configured to, for any frame of video image, merge the video image and the text data according to the recording time of the video image and the recording time of the text data not yet merged.

In another embodiment of the present invention, the obtaining module 1502 is configured to, during video recording, resample the collected audio signal at a preset sampling rate to obtain at least one frame of audio data, and, when the at least one frame of audio data meets a preset condition, convert the at least one frame of audio data to obtain the text data.

In another embodiment of the present invention, the obtaining module 1502 is configured to obtain the volume value of each frame of audio data; when the volume value of any frame of audio data is greater than a specified threshold, store that audio data; and when the total duration of the stored audio data reaches the first preset duration, send the audio data whose total duration is the first preset duration to the server.

In another embodiment of the present invention, the device further includes:

A sending module, configured to send heartbeat data to the server when the volume values of the audio data are all less than the specified threshold within a second preset duration, where the heartbeat data is used to maintain the heartbeat connection with the server, and the second preset duration is greater than the first preset duration.
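The volume gating and heartbeat behavior of the obtaining and sending modules can be sketched as a small state machine. Everything concrete here is an assumption for illustration: the class name `AudioGate`, the 20-millisecond frame length, the default second preset duration, and the choice to treat a volume equal to the threshold as silence (the description only covers "greater than" and "less than").

```python
class AudioGate:
    """Buffers loud frames and decides what to send to the server."""

    def __init__(self, threshold, frame_ms=20,
                 first_preset_ms=100, second_preset_ms=1000):
        # The description requires second preset > first preset.
        assert second_preset_ms > first_preset_ms
        self.threshold = threshold
        self.frame_ms = frame_ms
        self.first_preset_ms = first_preset_ms
        self.second_preset_ms = second_preset_ms
        self.buffer = []     # stored frames above the threshold
        self.silent_ms = 0   # length of the current quiet stretch

    def push(self, frame, volume):
        """Feed one frame. Returns ("audio", frames), ("heartbeat", None),
        or None when nothing needs to be sent yet."""
        if volume > self.threshold:
            self.silent_ms = 0
            self.buffer.append(frame)
            if len(self.buffer) * self.frame_ms >= self.first_preset_ms:
                chunk, self.buffer = self.buffer, []
                return ("audio", chunk)
            return None
        # Quiet frame: not stored, but counted toward the heartbeat timer.
        self.silent_ms += self.frame_ms
        if self.silent_ms >= self.second_preset_ms:
            self.silent_ms = 0
            return ("heartbeat", None)
        return None
```

The caller would forward an `"audio"` chunk to the speech-to-text server and send a keep-alive packet on `"heartbeat"`, so the connection stays open during long pauses without uploading silence.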

In another embodiment of the present invention, the text data has a data index, and the device further includes:

An update module, configured to, when the data index is already stored locally, update the text data corresponding to the data index according to the received text data;

A storage module, configured to, when the data index is not stored locally, store the data index and the text data.

In another embodiment of the present invention, the display module 1501 is configured to, when a subtitle type selection instruction is received, display at least one subtitle type option, each subtitle type option corresponding to one subtitle display form; and, when a text data acquisition instruction is received, perform the step of obtaining the text data.

In another embodiment of the present invention, the subtitle type options include a first type option, and the display module 1501 is configured to, when the selected subtitle type option is the first type option, display the text data on the video recording interface in bullet-screen (barrage) form; or,

The subtitle type options include a second type option that includes a designated picture, and the display module 1501 is configured to, when the selected subtitle type option is the second type option, display in bullet-screen form on the video recording interface the process of the text data moving along with the designated picture; or,

The subtitle type options include a third type option, and the display module 1501 is configured to, when the selected subtitle type option is the third type option, display on the video recording interface the process of the text in the text data moving outward from a preset position.

In another embodiment of the present invention, the display module 1501 is configured to, when a preview instruction for the video file is received, display a preview interface that includes at least one special effect element addition option, each option corresponding to one kind of special effect element; and, when a special effect element addition instruction is received, add the special effect element corresponding to the selected option to the video file and display the video image with the added special effect element on the preview interface.

In summary, the device provided by the embodiment of the present invention converts the collected audio data into text data while the video file is being recorded, and displays the converted text data on the video recording interface. The user does not need to prepare a subtitle file in advance, which saves considerable resources and simplifies production. Subtitles can be added while the video file is being recorded, and the text data can be aligned with the video images, so the recording process is more real-time. In addition, colorful text can be rendered according to the display form selected by the user, which enriches the video content and makes the video more engaging.

Figure 16 is a structural block diagram of a terminal 1600 provided by an exemplary embodiment of the present invention. The terminal 1600 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer. The terminal 1600 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or by other names.

In general, the terminal 1600 includes a processor 1601 and a memory 1602.

The processor 1601 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1601 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1601 may also include a main processor and a coprocessor. The main processor, also called a CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 1601 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be shown on the display screen. In some embodiments, the processor 1601 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.

The memory 1602 may include one or more computer-readable storage media, which may be non-transitory. The memory 1602 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1602 stores at least one instruction, which is executed by the processor 1601 to implement the method for recording a video file provided by the method embodiments of this application.

In some embodiments, the terminal 1600 optionally further includes a peripheral device interface 1603 and at least one peripheral device. The processor 1601, the memory 1602, and the peripheral device interface 1603 may be connected by a bus or signal lines. Each peripheral device may be connected to the peripheral device interface 1603 by a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 1604, a touch display screen 1605, a camera 1606, an audio circuit 1607, a positioning component 1608, and a power supply 1609.

The peripheral device interface 1603 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 1601 and the memory 1602. In some embodiments, the processor 1601, the memory 1602, and the peripheral device interface 1603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1601, the memory 1602, and the peripheral device interface 1603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The radio frequency circuit 1604 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1604 communicates with communication networks and other communication devices through electromagnetic signals. It converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1604 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 1604 may communicate with other terminals through at least one wireless communication protocol. The wireless communication protocols include, but are not limited to, metropolitan area networks, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1604 may also include circuits related to NFC (Near Field Communication), which is not limited in this application.

The display screen 1605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination of them. When the display screen 1605 is a touch display screen, it can also collect touch signals on or above its surface. A touch signal may be input to the processor 1601 as a control signal for processing. In this case, the display screen 1605 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1605, arranged on the front panel of the terminal 1600; in other embodiments, there may be at least two display screens 1605, arranged on different surfaces of the terminal 1600 or in a folding design; in still other embodiments, the display screen 1605 may be a flexible display screen arranged on a curved or folded surface of the terminal 1600. The display screen 1605 may even be set to a non-rectangular irregular shape, that is, a special-shaped screen. The display screen 1605 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 1606 is used to capture images or video. Optionally, the camera assembly 1606 includes a front camera and a rear camera. In general, the front camera is arranged on the front panel of the terminal and the rear camera on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to implement a background blurring function, and the main camera and the wide-angle camera can be fused to implement panoramic shooting and VR (Virtual Reality) shooting or other fused shooting functions. In some embodiments, the camera assembly 1606 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.

The audio circuit 1607 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment, converts them into electrical signals, and inputs them to the processor 1601 for processing or to the radio frequency circuit 1604 for voice communication. For stereo capture or noise reduction, there may be multiple microphones arranged at different parts of the terminal 1600. The microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from the processor 1601 or the radio frequency circuit 1604 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans, for purposes such as distance measurement. In some embodiments, the audio circuit 1607 may also include a headphone jack.

The positioning component 1608 is used to determine the current geographic location of the terminal 1600 in order to implement navigation or LBS (Location Based Service). The positioning component 1608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.

The power supply 1609 is used to supply power to the components in the terminal 1600. The power supply 1609 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1609 includes a rechargeable battery, the rechargeable battery may support wired charging or wireless charging, and may also support fast charging technology.

In some embodiments, the terminal 1600 further includes one or more sensors 1610. The one or more sensors 1610 include, but are not limited to, an acceleration sensor 1611, a gyroscope sensor 1612, a pressure sensor 1613, a fingerprint sensor 1614, an optical sensor 1615, and a proximity sensor 1616.

The acceleration sensor 1611 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 1600. For example, the acceleration sensor 1611 may be used to detect the components of gravitational acceleration on the three coordinate axes. According to the gravitational acceleration signal collected by the acceleration sensor 1611, the processor 1601 may control the touch display screen 1605 to display the user interface in landscape or portrait view. The acceleration sensor 1611 may also be used to collect motion data for games or for the user.

The gyroscope sensor 1612 can detect the body orientation and rotation angle of the terminal 1600, and can cooperate with the acceleration sensor 1611 to capture the user's 3D actions on the terminal 1600. Based on the data collected by the gyroscope sensor 1612, the processor 1601 can implement functions such as motion sensing (for example, changing the UI according to a tilt operation by the user), image stabilization during shooting, game control, and inertial navigation.

The pressure sensor 1613 may be arranged on the side frame of the terminal 1600 and/or beneath the touch display screen 1605. When the pressure sensor 1613 is arranged on the side frame of the terminal 1600, it can detect the user's grip signal on the terminal 1600, and the processor 1601 performs left-hand or right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 1613. When the pressure sensor 1613 is arranged beneath the touch display screen 1605, the processor 1601 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 1605. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.

The fingerprint sensor 1614 is used to collect the user's fingerprint. Either the processor 1601 identifies the user from the fingerprint collected by the fingerprint sensor 1614, or the fingerprint sensor 1614 itself identifies the user from the collected fingerprint. When the user's identity is recognized as trusted, the processor 1601 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 1614 may be arranged on the front, back, or side of the terminal 1600. When a physical button or a manufacturer logo is provided on the terminal 1600, the fingerprint sensor 1614 may be integrated with the physical button or the manufacturer logo.

The optical sensor 1615 is used to collect the ambient light intensity. In one embodiment, the processor 1601 may control the display brightness of the touch display screen 1605 according to the ambient light intensity collected by the optical sensor 1615. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1605 is turned up; when the ambient light intensity is low, the display brightness of the touch display screen 1605 is turned down. In another embodiment, the processor 1601 may also dynamically adjust the shooting parameters of the camera assembly 1606 according to the ambient light intensity collected by the optical sensor 1615.

The proximity sensor 1616, also called a distance sensor, is generally arranged on the front panel of the terminal 1600. The proximity sensor 1616 is used to measure the distance between the user and the front of the terminal 1600. In one embodiment, when the proximity sensor 1616 detects that the distance between the user and the front of the terminal 1600 is gradually decreasing, the processor 1601 controls the touch display screen 1605 to switch from the screen-on state to the screen-off state; when the proximity sensor 1616 detects that the distance between the user and the front of the terminal 1600 is gradually increasing, the processor 1601 controls the touch display screen 1605 to switch from the screen-off state to the screen-on state.

Those skilled in the art will understand that the structure shown in Figure 16 does not limit the terminal 1600, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components.

The terminal provided by the embodiment of the present invention converts the collected audio data into text data while the video file is being recorded, and displays the converted text data on the video recording interface. The user does not need to prepare a subtitle file in advance, which saves considerable resources and simplifies production. Subtitles can be added while the video file is being recorded, and the text data can be aligned with the video images, so the recording process is more real-time. In addition, colorful text can be rendered according to the display form selected by the user, which enriches the video content and makes the video more engaging.

An embodiment of the present invention provides a computer-readable storage medium. The storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the method for recording a video file shown in Figure 2.

With the computer-readable storage medium provided by the embodiment of the present invention, the collected audio data is converted into text data while the video file is being recorded, and the converted text data is displayed on the video recording interface. The user does not need to prepare a subtitle file in advance, which saves considerable resources and simplifies production. Subtitles can be added while the video file is being recorded, and the text data can be aligned with the video images, so the recording process is more real-time. In addition, colorful text can be rendered according to the display form selected by the user, which enriches the video content and makes the video more engaging.

It should be noted that when the device for recording a video file provided by the above embodiment records a video file, the division into the functional modules described above is used only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device for recording a video file and the method for recording a video file provided by the above embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.

Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.

The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method for recording a video file, characterized in that the method comprises:

when a subtitle addition instruction is received, obtaining, according to audio data collected during video recording, text data converted from the audio data;

displaying the text data on the video recording interface;

2. The method according to claim 1, characterized in that generating the video file including the text data when a video recording end instruction is received comprises:

obtaining the recording time of the audio data corresponding to the text data;

merging the text data and the video images according to the recording time of the audio data and the recording time of each frame of video image, to obtain the video file.

3. The method according to claim 2, characterized in that the text data has a data index, and obtaining the recording time of the audio data corresponding to the text data comprises:

taking the product of the data index of the text data and a first preset duration as the recording time of the audio data corresponding to the text data.

4. The method according to claim 2, characterized in that obtaining the recording time of the audio data corresponding to the text data comprises:

obtaining the receiving time of the text data;

taking the difference between the receiving time of the text data and a third preset duration as the recording time of the text data, the third preset duration being determined according to the sending time of the audio data and the receiving time of the text data.

5. The method according to claim 2, characterized in that merging the text data with the video images according to the recording time of the audio data and the recording time of each frame of video image comprises:

for any frame of video image, merging the video image with the text data according to the recording time of the video image and the recording time of text data that has not yet been merged.
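
By way of non-limiting illustration, one possible reading of the merging step in claim 5 may be sketched in Python as follows. The sketch assumes that frames and text data are sorted by recording time and that a frame carries the most recent text data whose recording time is not later than the frame's own; the function name `merge_subtitles` and the tuple layout are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def merge_subtitles(
    frame_times_ms: Sequence[int],
    subtitles: Sequence[Tuple[int, str]],
) -> List[Tuple[int, Optional[str]]]:
    """Pair each frame time with the subtitle text that applies to it.

    `subtitles` holds (recording_time_ms, text) pairs sorted by time.
    `next_unmerged` points at the first text data not yet merged.
    """
    merged: List[Tuple[int, Optional[str]]] = []
    next_unmerged = 0
    current: Optional[str] = None
    for frame_ms in frame_times_ms:
        # Consume every not-yet-merged subtitle that starts at or before this frame.
        while (next_unmerged < len(subtitles)
               and subtitles[next_unmerged][0] <= frame_ms):
            current = subtitles[next_unmerged][1]
            next_unmerged += 1
        merged.append((frame_ms, current))
    return merged
```

Frames recorded before the first text data are paired with `None`, meaning that no subtitle would be drawn on them.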

6. The method according to claim 1, characterized in that obtaining the text data converted from the audio data comprises:

during video recording, resampling the collected audio signal at a preset sampling rate to obtain at least one frame of audio data; and

when the at least one frame of audio data meets a preset condition, converting the at least one frame of audio data to obtain the text data.
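
By way of non-limiting illustration, the resampling step in claim 6 may be sketched in Python as follows. The claims do not specify a resampling algorithm; plain linear interpolation is used here only because it is short, and a practical implementation would normally also apply a low-pass filter when reducing the sampling rate. The function name `resample_linear` is hypothetical.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float],
                    src_rate: int, dst_rate: int) -> List[float]:
    """Resample `samples` from `src_rate` to `dst_rate` by linear interpolation.

    Illustrative only: no anti-aliasing filter is applied.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples:
        return []
    n_out = max(1, int(len(samples) * dst_rate / src_rate))
    out: List[float] = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        lo = int(pos)                          # sample at or before pos
        hi = min(lo + 1, len(samples) - 1)     # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```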

7. The method according to claim 6, characterized in that converting the at least one frame of audio data to obtain the text data when the at least one frame of audio data meets the preset condition comprises:

obtaining the volume value of each frame of audio data;

when the volume value of any frame of audio data is greater than a specified threshold, storing that frame of audio data; and

when the total duration of the stored audio data reaches a first preset duration, converting the audio data whose total duration is the first preset duration to obtain the text data.
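
By way of non-limiting illustration, the volume gating and buffering recited in claim 7 may be sketched in Python as follows. The claims do not define the volume value; root-mean-square amplitude is assumed here, as are the fixed frame length in milliseconds and the names `rms_volume` and `batch_loud_frames`. Each yielded batch stands for the audio data that would be handed to speech-to-text conversion.

```python
import math
from typing import Iterable, Iterator, List, Sequence


def rms_volume(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame (assumed volume measure)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def batch_loud_frames(
    frames: Iterable[Sequence[float]],
    frame_ms: int,
    threshold: float,
    first_preset_ms: int,
) -> Iterator[List[Sequence[float]]]:
    """Store frames louder than `threshold`; yield them once the stored
    audio reaches `first_preset_ms` in total duration."""
    stored: List[Sequence[float]] = []
    for frame in frames:
        if rms_volume(frame) > threshold:
            stored.append(frame)
        if len(stored) * frame_ms >= first_preset_ms:
            yield stored          # this batch would be sent for conversion
            stored = []
```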

8. The method according to claim 6, characterized in that the method further comprises:

when the volume values of the audio data are all less than the specified threshold within a second preset duration, sending heartbeat data to the server, the heartbeat data being used to maintain a heartbeat with the server, and the second preset duration being greater than the first preset duration.
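
By way of non-limiting illustration, the silence detection that triggers the heartbeat in claim 8 may be sketched in Python as follows. The sketch only computes the times at which heartbeat data would be sent and performs no network communication; the function name `heartbeat_times`, the fixed frame length, and the restart of the silence timer after each heartbeat are assumptions of this example.

```python
from typing import List, Sequence


def heartbeat_times(
    volumes: Sequence[float],
    frame_ms: int,
    threshold: float,
    second_preset_ms: int,
) -> List[int]:
    """Return the times (ms) at which heartbeat data would be sent.

    A heartbeat is due whenever every frame in the last `second_preset_ms`
    has been quieter than `threshold`.
    """
    beats: List[int] = []
    silent_ms = 0
    for i, volume in enumerate(volumes):
        if volume < threshold:
            silent_ms += frame_ms
            if silent_ms >= second_preset_ms:
                beats.append((i + 1) * frame_ms)  # end time of the current frame
                silent_ms = 0                     # restart the silence timer
        else:
            silent_ms = 0                         # speech resets the timer
    return beats
```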

9. The method according to claim 1, characterized in that the text data has a data index, and after obtaining the text data, the method further comprises:

when the data index is already stored locally, updating the text data corresponding to the data index according to the obtained text data; and

when the data index is not stored locally, storing the data index and the text data.
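
By way of non-limiting illustration, the update-or-store behavior recited in claim 9 may be sketched in Python as follows. A dictionary keyed by the index of each piece of text data stands in for local storage; the function name `store_text` and the boolean return value are hypothetical.

```python
from typing import Dict


def store_text(local_store: Dict[int, str], data_index: int, text: str) -> bool:
    """Update the text for `data_index` if it exists, otherwise store it.

    Returns True when an existing entry was updated and False when a new
    entry was stored.
    """
    already_stored = data_index in local_store
    local_store[data_index] = text
    return already_stored
```

This allows a later, revised conversion result for the same index to replace an earlier partial one, while text data with a new index is simply added.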

10. The method according to claim 1, characterized in that the method further comprises:

when a subtitle type selection instruction is received, displaying at least one subtitle type option, each subtitle type option corresponding to one subtitle display form; and

when a text data acquisition instruction is received, performing the step of obtaining the text data.

11. The method according to claim 10, characterized in that the subtitle type option includes a first type option, and displaying the text data on the video recording interface comprises:

when the selected subtitle type option is the first type option, displaying the text data on the video recording interface in bullet-screen form; or,

the subtitle type option includes a second type option, the second type option includes a designated picture, and displaying the text data on the video recording interface comprises:

when the selected subtitle type option is the second type option, displaying on the video recording interface, in bullet-screen form, the process of the text data moving together with the designated picture; or,

the subtitle type option includes a third type option, and displaying the text data on the video recording interface comprises:

when the selected subtitle type option is the third type option, displaying on the video recording interface the process of the characters in the text data moving around a preset position as the center.

12. The method according to any one of claims 1 to 11, characterized in that after generating the video file including the text data when the video recording end instruction is received, the method further comprises:

when a preview instruction for the video file is received, displaying a preview interface, the preview interface including at least one special-effect element addition option, each special-effect element addition option corresponding to one special-effect element; and

when a special-effect element addition instruction is received, adding the special-effect element corresponding to the selected addition option to the video file, and displaying the video image with the special-effect element added in the preview interface.

13. A device for recording a video file, characterized in that the device comprises:

a display module, configured to display a video recording interface during video recording, the video recording interface including a subtitle addition option;

an obtaining module, configured to, when a subtitle addition instruction is received, obtain, according to audio data collected during video recording, text data converted from the audio data;

the display module being further configured to display the text data on the video recording interface; and

a generation module, configured to generate a video file including the text data when a video recording end instruction is received.

14. A terminal, characterized in that the terminal comprises a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set being loaded and executed by the processor to implement the method for recording a video file according to any one of claims 1 to 12.

15. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the method for recording a video file according to any one of claims 1 to 12.