CN114584830A - Method and device for processing video and household appliance - Google Patents

Method and device for processing video and household appliance

Info

Publication number
CN114584830A
CN114584830A
Authority
CN
China
Prior art keywords
information
video
text
characteristic information
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011393269.2A
Other languages
Chinese (zh)
Inventor
黄俊杰
马鑫
翟文彬
丁立省
王娜娜
任维彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haier Smart Home Co Ltd
Qingdao Haier Multimedia Co Ltd
Original Assignee
Haier Smart Home Co Ltd
Qingdao Haier Multimedia Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haier Smart Home Co Ltd, Qingdao Haier Multimedia Co Ltd filed Critical Haier Smart Home Co Ltd
Priority to CN202011393269.2A priority Critical patent/CN114584830A/en
Publication of CN114584830A publication Critical patent/CN114584830A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of cooking and discloses a method for processing video, comprising the following steps: parsing a video to extract characteristic information, and recognizing the characteristic information; when the recognition result contains set content, marking time points according to the set content and generating inserted-frame video information based on the set content; and adding the corresponding inserted-frame video information at the time points corresponding to the marks. In the embodiments of the disclosure, the characteristic information of the video is parsed and recognized; when set content corresponding to a mark is present, the video is dotted and marked, and frame insertion is performed based on the set content and the time point corresponding to the mark, dividing the video into a plurality of continuously playable segments. Through the marks, a user can quickly locate the content of a specified segment and can conveniently and accurately control which video segment plays for each cooking step, improving the cooking result. The application also discloses an apparatus for processing video and a household appliance.

Description

Method and device for processing video and household appliance
Technical Field
The present application relates to the field of cooking technologies, and for example, to a method and an apparatus for processing a video, and a home appliance.
Background
With rising living standards, people's pursuit of good food keeps growing; more and more people enjoy cooking at home and share the cooking process through video. When preparing a dish, written or audio/video materials such as a recipe and its preparation method are often needed as references, and compared with text or voice descriptions, video guidance better matches users' habits and provides a better experience. In the related art, a user can play a video and control the playback progress through various playback devices, and is guided through cooking by recipe-related videos.
In the process of implementing the embodiments of the present disclosure, it is found that at least the following problems exist in the related art:
during cooking, the user cannot accurately control which segment of the video plays, which degrades the cooking result.
Disclosure of Invention
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview, nor is it intended to identify key or critical elements or to delineate the scope of the embodiments; rather, it serves as a prelude to the more detailed description presented later.
The embodiments of the present disclosure provide a method and an apparatus for processing video, and a household appliance, to solve the technical problem that, because videos come from different sources and have no uniform format, a user cannot accurately control which segment of the video plays during cooking, which affects the cooking result.
In some embodiments, the method comprises:
parsing the video to extract characteristic information, and recognizing the characteristic information;
when the recognition result contains set content, marking a time point according to the set content; and
performing frame insertion on the video according to the mark.
In some embodiments, the apparatus comprises: a processor and a memory storing program instructions, the processor being configured to, when executing the program instructions, perform the method for processing video described above.
In some embodiments, the home device includes the apparatus for processing video described above.
The method, the device and the household appliance for processing the video provided by the embodiment of the disclosure can achieve the following technical effects:
the method comprises the steps of analyzing video characteristic information, identifying the characteristic information, dotting and marking the video when set content corresponding to the mark is included, performing frame interpolation processing based on the set content and the time point corresponding to the mark, dividing the video into a plurality of fragments capable of being played continuously, enabling a user to check the content of the specified fragments rapidly through the mark, enabling the user to control the fragments related to the video playing and cooking steps conveniently and accurately in the cooking process, and improving the cooking effect.
The foregoing general description and the following description are exemplary and explanatory only and are not restrictive of the application.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, which are not limiting; in the drawings, elements having the same reference numerals denote like elements, and wherein:
fig. 1 is a schematic diagram of a method for processing video according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of marking a time point according to an embodiment of the present disclosure;
FIG. 3 is another schematic diagram of marking a time point provided by an embodiment of the present disclosure;
fig. 4 is an interaction interface for acquiring food material information according to the embodiment of the disclosure;
FIG. 5 is an interactive interface for obtaining taste preferences according to an embodiment of the present disclosure;
FIG. 6 is an interactive interface for obtaining the recipe form according to an embodiment of the present disclosure;
FIG. 7 is an interactive interface for displaying a text recipe according to an embodiment of the present disclosure;
fig. 8 is a schematic diagram of an apparatus for processing video according to an embodiment of the present disclosure.
Detailed Description
So that the manner in which the features and elements of the disclosed embodiments can be understood in detail, a more particular description of the disclosed embodiments, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. In the following description of the technology, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the disclosed embodiments. However, one or more embodiments may be practiced without these details. In other instances, well-known structures and devices may be shown in simplified form in order to simplify the drawing.
The terms "first," "second," and the like in the description, the claims, and the above drawings of the embodiments of the present disclosure are used to distinguish similar elements, and not necessarily to describe a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, so that the embodiments of the present disclosure described herein can be implemented in an order other than that illustrated or described here. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusions.
The term "plurality" means two or more, unless otherwise specified.
In the embodiments of the present disclosure, the character "/" indicates that the preceding and following objects are in an "or" relationship. For example, A/B represents: A or B.
The term "and/or" describes an association between objects and indicates that three relationships may exist. For example, A and/or B represents: A alone, B alone, or both A and B.
During cooking, the video playback progress cannot be accurately controlled to match the current cooking step, so the resulting dish is not ideal.
Fig. 1 is a schematic diagram of a method for processing video according to an embodiment of the present disclosure, the method including the following steps:
s101, analyzing the video, extracting the characteristic information, and identifying the characteristic information.
And S102, when the identification result contains the setting content, marking the time point according to the setting content and generating the frame inserting video information based on the setting content.
And S103, adding corresponding frame insertion video information at the time point corresponding to the mark.
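As a rough illustration, steps S101 to S103 amount to scanning recognized feature text for set content and recording a mark per hit; the function names and the (timestamp, text) data layout below are hypothetical, not specified by the patent.

```python
from dataclasses import dataclass

@dataclass
class Mark:
    time_s: float   # time point in the source video, in seconds
    content: str    # the piece of set content that triggered this mark

def mark_set_content(recognized, set_content):
    """S102 sketch: scan (timestamp, recognized-text) pairs and record a
    Mark for each occurrence of set content; the marks are later used to
    add the corresponding inserted-frame video at those time points (S103)."""
    marks = []
    for time_s, text in recognized:
        for item in set_content:
            if item in text:
                marks.append(Mark(time_s, item))
    return marks
```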
In the embodiments of the present disclosure, the video processing apparatus that executes the method is a mobile terminal or a smart appliance with a video-playing function, such as a refrigerator, a television, or a range hood. The embodiments herein mainly take a refrigerator as an example to explain the scheme.
In step S101, the characteristic information includes one or more of subtitle information, sound information, and image information. What a video contains depends on its producer. In some embodiments, the producer dubs the video and also adds subtitles. In some embodiments, the producer dubs the video without adding subtitles. In some embodiments, the producer adds subtitles and either only background music or no audio at all. The characteristic information the video processing apparatus can extract therefore differs depending on how the video was produced.
In some embodiments, a video producer not only narrates the specific operation steps but also adds background music when dubbing the video. When acquiring the sound information, the video processing apparatus mainly separates the audio tracks to isolate the narrator's speech from the background music.
In the case where the video includes different feature information in different embodiments, the manner of identifying the feature information is different.
In some embodiments, in step S101, recognizing the characteristic information includes: when the video includes multiple kinds of characteristic information, recognizing each of them. After recognizing the multiple kinds of characteristic information, the video processing apparatus can use the combined recognition results to correct information that the results disagree on. For example, when the dubbing is unclear or erroneous, the correct recipe information can be determined by combining the subtitles with the dubbing.
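One minimal way to combine two recognition results, as just described, is to fall back to the subtitle text when the audio recognizer is unsure; the confidence score and threshold here are illustrative assumptions, not values from the patent.

```python
def reconcile(subtitle_text, audio_text, audio_confidence, threshold=0.8):
    """Cross-correct two recognition results: when the audio recognizer's
    confidence is low (e.g. unclear dubbing), prefer the subtitle text;
    otherwise keep the audio result."""
    if subtitle_text and audio_confidence < threshold:
        return subtitle_text
    return audio_text or subtitle_text
```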
In some embodiments, in step S101, recognizing the characteristic information includes: when the video includes multiple kinds of characteristic information, recognizing only the kind with the highest priority. Because the video processing apparatus recognizes only one kind of characteristic information, recognition time is shortened and video processing efficiency improves.
In some embodiments, when the video producer neither subtitles nor dubs the video, the video processing apparatus recognizes the set content based on the image information.
In some embodiments, the priority of the subtitle information is higher than that of the sound information, and the priority of the sound information is higher than that of the image information. The subtitle information presents the recipe information most directly, so recognition is efficient and less error-prone. Sound information may be recognized less reliably due to the narrator's accent, a noisy environment, and the like.
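The priority-based selection just described can be sketched as a simple ordered lookup (the kind names below are hypothetical labels, not identifiers from the patent):

```python
PRIORITY = ("subtitle", "sound", "image")  # highest priority first

def pick_feature(available):
    """Return the highest-priority kind of characteristic information
    present in the video, per the embodiment: subtitle > sound > image."""
    for kind in PRIORITY:
        if kind in available:
            return kind
    raise ValueError("video contains no recognizable characteristic information")
```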
In some embodiments, identifying the characteristic information includes one or more of: under the condition that the characteristic information is subtitle information, performing text recognition on the subtitle information and converting the subtitle information into text information; under the condition that the characteristic information is the voice information, performing semantic recognition on the voice information and converting the voice information into text information; and when the characteristic information is image information, performing image recognition on continuous multi-frame images to determine the action of the target object, and converting the action into text information.
In some embodiments, if the video contains no subtitle information, the sound information in the video is parsed directly. Different characters correspond to different audio-signal frequencies and amplitudes, so the sound information can be converted into text by analyzing the audio signal.
In some embodiments, the method further comprises: correcting the text information and taking the corrected text as the recognition result. Specifically, when the video processing apparatus recognizes audio information, the recognized text may contain grammatical or wording errors due to the narrator's accent, a noisy environment, and the like. Correcting the text information improves the accuracy of the recognition result.
In some embodiments, the set content in step S102 includes keywords and/or set actions.
Optionally, the keywords include: the first step, the second step, …, the nth step; then; 1, 2, …, n; 2 minutes, 5 minutes, …, n minutes; fire, oil, turn off the heat, and the like.
Optionally, the set actions include lighting the stove, adding oil, adding salt, stir-frying, turning off the heat, and the like.
For example, the video processing apparatus determines from multi-frame image information that the action is lighting the stove, records the recognized action as text information, and records the time point corresponding to the action.
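Detecting the set content in a line of recognized text amounts to keyword matching; the regular expression below covers a few of the keywords and actions listed above as an illustration — the exact keyword list would be configurable, and these English phrasings are assumptions.

```python
import re

# Illustrative patterns for set content: step numbers, durations, actions.
SET_CONTENT = re.compile(
    r"(first step|second step|step\s*\d+|\d+\s*minutes?|"
    r"light the stove|add oil|add salt|stir-fry|turn off the heat)",
    re.IGNORECASE,
)

def find_set_content(text):
    """Return every piece of set content found in a line of recognized text."""
    return [m.group(0) for m in SET_CONTENT.finditer(text)]
```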
In different embodiments, the manner of marking the time points is different according to the setting content.
Fig. 2 is a schematic diagram of marking a time point according to an embodiment of the present disclosure. For a video with subtitle information, the subtitle file is automatically obtained and searched, and once keyword information is recognized, it is converted into the corresponding time-position information. So that the user's wake words can quickly call up and switch to the corresponding keyword position, the video is preprocessed with frame insertion, and each frame-insertion position serves as a dotting mark position that the software can call.
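For a video shipped with an SRT subtitle file, the keyword-to-time-position conversion might look like the following sketch (the SRT format and helper names are assumptions; the patent does not specify a subtitle format):

```python
import re

_TS = re.compile(r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})")

def _to_seconds(ts_match):
    h, m, s, ms = map(int, ts_match.groups())
    return h * 3600 + m * 60 + s + ms / 1000.0

def keyword_time_positions(srt_text, keywords):
    """Scan SRT cues and return (start_seconds, keyword) for every cue whose
    text contains a keyword -- the dotting positions for frame insertion."""
    positions = []
    for block in srt_text.strip().split("\n\n"):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if len(lines) < 3:
            continue
        ts = _TS.search(lines[1])  # e.g. "00:01:05,000 --> 00:01:09,500"
        if not ts:
            continue
        text = " ".join(lines[2:])
        for kw in keywords:
            if kw in text:
                positions.append((_to_seconds(ts), kw))
    return positions
```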
Optionally, the user may control video playback by voice, on-screen options, keys, and the like. In the cooking-video playback mode, the main voice-interaction wake words include: previous step, next step, the nth step, light the stove, add oil, add salt, play, pause, fast forward, rewind, and the like. The embodiments of the disclosure thus spare the user the trouble of repeatedly re-operating or switching playback because they cannot keep up with the pace of the video tutorial while preparing a dish.
Fig. 3 is another schematic diagram of marking a time point provided by an embodiment of the present disclosure. In some embodiments, if the video contains no subtitle information, the sound information in the video is parsed directly. Different characters correspond to different audio-signal frequencies and amplitudes, so the sound information is converted into text by analyzing the audio signal; the text is then searched, and after keyword information is recognized, the original video is segmented into ordinary audio segments and keyword audio segments. So that the user's wake words can quickly call up and switch to the corresponding keyword position, the video is preprocessed with frame insertion, and each frame-insertion position serves as a dotting mark position that the software can call.
In some embodiments, in step S102, generating the inserted-frame video information based on the set content includes: rendering the text information corresponding to the keywords and/or set actions to generate video text content; and obtaining preset image information and overlaying it with the video text content to generate the inserted-frame video information. The user can then control the playback progress via the keywords corresponding to the inserted frames.
In some embodiments, one or more of the font, texture, font size, style, and color of the textual information is rendered.
In different embodiments, the predetermined image information is determined in different ways.
Optionally, the preset image information is determined according to the set action included in the step, so that the user can more intuitively identify the specific operation of the current step. For example, the preset image corresponding to lighting the stove is a knob image or an image marked "on"; the preset image corresponding to adding oil shows an oil can or oil drops.
Optionally, the preset image information is determined according to the dish. For example, if the dish being prepared is scrambled eggs with tomatoes, the preset image is a picture of the finished dish, or a picture of the raw ingredients, tomatoes and eggs.
Optionally, the preset image information is a solid-color background image. The background color is chosen according to the font color so that the difference between them exceeds a set value, letting the user clearly read the content of the inserted-frame video information.
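For the solid-background variant, picking a background whose difference from the font color exceeds the set value can be sketched as follows; the per-channel RGB difference metric, the threshold, and the complement fallback are all illustrative assumptions standing in for the unspecified "set value".

```python
def color_difference(a, b):
    """Sum of per-channel absolute RGB differences between two colors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def make_inserted_frame(width, height, font_color, min_diff=200):
    """Describe a solid-background inserted frame: keep the default black
    background unless it is too close to the font color, in which case
    switch to the font color's complement."""
    background = (0, 0, 0)
    if color_difference(background, font_color) < min_diff:
        background = tuple(255 - c for c in font_color)
    return {"size": (width, height), "background": background,
            "font_color": font_color}
```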
In the embodiments of the disclosure, the characteristic information of the video is parsed and recognized; when set content corresponding to a mark is present, the video is dotted and marked, and frame insertion is performed based on the set content and the time point corresponding to the mark, dividing the video into a plurality of continuously playable segments. Through the marks, a user can quickly locate the content of a specified segment and conveniently and accurately control which video segment plays for each cooking step, improving the cooking result.
Before acquiring a recipe-related video in step S101, the video processing apparatus determines which video is required; specifically, the video is determined according to the food material information.
The manner of obtaining the food material information is different in different embodiments.
In some embodiments, the video processing apparatus actively captures images of the food materials in the refrigerator and analyzes them through intelligent recognition to determine which food materials are on hand, then recommends videos accordingly so that the user's cooking can proceed smoothly. When the video processing apparatus is the refrigerator itself, its image-capture device acquires the food material images. When the video processing apparatus is another device, the refrigerator's image-capture device acquires the images and the apparatus receives them from the refrigerator.
In some embodiments, the video processing apparatus obtains the food material information through manual input. The apparatus provides dish options, and the user manually enters the food materials available at home. Manual input gives the user room to choose: the user can enter food materials according to favorite dishes, and the apparatus makes recommendations based on that input, ensuring the recommended recipes better match the user's needs.
Fig. 4 is an interaction interface for acquiring food material information according to an embodiment of the disclosure. The user can select to add or delete dishes via the "+" and "-" options on the left side of the screen. And dish information is displayed on the right side of the screen. Optionally, the dish information is displayed in a list or thumbnail. The user can establish different food material libraries for naming and individual storage to meet the cooking requirements of multiple users, such as the 'basket of my home' shown in fig. 4.
Further, the video processing apparatus also provides taste options; fig. 5 is an interactive interface for obtaining taste preferences provided by an embodiment of the disclosure. The apparatus recommends related recipes according to the taste the user selects. Optionally, the taste options include sweet, sweet-and-spicy, medium spicy, and so on. Users can also define their own tastes for different dishes so that the apparatus can recommend recipes that meet their needs, for example: mildly spicy, extremely spicy, and so on.
FIG. 6 is an interactive interface for obtaining the recipe form provided by an embodiment of the present disclosure. The video processing apparatus can obtain a recipe-related video or a text recipe; optionally, the text recipe is plain text or combined text and images. Providing multiple recipe forms suits different viewing habits; a text recipe shows the preparation steps more directly and improves the efficiency of recipe selection.
In some embodiments, when the recipe is a text recipe, it is displayed page by page. When a recipe is retrieved, it is used directly if it is already paginated. If not, it is optimized before display, processed according to the text information in the above embodiments. Specifically: search for step keywords, and dot and mark according to them; if no keywords are found, mark positions at the line breaks. The keywords include the nth step, the serial number n, the next step, and the like.
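The pagination rule just described — split at step keywords, otherwise fall back to line breaks — can be sketched as follows; the English keyword patterns are illustrative stand-ins for the step markers named above.

```python
import re

STEP_KEYWORD = re.compile(r"(step\s*\d+|next step)", re.IGNORECASE)

def paginate_recipe(text):
    """Split a plain-text recipe into pages, one per step keyword; if no
    keywords are found, mark positions at line breaks instead."""
    starts = [m.start() for m in STEP_KEYWORD.finditer(text)]
    if not starts:
        return [ln.strip() for ln in text.splitlines() if ln.strip()]
    starts.append(len(text))
    return [text[a:b].strip() for a, b in zip(starts, starts[1:])]
```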
Fig. 7 is an interactive interface for displaying a text recipe according to an embodiment of the present disclosure. After pagination, the text of a given step or page is displayed on the middle-left of the screen, the corresponding picture on the middle-right, and the recipe name in the title position at the top. Optionally, during display the recipe pages are switched by voice, on-screen options, keys, and the like, for example: previous page, next page, the nth page, previous step, next step, the nth step.
In some embodiments, a music-accompaniment function is added while the text recipe is displayed, so music can play alongside the recipe, improving the user experience. As shown in fig. 7, the user turns on the accompaniment via the music option and the system plays background music at random. In some embodiments, the user may select music by voice, for example: "I want to listen to music," "I want to listen to …," and so on.
The disclosed embodiments also provide an apparatus for processing video, comprising a processor and a memory storing program instructions, the processor being configured to perform the method for processing video as described above when executing the program instructions.
As shown in fig. 8, an apparatus for processing video according to an embodiment of the present disclosure includes a processor (processor) 800 and a memory (memory) 801. Optionally, the apparatus may also include a communication interface (Communication Interface) 802 and a bus 803. The processor 800, the communication interface 802, and the memory 801 may communicate with each other via the bus 803. The communication interface 802 may be used for information transfer. The processor 800 may call logic instructions in the memory 801 to perform the method for processing video of the above embodiments.
In addition, the logic instructions in the memory 801 may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product.
The memory 801 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, such as program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 800 executes functional applications and data processing, i.e., implements the method for processing video in the above-described embodiments, by executing program instructions/modules stored in the memory 801.
The memory 801 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal device, and the like. In addition, the memory 801 may include a high-speed random access memory, and may also include a nonvolatile memory.
The embodiment of the disclosure also provides household electrical appliance equipment which comprises the device for processing the video. In some embodiments, the home appliance is a refrigerator, an air conditioner, a television, a range hood, or the like, which can play videos.
Embodiments of the present disclosure provide a computer-readable storage medium storing computer-executable instructions configured to perform the above-described method for processing video.
Embodiments of the present disclosure provide a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method for … described above.
The computer-readable storage medium described above may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.
The technical solution of the embodiments of the present disclosure may be embodied in the form of a software product, where the computer software product is stored in a storage medium and includes one or more instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present disclosure. And the aforementioned storage medium may be a non-transitory storage medium comprising: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes, and may also be a transient storage medium.
The above description and drawings sufficiently illustrate embodiments of the disclosure to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, and other changes. The examples merely typify possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in or substituted for those of others. Furthermore, the words used in the specification are words of description only and are not intended to limit the claims. As used in the description of the embodiments and the claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this application is meant to encompass any and all possible combinations of one or more of the associated listed. Furthermore, the terms "comprises" and/or "comprising," when used in this application, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Without further limitation, an element defined by the phrase "comprising an …" does not exclude the presence of other like elements in a process, method or apparatus that comprises the element. In this document, each embodiment may be described with emphasis on differences from other embodiments, and the same and similar parts between the respective embodiments may be referred to each other. For methods, products, etc. of the embodiment disclosures, reference may be made to the description of the method section for relevance if it corresponds to the method section of the embodiment disclosure.
Those of skill in the art would appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software may depend upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments. It can be clearly understood by the skilled person that, for convenience and brevity of description, the specific working processes of the system, the apparatus and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments disclosed herein, the disclosed methods, products (including but not limited to devices, apparatuses, etc.) may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be merely a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to implement the present embodiment. In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. In the description corresponding to the flowcharts and block diagrams in the figures, operations or steps corresponding to different blocks may also occur in different orders than disclosed in the description, and sometimes there is no specific order between different operations or steps. For example, two sequential operations or steps may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Claims (10)

1. A method for processing video, comprising:
analyzing the video to extract characteristic information, and identifying the characteristic information;
under the condition that the identification result contains set content, marking time points according to the set content, and generating frame insertion video information based on the set content;
and adding corresponding frame insertion video information at the time point corresponding to the mark.
2. The method of claim 1, wherein the feature information comprises one or more of subtitle information, sound information, and image information.
3. The method of claim 2, wherein identifying feature information comprises:
under the condition that the video comprises a plurality of kinds of characteristic information, respectively identifying the plurality of kinds of characteristic information; or,
in the case where the video includes a plurality of kinds of feature information, the feature information with the highest priority is identified.
4. The method of claim 3, wherein the subtitle information has a higher priority than the sound information; the priority of the sound information is higher than that of the image information.
5. The method of claim 2, wherein identifying the characteristic information comprises one or more of:
under the condition that the characteristic information is subtitle information, performing text recognition on the subtitle information and converting the subtitle information into text information;
under the condition that the characteristic information is the voice information, performing semantic recognition on the voice information and converting the voice information into text information;
when the feature information is image information, the operation of identifying the target object is performed on the continuous multi-frame images, and the multi-frame images are converted into text information.
6. The method according to any one of claims 1 to 5, wherein the setting content comprises a keyword and/or a setting action.
7. The method of claim 6, wherein generating the inter-frame video information based on the setting content comprises:
rendering the text information corresponding to the keywords and/or the set action to generate video text content;
and acquiring preset image information, and overlapping the preset image information and the video text content to generate frame insertion video information.
8. The method of claim 7, wherein one or more of a font, a texture, a font size, a style, and a color of the text information is rendered.
9. An apparatus for processing video, comprising a processor and a memory having stored thereon program instructions, wherein the processor is configured to perform the method for processing video according to any one of claims 1 to 8 when executing the program instructions.
10. An electric household appliance comprising a device for processing video according to claim 9.
CN202011393269.2A 2020-12-02 2020-12-02 Method and device for processing video and household appliance Pending CN114584830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011393269.2A CN114584830A (en) 2020-12-02 2020-12-02 Method and device for processing video and household appliance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011393269.2A CN114584830A (en) 2020-12-02 2020-12-02 Method and device for processing video and household appliance

Publications (1)

Publication Number Publication Date
CN114584830A true CN114584830A (en) 2022-06-03

Family

ID=81769855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011393269.2A Pending CN114584830A (en) 2020-12-02 2020-12-02 Method and device for processing video and household appliance

Country Status (1)

Country Link
CN (1) CN114584830A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104914898A (en) * 2015-04-17 2015-09-16 珠海优特电力科技股份有限公司 Digital menu generating method and system
CN108833973A (en) * 2018-06-28 2018-11-16 腾讯科技(深圳)有限公司 Extracting method, device and the computer equipment of video features
CN110213599A (en) * 2019-04-16 2019-09-06 腾讯科技(深圳)有限公司 A kind of method, equipment and the storage medium of additional information processing
WO2019205872A1 (en) * 2018-04-25 2019-10-31 腾讯科技(深圳)有限公司 Video stream processing method and apparatus, computer device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104914898A (en) * 2015-04-17 2015-09-16 珠海优特电力科技股份有限公司 Digital menu generating method and system
WO2019205872A1 (en) * 2018-04-25 2019-10-31 腾讯科技(深圳)有限公司 Video stream processing method and apparatus, computer device and storage medium
CN108833973A (en) * 2018-06-28 2018-11-16 腾讯科技(深圳)有限公司 Extracting method, device and the computer equipment of video features
CN110213599A (en) * 2019-04-16 2019-09-06 腾讯科技(深圳)有限公司 A kind of method, equipment and the storage medium of additional information processing

Similar Documents

Publication Publication Date Title
CN103686344B (en) Strengthen video system and method
CN104822074B (en) A kind of recommendation method and device of TV programme
CN101150699B (en) Information processing apparatus, information processing method
CN108259971A (en) Subtitle adding method, device, server and storage medium
JP4550725B2 (en) Video viewing support system
US20180211556A1 (en) Systems and methods for adjusting display lengths of subtitles based on a user's reading speed
US20060004871A1 (en) Multimedia data reproducing apparatus and multimedia data reproducing method and computer-readable medium therefor
JP6669952B1 (en) Tagging apparatus, method, and program for video
CN101137030A (en) Apparatus, method and program for searching for content using keywords from subtitles
US11758234B2 (en) Systems and methods for creating an asynchronous social watching experience among users
US10341727B2 (en) Information processing apparatus, information processing method, and information processing program
JP2009239729A (en) Device, method and program for informing content scene appearance
JP2008103784A (en) Video recording device
JP2010124224A (en) Program information display device and method
CN111757174A (en) Method and device for matching video and audio image quality and electronic equipment
US9008492B2 (en) Image processing apparatus method and computer program product
US20150012946A1 (en) Methods and systems for presenting tag lines associated with media assets
US8913869B2 (en) Video playback apparatus and video playback method
CN114051160B (en) Video display method and device
CN102054019A (en) Information processing apparatus, scene search method, and program
CN111104194A (en) Menu file display method and device, storage medium and electronic device
CN111083522A (en) Video distribution, playing and user characteristic label obtaining method
JP2012203683A (en) Information processing apparatus and method
EP1463059A2 (en) Recording and reproduction apparatus
CN114584830A (en) Method and device for processing video and household appliance

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination