CN113938707A - Video processing method, recording and playing box and computer readable storage medium - Google Patents

Publication number
CN113938707A
Authority
CN
China
Prior art keywords
video
audio
result
processed
auditing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111189977.9A
Other languages
Chinese (zh)
Inventor
张�浩
金立平
陈善康
张鑫
余佳喜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Skyworth RGB Electronics Co Ltd
Original Assignee
Shenzhen Skyworth RGB Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Skyworth RGB Electronics Co Ltd
Priority to CN202111189977.9A
Publication of CN113938707A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2181 Source of audio or video content, e.g. local disk arrays comprising remotely distributed storage units, e.g. when movies are replicated over a plurality of video servers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236 Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/2368 Multiplexing of audio and video streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/254 Management at additional data server, e.g. shopping server, rights management server

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a video processing method, a recording and playing box and a computer readable storage medium, wherein the video processing method comprises the following steps: acquiring a video source to be audited in real time, and intercepting a video segment in the video source to be audited to obtain a local video; performing video transcoding and audio-video separation on the local video to obtain a video to be processed and an audio to be processed; and auditing the to-be-processed video and the to-be-processed audio through artificial intelligence according to a preset auditing standard, further processing the to-be-processed audio after the audit is passed to obtain an audio processing result, synthesizing the audio processing result and the local video to obtain a target video when the audio processing result matches the local video, and transcoding and distributing the target video to a cloud server according to the received playing requirement. The method ensures the safety of the video content of the education system, relieves the burden on the cloud server, and effectively reduces operating costs.

Description

Video processing method, recording and playing box and computer readable storage medium
Technical Field
The present invention relates to the field of wireless technologies, and in particular, to a video processing method, a recording and playing box, and a computer-readable storage medium.
Background
Currently, there are many recording and playing boxes on the market for education systems, mainly intended for recording teachers live and uploading the recorded courses to a server or cloud server. However, directly uploading unaudited content to the server has the following disadvantages. First, the recorded content cannot be audited, so content that does not comply with national security or education requirements is easily broadcast. Second, uploading large unprocessed video sources to a cloud server wastes bandwidth and incurs high operating costs. Third, recorded courses require auditing by a main server, video and subtitle synthesis, course knowledge organization and the like, which increases server burden and cost. In addition, such a recording and playing box has a single function: it does not support online live broadcasting, and even where live broadcasting is performed, the live stream cannot be audited in real time.
Disclosure of Invention
The invention mainly aims to provide a video processing method, a recording and playing box and a computer readable storage medium, aiming at solving the problem of effectively reducing the operation cost on the premise of ensuring the safety of video contents of an education system.
In order to achieve the above object, the present invention provides a video processing method, including the steps of:
acquiring a video source to be audited in real time, and intercepting a video segment in the video source to be audited to obtain a local video;
performing video transcoding and audio-video separation on the local video to obtain a video to be processed and an audio to be processed;
auditing the to-be-processed video and the to-be-processed audio through artificial intelligence according to a preset auditing standard to obtain an auditing result;
if the auditing result is that the audit is passed, further processing the audio to be processed to obtain an audio processing result, and verifying whether the audio processing result is matched with the local video according to a preset verification algorithm;
and if the audio processing result is matched with the local video, synthesizing the audio processing result and the local video to obtain a target video, and transcoding and distributing the target video to a cloud server according to the received playing requirement.
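The control flow of the steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: every function name and type here (`process_segment`, `Separated`, and the callables passed in) is hypothetical, and each stage is injected as a callable because the source does not specify how transcoding, auditing or synthesis is carried out. Returning `None` for a rejected segment is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Separated:
    """Output of transcoding plus audio-video separation (hypothetical type)."""
    video: bytes  # video to be processed
    audio: bytes  # audio to be processed


def process_segment(
    local_video: bytes,
    separate: Callable[[bytes], Separated],
    audit: Callable[[Separated], bool],
    process_audio: Callable[[bytes], str],
    matches: Callable[[str, bytes], bool],
    synthesize: Callable[[str, bytes], bytes],
) -> Optional[bytes]:
    """Return the target video, or None if the segment is rejected."""
    parts = separate(local_video)               # transcode + audio-video separation
    if not audit(parts):                        # AI audit against the preset standard
        return None
    audio_result = process_audio(parts.audio)   # further audio processing
    if not matches(audio_result, local_video):  # preset verification algorithm
        return None
    return synthesize(audio_result, local_video)  # target video
```

Passing the stages in as callables keeps the sketch independent of any particular codec, model or cloud service; distribution to the cloud server would follow on the returned target video.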
Optionally, the video source to be audited includes a picture frame and text information, and the step of obtaining the video to be processed and the audio to be processed includes:
performing frame capture on the video to be processed to acquire a plurality of picture frames of the video to be processed;
and performing text conversion on the audio to be processed to acquire text information of the audio to be processed.
Optionally, the preset audit standard includes a preset image audit standard and a preset text audit standard, the audit result includes an image audit result and a text audit result, and the step of auditing the to-be-processed video and the to-be-processed audio according to the preset audit standard through artificial intelligence to obtain the audit result includes:
performing image auditing on the picture frame according to a preset image auditing standard through artificial intelligence to obtain an image auditing result;
and performing text auditing on the text information according to a preset text auditing standard through artificial intelligence to obtain a text auditing result.
Optionally, the step of performing image review on the picture frame includes:
acquiring a plurality of image characteristics contained in the picture frame;
classifying the plurality of image features;
and performing image auditing on the classified image characteristics through a corresponding preset recognition algorithm.
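One way to read the three steps above is as a group-then-dispatch pattern: features are grouped by class, and each class is handed to its own recognition routine. The sketch below assumes this reading; `ImageFeature`, its `kind` and `score` fields, and the example labels are hypothetical, and ignoring classes that have no registered recognizer is a choice of this sketch rather than something the source states.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ImageFeature:
    kind: str     # hypothetical class label, e.g. "face" or "text_overlay"
    score: float  # hypothetical detector confidence that the feature is objectionable


Recognizer = Callable[[List[ImageFeature]], bool]  # True means the group passes


def classify(features: List[ImageFeature]) -> Dict[str, List[ImageFeature]]:
    """Group the features extracted from one picture frame by class."""
    groups: Dict[str, List[ImageFeature]] = defaultdict(list)
    for feature in features:
        groups[feature.kind].append(feature)
    return dict(groups)


def audit_frame(features: List[ImageFeature],
                recognizers: Dict[str, Recognizer]) -> bool:
    """Audit each class of features with its corresponding recognizer."""
    for kind, group in classify(features).items():
        recognizer = recognizers.get(kind)
        # Assumption: classes without a registered recognizer are not audited.
        if recognizer is not None and not recognizer(group):
            return False
    return True
```

A recognizer can be as simple as a threshold, for example `{"text_overlay": lambda g: all(f.score < 0.5 for f in g)}`.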
Optionally, the step of obtaining the audit result includes:
if the image audit result and the text audit result both pass, the audit result is that the audit is passed;
and if the image audit result and/or the text audit result are/is not passed, reporting the non-passed audit result to a cloud server.
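The combination rule above is a logical AND of the two results, with a report sent for whichever part failed. A minimal sketch follows; `AuditOutcome`, `handle_outcome` and the `report` callback are hypothetical names, and the callback stands in for whatever reporting channel to the cloud server is actually used.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AuditOutcome:
    image_passed: bool
    text_passed: bool

    @property
    def passed(self) -> bool:
        # The overall audit passes only if both partial audits pass.
        return self.image_passed and self.text_passed


def handle_outcome(outcome: AuditOutcome, report: Callable[[str], None]) -> bool:
    """Return True if the audit passed; otherwise report each failed part."""
    if outcome.passed:
        return True
    if not outcome.image_passed:
        report("image")  # stand-in for reporting to the cloud server
    if not outcome.text_passed:
        report("text")
    return False
```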
Optionally, the step of further processing the audio to be processed to obtain an audio processing result includes:
uploading the audio to be processed to a cloud server so that the cloud server completes voice calibration of the audio to be processed to obtain an audio processing result;
and receiving the audio processing result returned by the cloud server.
Optionally, the step of verifying whether the audio processing result is matched with the local video according to a preset verification algorithm includes:
verifying whether the audio processing result is aligned with an audio time axis in the local video according to a preset verification algorithm through the artificial intelligence;
and if the audio processing result is aligned with the audio time axis in the local video, determining that the audio processing result is matched with the local video.
Optionally, the step of verifying, by the artificial intelligence, whether the audio processing result is aligned with an audio time axis in the local video according to a preset verification algorithm includes:
and if the audio processing result is not aligned with the audio time axis in the local video, adjusting the audio processing result to enable the audio processing result to be matched with the local video.
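The source does not disclose the preset verification algorithm, so the following is only one plausible shape for a time-axis check and adjustment. It assumes that the audio processing result is a list of timed cues and that reference start times for the corresponding speech in the local video are available; the `Cue` type, the 0.2-second tolerance and the median-shift correction are all assumptions of this sketch.

```python
from statistics import median
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start_s, end_s, text), a hypothetical cue format


def offsets(cues: List[Cue], reference_starts: List[float]) -> List[float]:
    """Difference between each cue start and its reference start, in seconds."""
    return [cue[0] - ref for cue, ref in zip(cues, reference_starts)]


def is_aligned(cues: List[Cue], reference_starts: List[float],
               tolerance_s: float = 0.2) -> bool:
    """True if every cue starts within tolerance of its reference start."""
    if len(cues) != len(reference_starts):
        return False
    return all(abs(d) <= tolerance_s for d in offsets(cues, reference_starts))


def realign(cues: List[Cue], reference_starts: List[float]) -> List[Cue]:
    """Shift all cues by the median offset so they line up with the reference."""
    if not cues or not reference_starts:
        return list(cues)
    shift = median(offsets(cues, reference_starts))
    return [(start - shift, end - shift, text) for start, end, text in cues]
```

A constant shift only corrects a uniform delay; drift that grows over the clip would need a per-cue or piecewise correction instead.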
In addition, to achieve the above object, the present invention further provides a recording and playing box, including: a memory, a processor and a video processing program stored on the memory and executable on the processor, the video processing program when executed by the processor implementing the steps of the video processing method as claimed in any one of the above.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a video processing program which, when executed by a processor, implements the steps of the video processing method as described in any one of the above.
According to the video processing method provided by the invention, a video source to be audited is acquired in real time and a video segment is captured from it to obtain a local video; video transcoding and audio-video separation are performed on the local video to obtain a video to be processed and an audio to be processed; and the two are audited by artificial intelligence according to a preset auditing standard to obtain an auditing result, which ensures the safety of the video content of the education system. If the auditing result is that the audit is passed, the audio to be processed is further processed to obtain an audio processing result, and a preset verification algorithm verifies whether the audio processing result matches the local video. If it matches, the audio processing result and the local video are synthesized to obtain a target video, which is transcoded and distributed to a cloud server according to the received playing requirement. The target video is thereby backed up, and computation-intensive work such as subtitle synthesis is completed locally, which reduces the burden on the cloud server and effectively reduces operating costs.
Drawings
FIG. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a diagram of a hardware system architecture to which an embodiment of the invention relates;
FIG. 3 is a flowchart illustrating a video processing method according to a first embodiment of the present invention;
FIG. 4 is a flowchart illustrating a video processing method according to a second embodiment of the invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiment of the invention is as follows: a video processing method, the video processing method comprising the steps of:
acquiring a video source to be audited in real time, and intercepting a video segment in the video source to be audited to obtain a local video;
performing video transcoding and audio-video separation on the local video to obtain a video to be processed and an audio to be processed;
auditing the to-be-processed video and the to-be-processed audio through artificial intelligence according to a preset auditing standard to obtain an auditing result;
if the auditing result is that the audit is passed, further processing the audio to be processed to obtain an audio processing result, and verifying whether the audio processing result is matched with the local video according to a preset verification algorithm;
and if the audio processing result is matched with the local video, synthesizing the audio processing result and the local video to obtain a target video, and transcoding and distributing the target video to a cloud server according to the received playing requirement.
Currently, there are many recording and playing boxes on the market for education systems, mainly intended for recording teachers live and uploading the recorded courses to a server or cloud server. However, directly uploading unaudited content to the server has the following disadvantages. First, the recorded content cannot be audited, so content that does not comply with national security or education requirements is easily broadcast. Second, uploading large unprocessed video sources to a cloud server wastes bandwidth and incurs high operating costs. Third, recorded courses require auditing by a main server, video and subtitle synthesis, course knowledge organization and the like, which increases server burden and cost. In addition, such a recording and playing box has a single function: it does not support online live broadcasting, and even where live broadcasting is performed, the live stream cannot be audited in real time.
Based on the above problems, the present invention provides a solution for a miniaturized 5G (fifth-generation mobile communication technology) edge computing service, which uses the high bandwidth and low latency of a 5G module to achieve fast video uploading or online high-definition live course broadcasting. The system includes a dual-core NNIE neural network acceleration engine running at 840 MHz with computing power of up to 16 TOPS (INT8), and can quickly analyze the behavior of people in live and recorded videos. Through end-to-end modeling with Deep Peak2 (a speech analysis model), it implements functions such as converting speech into AI (Artificial Intelligence) subtitles and embedding them automatically, thereby providing a low-cost, efficient and safe education system.
The invention provides a video processing method in which a video source to be audited is acquired in real time and a video segment is captured from it to obtain a local video; video transcoding and audio-video separation are performed on the local video to obtain a video to be processed and an audio to be processed; and the two are audited by artificial intelligence according to a preset auditing standard to obtain an auditing result, which ensures the safety of the video content of the education system. If the auditing result is that the audit is passed, the audio to be processed is further processed to obtain an audio processing result, and a preset verification algorithm verifies whether the audio processing result matches the local video. If it matches, the audio processing result and the local video are synthesized to obtain a target video, which is transcoded and distributed to a cloud server according to the received playing requirement. The target video is thereby backed up, and computation-intensive work such as subtitle synthesis is completed locally, which reduces the burden on the cloud server and effectively reduces operating costs.
As shown in fig. 1, fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention.
The terminal of the embodiment of the invention may be a recording and playing box, or may be an intelligent terminal device with display, video playing, data processing and network connection functions, such as a PC, a smartphone, a tablet computer or a portable computer.
As shown in fig. 1, the terminal may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the terminal may further include a camera, a Radio Frequency (RF) circuit, sensors, an audio circuit, a WiFi module, and the like. The sensors may include light sensors, motion sensors, and other sensors. Specifically, the light sensor may include an ambient light sensor, which can adjust the brightness of the display screen according to the brightness of ambient light, and a proximity sensor, which can turn off the display screen and/or the backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and the magnitude and direction of gravity when the mobile terminal is stationary. It can be used for applications that recognize the attitude of the mobile terminal (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as pedometer and tap detection). Of course, the mobile terminal may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described herein again.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a video processing program.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call the video processing program stored in the memory 1005 and perform the following operations:
acquiring a video source to be audited in real time, and intercepting a video segment in the video source to be audited to obtain a local video;
performing video transcoding and audio-video separation on the local video to obtain a video to be processed and an audio to be processed;
auditing the to-be-processed video and the to-be-processed audio through artificial intelligence according to a preset auditing standard to obtain an auditing result;
if the auditing result is that the audit is passed, further processing the audio to be processed to obtain an audio processing result, and verifying whether the audio processing result is matched with the local video according to a preset verification algorithm;
and if the audio processing result is matched with the local video, synthesizing the audio processing result and the local video to obtain a target video, and transcoding and distributing the target video to a cloud server according to the received playing requirement.
Further, the processor 1001 may call a video processing program stored in the memory 1005, and also perform the following operations:
the video source to be audited comprises a picture frame and text information, and the step of obtaining the video to be processed and the audio to be processed comprises the following steps:
performing frame capture on the video to be processed to acquire a plurality of picture frames of the video to be processed;
and performing text conversion on the audio to be processed to acquire text information of the audio to be processed.
Further, the processor 1001 may call a video processing program stored in the memory 1005, and also perform the following operations:
the preset auditing standard comprises a preset image auditing standard and a preset text auditing standard, the auditing result comprises an image auditing result and a text auditing result, and the step of auditing the to-be-processed video and the to-be-processed audio according to the preset auditing standard through artificial intelligence to obtain the auditing result comprises the following steps:
performing image auditing on the picture frame according to a preset image auditing standard through artificial intelligence to obtain an image auditing result;
and performing text auditing on the text information according to a preset text auditing standard through artificial intelligence to obtain a text auditing result.
Further, the processor 1001 may call a video processing program stored in the memory 1005, and also perform the following operations:
the step of performing image review on the picture frame comprises the following steps:
acquiring a plurality of image characteristics contained in the picture frame;
classifying the plurality of image features;
and performing image auditing on the classified image characteristics through a corresponding preset recognition algorithm.
Further, the processor 1001 may call a video processing program stored in the memory 1005, and also perform the following operations:
if the image audit result and the text audit result both pass, the audit result is that the audit is passed;
and if the image audit result and/or the text audit result are/is not passed, reporting the non-passed audit result to a cloud server.
Further, the processor 1001 may call a video processing program stored in the memory 1005, and also perform the following operations:
the step of further processing the audio to be processed to obtain an audio processing result comprises:
uploading the audio to be processed to a cloud server so that the cloud server completes voice calibration of the audio to be processed to obtain an audio processing result;
and receiving the audio processing result returned by the cloud server.
Further, the processor 1001 may call a video processing program stored in the memory 1005, and also perform the following operations:
the step of verifying whether the audio processing result is matched with the local video according to a preset verification algorithm comprises the following steps:
verifying whether the audio processing result is aligned with an audio time axis in the local video according to a preset verification algorithm through the artificial intelligence;
and if the audio processing result is aligned with the audio time axis in the local video, determining that the audio processing result is matched with the local video.
Further, the processor 1001 may call a video processing program stored in the memory 1005, and also perform the following operations:
and if the audio processing result is not aligned with the audio time axis in the local video, adjusting the audio processing result to enable the audio processing result to be matched with the local video.
As shown in fig. 2, fig. 2 is a schematic diagram of a hardware system architecture according to an embodiment of the present invention.
The hardware system architecture of the video processing method comprises a quad-core SVP platform, a high-capacity storage device, an AI computing power module, a hardware encoding and decoding module, a power supply adaptation module and a 5G cellular module.
The quad-core SVP platform provides hardware-based 6-DoF digital image stabilization during video recording and supports 8K (7,680 × 4,320 pixels, 16:9, about 33 million pixels per frame) at 30 fps (frames per second) or 4K at 120 fps. It integrates a dual-core A73 and a dual-core A53, and balances power consumption and start-up time through an original big-little core architecture and dual operating systems. It provides efficient and rich computing resources, supports applications such as AI speech-to-subtitle conversion, teacher behavior analysis and video content auditing, and serves as an excellent hardware platform for edge computing.
The power supply adaptation module converts the 12 V input into 5 V, 3.3 V and other outputs, providing a stable power supply for each module.
The hardware encoding and decoding module is mainly used for transcoding ultra-high-definition 8K at 30 fps and similar sources, providing different video output formats for online live broadcasting, and providing the AI subtitle and video synthesis function.
The storage device stores recorded videos and other teaching videos.
The AI computing power module mainly provides additional computing power and improves the AI computing power of the whole system.
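The resolution and frame-rate figures quoted for the quad-core SVP platform can be checked with a few lines of arithmetic. The sketch assumes that "4K" means 3,840 × 2,160, which the source does not state; under that assumption, 8K at 30 fps and 4K at 120 fps require the same pixel throughput, which may be why the two modes are listed together.

```python
def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Pixel throughput for a given resolution and frame rate."""
    return width * height * fps


pixels_8k_frame = 7680 * 4320                       # pixels in one 8K frame
rate_8k30 = pixels_per_second(7680, 4320, 30)       # 8K at 30 fps
rate_4k120 = pixels_per_second(3840, 2160, 120)     # assumed 4K size, 120 fps

print(pixels_8k_frame)          # 33177600, i.e. about 33 million pixels
print(rate_8k30 == rate_4k120)  # True
```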
Referring to fig. 3, a first embodiment of the present invention provides a video processing method, including the steps of:
step S10, acquiring a video source to be audited in real time, and intercepting a video segment in the video source to be audited to obtain a local video;
it should be noted that, in this embodiment, the execution subject may be a recording and playing box; the video source may be a recorded-broadcast video or a live video, and may be an 8K high-definition source video or a high-definition video of another resolution, such as a 4K or 2K high-definition source video, which is not limited in this embodiment; the video segment may be intercepted at random, or according to a configurable clip duration and a configurable time interval between interception actions, and the video segment obtained after interception is the local video.
In a specific implementation, the recording and playing box can acquire the recorded-broadcast video or the live video in real time, and intercept a video segment from it to obtain the local video.
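The interval-based interception described above can be sketched as follows. This is only an illustration: the function name `capture_windows`, its parameters, and the reading of the "time interval" as the spacing between the start times of successive clips are assumptions, not details given in the specification.

```python
def capture_windows(source_seconds, clip_seconds, interval_seconds):
    """Return (start, end) times, in seconds, of the clips to intercept.

    Assumption: `interval_seconds` is the spacing between the start of one
    interception and the start of the next.
    """
    if clip_seconds <= 0 or interval_seconds < clip_seconds:
        raise ValueError("need clip_seconds > 0 and interval_seconds >= clip_seconds")
    windows = []
    start = 0.0
    while start < source_seconds:
        # The last clip is shortened if it would run past the end of the source.
        end = min(start + clip_seconds, source_seconds)
        windows.append((start, end))
        start += interval_seconds
    return windows


if __name__ == "__main__":
    # A 100 s source, 10 s clips, one clip every 30 s.
    for window in capture_windows(100, 10, 30):
        print(window)
```

With these hypothetical values only 40 of the 100 seconds are passed on for auditing.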
Step S20, performing video transcoding and audio-video separation on the local video to obtain a video to be processed and an audio to be processed;
it can be understood that, when the video source is an 8K high-definition source video, the local video is also an 8K high-definition video, so it first needs to be transcoded, that is, a compressed and encoded video code stream is converted into another video code stream to adapt to different network bandwidths, terminal processing capabilities and user requirements. Transcoding is essentially decoding followed by re-encoding, so the code streams before and after conversion may or may not conform to the same video coding standard. Audio-video separation is then performed: the speech is extracted by a video voice extraction technology and converted into MP4 (MPEG-4 Part 14) format, yielding the video to be processed, i.e. the video format file after transcoding and separation, and the audio to be processed, i.e. the audio MP4 format file after transcoding and separation. The video format file and the audio MP4 format file are then taken as the video source to be audited.
The advantage of audio-video separation is that it reduces the computational load on the local AI (Artificial Intelligence); the load can be reduced further by spot-check auditing within a given time period. Meanwhile, the local AI can refine its database through fast learning and upload it to the cloud, providing more accurate deep auditing for subsequent audits.
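As a rough software illustration of the transcoding and audio-video separation in step S20, the sketch below only builds two command lines for the FFmpeg command-line tool. FFmpeg, the H.264/AAC codec choices, the file names and the target height are all assumptions made for this example; the specification itself relies on the box's hardware coding and decoding module and does not name a tool.

```python
def build_separation_commands(src, video_out, audio_out, height=1080):
    """Build (but do not run) two ffmpeg invocations for one local video.

    The first transcodes the picture to a lower resolution and drops the
    audio track; the second keeps only the audio track in an MP4 container.
    """
    transcode_video = [
        "ffmpeg", "-y", "-i", src,
        "-an",                          # drop the audio stream
        "-vf", f"scale=-2:{height}",    # downscale, keep aspect ratio
        "-c:v", "libx264",              # re-encode the video stream
        video_out,
    ]
    extract_audio = [
        "ffmpeg", "-y", "-i", src,
        "-vn",                          # drop the video stream
        "-c:a", "aac",                  # encode the audio as AAC
        audio_out,
    ]
    return transcode_video, extract_audio


if __name__ == "__main__":
    for cmd in build_separation_commands("local_8k.mp4", "video.mp4", "audio.mp4"):
        print(" ".join(cmd))
```

The commands could be passed to `subprocess.run` on a machine where FFmpeg is installed; they are printed here so that the sketch runs without it.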
In this embodiment, the video source to be audited in step S10 includes picture frames and text information, and the step of obtaining the video to be processed and the audio to be processed in step S20 is followed by:
step a10, performing frame extraction on the video to be processed to acquire a plurality of picture frames of the video to be processed;
step a11, performing text conversion on the audio to be processed to obtain text information of the audio to be processed.
It can be understood that, because frames are extracted from the video data and only the resulting picture frames are audited, the auditing cost can be greatly reduced and the processing efficiency of video auditing improved while improving the accuracy of the auditing result. The dimension information corresponding to each task can be flexibly configured, so the auditing dimensions are highly flexible and extensible and can meet diversified auditing requirements. Converting the audio content into text content allows it to be audited more intuitively.
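The saving from auditing only extracted frames can be illustrated with a small sampling sketch. The fixed sampling period and the name `sample_frame_indices` are assumptions for illustration; the specification does not state how frames are chosen.

```python
def sample_frame_indices(total_frames, fps, every_seconds):
    """Return the indices of the frames to audit, one per sampling period."""
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


if __name__ == "__main__":
    # A 10 s clip at 30 fps, sampled once every 2 s.
    indices = sample_frame_indices(total_frames=300, fps=30, every_seconds=2)
    print(indices, f"{len(indices)} of 300 frames audited")
```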
Step S30, auditing the video to be processed and the audio to be processed through artificial intelligence according to a preset auditing standard to obtain an auditing result;
it should be noted that, in this embodiment, an AI audio and video auditing technology is used, and a local AI algorithm engine is used to perform audio and video separation on an 8K source video and then perform content auditing to obtain an auditing result.
In this embodiment, the preset audit standard in step S30 includes a preset image audit standard and a preset text audit standard, the audit result includes an image audit result and a text audit result, and step S30 includes:
step b10, performing image auditing on the picture frames through artificial intelligence according to a preset image audit standard to obtain an image audit result, wherein the step of performing image auditing on the picture frames comprises:
acquiring a plurality of image features contained in the picture frames;
classifying the plurality of image features;
performing image auditing on the classified image features through a corresponding preset recognition algorithm;
step b20, performing text auditing on the text information through artificial intelligence according to a preset text audit standard to obtain a text audit result.
It should be noted that the image feature classes may be pornographic pictures, political figures, sensitive text and the like; the image auditing takes the form of AI behavior analysis; the preset recognition algorithm may be a classification algorithm, a target-detection face recognition algorithm, an OCR (Optical Character Recognition) algorithm and the like; and the text auditing takes the form of AI sensitive-word auditing.
In a specific implementation, a classification algorithm is used to identify whether a picture is pornographic, and target-detection face recognition is used to identify whether a face belongs to a political figure. Frames are selectively extracted from the video content for OCR, the text on the picture is recognized through OCR, and the recognized text is then passed through a sensitive-word plus semantic-analysis model to identify whether the content violates the rules, which yields an OCR audit result and completes the image audit. For the most common text content, sensitive-word matching can likewise be combined with a semantic analysis model to identify violations: for example, the content is regarded as a violation if it contains a sensitive word, and also if the semantic analysis classifies it as pornographic content, gory or violent content, a sensitive political topic or the like. This yields the text audit result and completes the text audit.
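The "sensitive word + semantic analysis" rule described above can be sketched as a two-stage check. The label names, the function `audit_text` and the `classify` callable are hypothetical; `classify` merely stands in for whatever semantic analysis model is actually deployed.

```python
# Hypothetical labels that a semantic analysis model might return.
VIOLATING_LABELS = {"pornographic", "violent", "sensitive_political"}


def audit_text(text, sensitive_words, classify):
    """Return (passed, reason) for one piece of text.

    `classify` is a stand-in for the semantic analysis model: any callable
    that maps a string to a label.
    """
    lowered = text.lower()
    # Stage 1: any sensitive word is an immediate violation.
    for word in sensitive_words:
        if word.lower() in lowered:
            return False, f"sensitive word: {word}"
    # Stage 2: the semantic label can still mark the text as a violation.
    label = classify(text)
    if label in VIOLATING_LABELS:
        return False, f"semantic label: {label}"
    return True, "ok"


if __name__ == "__main__":
    print(audit_text("Today we study fractions", ["badword"], lambda t: "normal"))
```

The same function could be applied both to text recognized by OCR on picture frames and to text converted from the audio.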
In this embodiment, step S30 is followed by the following steps:
step b30, if the image audit result and the text audit result both pass, the audit result is that the audit is passed;
step b31, if the image audit result and/or the text audit result does not pass, reporting the failed audit result to a cloud server.
It can be understood that the review result of the video source can be considered as the review pass only when the image review result that represents the video review pass and the text review result that represents the audio review pass both pass.
In a specific implementation, if the image audit result is passed but the text audit result is not passed, the audit result of the video source is considered as not passed; similarly, if the image audit result is not passed but the text audit result is passed, the video source audit result is considered as not passed; and if the image audit result is not passed and the text audit result is not passed, the audit result of the video source is considered as not passed. Under the condition that the audit result is considered to be not passed, the non-passed audit result needs to be reported to the cloud server through 5G, so that the cloud server can adjust the current live broadcast content or recorded broadcast content in real time.
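The pass/fail rule of steps b30 and b31 amounts to a logical AND with a reporting side effect. In the minimal sketch below, `report` is a hypothetical callback standing in for the upload to the cloud server.

```python
def combine_audit(image_passed, text_passed, report):
    """Return True only if both audits passed; otherwise report the failures."""
    failed = [name for name, ok in (("image", image_passed), ("text", text_passed)) if not ok]
    if failed:
        # Stand-in for reporting the failed audit result to the cloud server.
        report(failed)
    return not failed


if __name__ == "__main__":
    print(combine_audit(True, False, report=lambda failed: print("report:", failed)))
```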
Step S40, if the audit result is that the audit is passed, further processing the audio to be processed to obtain an audio processing result, and verifying whether the audio processing result is matched with the local video according to a preset verification algorithm.
It should be noted that the audio processing result takes the form of subtitles, and the preset verification algorithm is an algorithm, based on the AI algorithm system, for checking the alignment of the subtitles with the video and audio.
It can be understood that, if the audit result is that the audit is passed, the video to be processed and the audio to be processed contain no violations. The audio to be processed is then converted into subtitles using AI speech-to-subtitle synthesis, the generated subtitles pass through the local encoding and decoding system and the AI computing system, and after the alignment between the speech timing and the video and audio has been checked, the check result shows whether the subtitles match the video.
Step S50, if the audio processing result is matched with the local video, synthesizing the audio processing result and the local video to obtain a target video, and transcoding and distributing the target video to a cloud server according to the received playing requirement.
It can be understood that, if the subtitle matches the video audio, the AI subtitle is synthesized into the source video to obtain the target video (i.e., the recorded video or the live video that can be played normally). And finally, rapidly releasing, uploading and transcoding the recorded broadcast video or the live broadcast video according to the playing requirements of different users, and distributing the video to the cloud server and the user terminal with the playing requirement. The work of subtitle synthesis and the like which need to consume calculation power is completed locally, the burden of the cloud server is reduced, and the operation cost is effectively reduced.
It should be noted that, after the target video is obtained, it is also uploaded to a local server to form a local backup that can be retrieved when needed. This avoids having to process the unprocessed video source again, and the time that would waste, if the cloud server loses the file.
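The control flow of steps S20 to S50 can be summarized in a short orchestration sketch. Every stage is passed in as a callable because the specification does not fix an implementation for any of them; the function and parameter names are hypothetical, and returning `None` when the subtitles do not match is an assumption, since the first embodiment does not say what happens in that case.

```python
def process_clip(clip, separate, audit, make_subtitles, is_aligned, synthesize):
    """Run one local video through steps S20-S50; return the target video or None."""
    video, audio = separate(clip)           # S20: transcode and separate
    if not audit(video, audio):             # S30: AI content audit
        return None                         # failed audits are not published
    subtitles = make_subtitles(audio)       # S40: audio -> subtitles
    if not is_aligned(subtitles, clip):     # S40: verify the match
        return None                         # assumption: mismatch blocks synthesis
    return synthesize(subtitles, clip)      # S50: burn subtitles into the video


if __name__ == "__main__":
    target = process_clip(
        "clip",
        separate=lambda c: (c + ".video", c + ".audio"),
        audit=lambda v, a: True,
        make_subtitles=lambda a: a + ".srt",
        is_aligned=lambda s, c: True,
        synthesize=lambda s, c: f"{c}+{s}",
    )
    print(target)
```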
In this embodiment, a video processing method is provided. A video source is obtained in real time and a video segment in it is intercepted to obtain a local video; video transcoding and audio-video separation are performed on the local video to obtain a video to be processed and an audio to be processed, which reduces the workload of the local AI. Frames are extracted from the video to be processed to obtain a plurality of picture frames, and the audio to be processed is converted to text to obtain its text information; the local AI performs AI behavior analysis on the picture frames to obtain an OCR auditing result and AI sensitive-word auditing on the text information to obtain a text auditing result. Meanwhile, the local AI can refine its database through fast learning and upload it to the cloud, providing increasingly accurate deep auditing for subsequent audits and ensuring the safety of the video content of the education system. The target video obtained from the video source is uploaded to the local server as a backup, which avoids having to process the unprocessed video source again, and the time that would waste, if the cloud server loses the file. The target video is transcoded and distributed to the cloud server according to the received playing requirement, and compute-intensive work such as subtitle synthesis is completed locally, which reduces the burden on the cloud server and effectively reduces operating cost.
Further, referring to fig. 4, a second embodiment of the video processing method according to the present invention is proposed based on the above-mentioned embodiment shown in fig. 3.
The step of further processing the audio to be processed in step S40 to obtain an audio processing result includes:
step K10, uploading the audio to be processed to a cloud server, so that the cloud server completes voice calibration of the audio to be processed and an audio processing result is obtained;
step K20, receiving the audio processing result returned by the cloud server.
The step of verifying whether the audio processing result is matched with the local video according to a preset verification algorithm in step S40 includes:
step K30, verifying whether the audio processing result is aligned with the audio time axis in the local video according to a preset verification algorithm through the artificial intelligence;
step K40, if the audio processing result is aligned with the audio time axis in the local video, determining that the audio processing result is matched with the local video;
step K41, if the audio processing result is not aligned with the audio time axis in the local video, adjusting the audio processing result so that the audio processing result matches the local video.
It should be noted that, after the audit is passed, AI speech-to-subtitle synthesis is used. The speech is extracted from the video into an MP4-format file by a video voice extraction technology; end-to-end modeling based on Deep Peak2 is performed locally in the system, with more than 100,000 hours of training data and multi-sampling-rate, multi-scene acoustic modeling; meanwhile, the checked speech text is quickly uploaded to the cloud server over 5G cellular for secondary calibration, and the generated subtitles, i.e. the audio processing result, are transmitted back locally, so as to improve the accuracy of video speech recognition.
It can be understood that the generated subtitles pass through the local encoding and decoding system and the AI algorithm system, and after the alignment between the speech timing and the video and audio has been checked, the AI subtitles are synthesized into the video to be processed to obtain the processed video, i.e. the target video. If the subtitles are aligned with the audio time axis, the subtitles match the audio and the check passes; if the subtitles are not aligned with the audio time axis, the subtitles do not match the audio, and the subtitles need to be adjusted so that they align with the audio time axis.
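The time-axis check and adjustment of steps K30 to K41 can be sketched as follows. The `Cue` type, the 0.2 s tolerance and the assumption that each subtitle cue corresponds to exactly one detected speech segment are illustrative choices, not details from the specification, which only states that an AI-based alignment check is used.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def is_aligned(cues, speech_segments, tolerance=0.2):
    """Check that each cue starts and ends within `tolerance` of its speech segment."""
    if len(cues) != len(speech_segments):
        return False
    return all(
        abs(cue.start - seg_start) <= tolerance and abs(cue.end - seg_end) <= tolerance
        for cue, (seg_start, seg_end) in zip(cues, speech_segments)
    )


def align(cues, speech_segments):
    """Snap each cue to the timing of its speech segment, keeping the text."""
    return [Cue(s, e, cue.text) for cue, (s, e) in zip(cues, speech_segments)]


if __name__ == "__main__":
    speech = [(1.0, 3.0), (4.0, 6.5)]
    cues = [Cue(1.5, 3.5, "Good morning"), Cue(4.1, 6.5, "Open your books")]
    if not is_aligned(cues, speech):
        cues = align(cues, speech)
    print(cues)
```

Snapping cues to the segment boundaries is the simplest possible adjustment; a real system might instead estimate a single global offset or re-time the cues word by word.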
In this embodiment, a video processing method is provided. A video source is obtained in real time and a video segment in it is intercepted to obtain a local video; video transcoding and audio-video separation are performed on the local video to obtain a video to be processed and an audio to be processed, which reduces the workload of the local AI. Frames are extracted from the video to be processed to obtain a plurality of picture frames, and the audio to be processed is converted to text to obtain its text information; the local AI performs AI behavior analysis on the picture frames to obtain an OCR auditing result and AI sensitive-word auditing on the text information to obtain a text auditing result. Meanwhile, the local AI can refine its database through fast learning and upload it to the cloud, providing increasingly accurate deep auditing for subsequent audits and ensuring the safety of the video content of the education system. Using local AI computing power and hardware encoding and decoding capability, recorded-broadcast or live content undergoes local AI safety auditing, subtitles are synthesized by the local AI together with the cloud, and the content is rapidly released, uploaded and transcoded to the server and the user terminal, which effectively reduces the user's cost and improves work efficiency.
It should be noted that, the above embodiment mainly addresses recording and broadcasting requirements or live broadcasting requirements of an educational market, but the AI local auditing technology may also be used in the fields of factory video monitoring, security monitoring, and the like.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where a video processing program is stored on the computer-readable storage medium, and when the video processing program is executed by a processor, the video processing program implements the steps of the embodiments of the video processing method described above.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A video processing method, characterized in that the video processing method comprises the steps of:
acquiring a video source to be audited in real time, and intercepting a video segment in the video source to be audited to obtain a local video;
performing video transcoding and audio-video separation on the local video to obtain a video to be processed and an audio to be processed;
auditing the to-be-processed video and the to-be-processed audio through artificial intelligence according to a preset auditing standard to obtain an auditing result;
if the auditing result is that the audit is passed, further processing the audio to be processed to obtain an audio processing result, and verifying whether the audio processing result is matched with the local video according to a preset verification algorithm;
and if the audio processing result is matched with the local video, synthesizing the audio processing result and the local video to obtain a target video, and transcoding and distributing the target video to a cloud server according to the received playing requirement.
2. The video processing method according to claim 1, wherein the video source to be audited includes picture frames and text information, and the step of obtaining the video to be processed and the audio to be processed is followed by:
performing frame extraction on the video to be processed to acquire a plurality of picture frames of the video to be processed;
and performing text conversion on the audio to be processed to acquire text information of the audio to be processed.
3. The video processing method according to claim 2, wherein the preset audit standard includes a preset image audit standard and a preset text audit standard, the audit result includes an image audit result and a text audit result, and the step of auditing the video to be processed and the audio to be processed according to the preset audit standard through artificial intelligence to obtain the audit result includes:
performing image auditing on the picture frame according to a preset image auditing standard through artificial intelligence to obtain an image auditing result;
and performing text auditing on the text information according to a preset text audit standard through artificial intelligence to obtain a text auditing result.
4. The video processing method of claim 3, wherein the step of performing an image review on the picture frame comprises:
acquiring a plurality of image characteristics contained in the picture frame;
classifying the plurality of image features;
and performing image auditing on the classified image characteristics through a corresponding preset recognition algorithm.
5. The video processing method of claim 3, wherein the step of obtaining the audit result is followed by:
if the image audit result and the text audit result both pass, the audit result is that the audit is passed;
and if the image audit result and/or the text audit result are/is not passed, reporting the non-passed audit result to a cloud server.
6. The video processing method according to any of claims 1-5, wherein the step of further processing the audio to be processed to obtain an audio processing result comprises:
uploading the audio to be processed to a cloud server so that the cloud server completes voice calibration of the audio to be processed to obtain an audio processing result;
and receiving the audio processing result returned by the cloud server.
7. The video processing method according to claim 6, wherein the step of verifying whether the audio processing result matches the local video according to a predetermined verification algorithm comprises:
verifying whether the audio processing result is aligned with an audio time axis in the local video according to a preset verification algorithm through the artificial intelligence;
and if the audio processing result is aligned with the audio time axis in the local video, determining that the audio processing result is matched with the local video.
8. The video processing method according to claim 7, wherein said step of verifying by said artificial intelligence whether said audio processing result is aligned with an audio time axis in said local video according to a preset verification algorithm comprises:
and if the audio processing result is not aligned with the audio time axis in the local video, adjusting the audio processing result to enable the audio processing result to be matched with the local video.
9. A recording and playing box, comprising: a memory, a processor and a video processing program stored on the memory and executable on the processor, wherein the video processing program, when executed by the processor, implements the steps of the video processing method according to any of claims 1 to 8.
10. A computer-readable storage medium, having stored thereon a video processing program which, when executed by a processor, implements the steps of the video processing method according to any one of claims 1 to 8.
CN202111189977.9A 2021-10-12 2021-10-12 Video processing method, recording and playing box and computer readable storage medium Pending CN113938707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111189977.9A CN113938707A (en) 2021-10-12 2021-10-12 Video processing method, recording and playing box and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113938707A true CN113938707A (en) 2022-01-14

Family

ID=79278570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111189977.9A Pending CN113938707A (en) 2021-10-12 2021-10-12 Video processing method, recording and playing box and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113938707A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979787A (en) * 2022-05-17 2022-08-30 北京量子之歌科技有限公司 Live broadcast playback management method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017080168A1 (en) * 2015-11-13 2017-05-18 乐视控股(北京)有限公司 Video reviewing method and system
CN108124191A (en) * 2017-12-22 2018-06-05 北京百度网讯科技有限公司 A kind of video reviewing method, device and server
CN110085213A (en) * 2019-04-30 2019-08-02 广州虎牙信息科技有限公司 Abnormality monitoring method, device, equipment and the storage medium of audio
CN111835739A (en) * 2020-06-30 2020-10-27 北京小米松果电子有限公司 Video playing method and device and computer readable storage medium
CN111866605A (en) * 2020-07-09 2020-10-30 北京齐尔布莱特科技有限公司 Video auditing method and server
CN112749608A (en) * 2020-06-08 2021-05-04 腾讯科技(深圳)有限公司 Video auditing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination