CN112069950B

CN112069950B - Method, system, electronic device and medium for extracting hotwords

Info

Publication number: CN112069950B
Application number: CN202010865409.5A
Authority: CN
Inventors: 郑翔; 宗博文; 徐文铭
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Beijing ByteDance Network Technology Co Ltd
Priority date: 2020-08-25
Filing date: 2020-08-25
Publication date: 2023-04-07
Anticipated expiration: 2040-08-25
Also published as: CN112069950A

Abstract

The disclosure provides a method, an apparatus, an electronic device and a storage medium for extracting hotwords. The method comprises: determining a target identifier in a target video frame, and acquiring a target page corresponding to the target identifier; analyzing the target page to obtain target content corresponding to the target page; and determining, based on the target content, at least one hot word vocabulary of the target video to which the target video frame belongs. According to the technical solution of the embodiments of the disclosure, the target page corresponding to the target identifier is crawled and the hot words of the video to which the target video frame belongs are determined quickly and accurately from the page content, so that the hot words corresponding to the voice information are available when speech is converted into text, which improves the accuracy of speech-to-text conversion.

Description

Method, system, electronic device and medium for extracting hotwords

Technical Field

The disclosed embodiments relate to the field of computer technologies, and in particular, to a method, a system, an electronic device, and a medium for extracting hotwords.

Background

With the development of internet communication technology, more and more users tend to communicate online.

During online communication, it may not be possible to determine the core idea of the video to which a video frame belongs from the content displayed in that frame. As a result, the content of the video cannot be understood well, which leads to low interaction efficiency and a poor user experience.

Disclosure of Invention

The present disclosure provides a method, a system, an electronic device, and a medium for extracting hotwords, so as to achieve a technical effect of improving the efficiency of determining the hotword vocabulary of a video to which a target video frame belongs by processing each target video frame.

In a first aspect, an embodiment of the present disclosure provides a method for extracting hotwords, where the method includes:

determining a target identifier in a target video frame, and acquiring a target page corresponding to the target identifier;

analyzing the target page to obtain target content in the target page;

and determining at least one hot word vocabulary of the target video to which the target video frame belongs based on the target content.

In a second aspect, an embodiment of the present disclosure further provides an apparatus for extracting hotwords, where the apparatus includes:

the target page acquisition module is used for determining a target identifier in a target video frame and acquiring a target page corresponding to the target identifier;

the target content determining module is used for analyzing and processing the target page to obtain target content in the target page;

and the hot word and vocabulary determining module is used for determining at least one hot word and vocabulary of the target video to which the target video frame belongs based on the target content.

In a third aspect, an embodiment of the present disclosure further provides a system for extracting hotwords, where the system includes:

the image-text recognition subsystem determines a target identifier in a target video frame and sends the target identifier to the crawler subsystem;

the crawler subsystem receives the target identification, acquires a target page corresponding to the target identification, and sends the target page to a page analysis subsystem;

the page analysis subsystem receives the target page, analyzes and processes the target page to obtain target content in the target page, and sends the target content to the hot word extraction subsystem;

and the hot word extraction subsystem receives the target content and determines at least one hot word of the target video to which the target video frame belongs.

In a fourth aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for extracting hotwords according to any embodiment of the disclosure.

In a fifth aspect, the embodiments of the present disclosure also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are used for executing the method for extracting hotwords according to any one of the embodiments of the present disclosure.

According to the technical solution of the embodiments of the disclosure, the hot word vocabulary of the target video to which the target video frames belong can be determined by processing each target video frame of the target video. In the process of converting speech into text, the text corresponding to the voice information is then determined based on the determined hot word vocabulary, which improves the accuracy of speech-to-text conversion and, in turn, the interaction efficiency of users who interact based on the converted text.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and components are not necessarily drawn to scale.

FIG. 1 is a flowchart illustrating a method for extracting hotwords according to a first embodiment of the disclosure;

FIG. 2 is a flowchart illustrating a method for extracting hotwords according to a second embodiment of the disclosure;

FIG. 3 is a schematic structural diagram of a system for extracting hotwords according to a third embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more complete and thorough understanding of the present disclosure. It should be understood that the drawings and the embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "including" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" means "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence of the functions performed by those devices, modules or units. It should also be noted that the modifiers "a" and "an" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand that they should be read as "one or more" unless the context clearly indicates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

Example one

Fig. 1 is a schematic flowchart of a method for extracting hotwords according to a first embodiment of the present disclosure. This embodiment is applicable to determining, from audio and video information, the hot word vocabulary of the video to which that information belongs, so that the hot word vocabulary corresponding to the voice information can be used when the audio and video information is converted into text, which improves the accuracy of speech-to-text conversion. The method may be executed by a server, or by a client and a server in cooperation.

As shown in fig. 1, the method of the present embodiment includes:

S110, determining a target identifier in the target video frame, and acquiring a target page corresponding to the target identifier.

A video is composed of a plurality of video frames, and the video frame currently being processed may be used as the target video frame; that is, a video frame that includes a target identifier may be taken as a target video frame. Each target video frame contains text content, and the URL addresses in that text content are used as the target identifiers of the frame. It should be noted that a URL address may be located in the address bar (that is, the address shown in the search field) or in the body text displayed in the video frame. In other words, the target identifiers are all the URL addresses in the target video frame. The target page is a page determined based on a target identifier.

It should be noted that, in this embodiment, acquiring the target page corresponding to the target identifier may mean crawling the pages corresponding to each URL. The advantage of this arrangement is that both the number of pages and the amount of page content obtained by crawling are much greater than what can be obtained directly by image-text recognition of the target video frame.

Specifically, after the target video frame is determined, each target identifier in the target video frame is obtained, the corresponding target page is crawled based on each target identifier, and the hot word vocabulary of the video to which the target video frame belongs is determined from the page content of the target pages. When speech is converted into text, if the voice information contains a pronunciation corresponding to a hot word vocabulary, that hot word vocabulary can be called, which improves the accuracy of speech-to-text conversion.
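As an illustration of S110, the following minimal sketch extracts URL addresses from the text recognized in one video frame. It assumes that an image-text recognition step (not shown) has already produced a plain string; the function name `extract_target_identifiers` and the regular expression are hypothetical choices for illustration and are not specified by the disclosure.

```python
import re

# Assumed pattern: http(s) URLs running up to the next whitespace, quote or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_target_identifiers(frame_text: str) -> list[str]:
    """Return the distinct URLs found in the recognized text, in order of appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(frame_text):
        url = match.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Both the address bar text and the body text of the frame can be passed through the same function, since the disclosure treats every URL address in the frame as a target identifier.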

S120, analyzing the target page to obtain the target content in the target page.

Each page may include text content, and may also include content information such as pictures. After each target page is obtained, the text content in each target page can be extracted, and the extracted text content is used as the target content corresponding to the target page.
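A minimal sketch of extracting the text content of a page, assuming the target page is available as an HTML string. It uses the `html.parser` module of the Python standard library; the class name `TextExtractor` and the function name `extract_target_content` are hypothetical, and skipping `<script>` and `<style>` bodies is an assumption about what counts as text content.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style> elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_target_content(html: str) -> str:
    """Return the text content of one page; images and other tags are ignored."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```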

It should be noted that there may be a plurality of target pages determined based on the target identifier, and each target page may be processed in this way to determine the hot word vocabulary of the video to which the target video frame belongs.

S130, determining at least one hot word vocabulary of the target video to which the target video frame belongs based on the target content.

A hot word vocabulary can be understood as representing the issues and matters that users are generally concerned with in a certain period or at a certain point in time; that is, it reflects the hot topics of that period, and those issues, matters and topics can be represented by corresponding hot word vocabularies. In this embodiment, if the application scenario is a video conference whose main content is a certain research and development project, the hot word vocabulary may be the vocabulary that arises in connection with that project during the video conference. That is, in this embodiment, the hot word vocabulary refers to vocabulary corresponding to the hot topics commonly discussed or followed by the interacting users from the start of a video conference or live broadcast up to the current time. To determine hot word vocabularies more accurately and to improve the efficiency and accuracy of speech-to-text conversion, the hot word vocabularies corresponding to the video content may be dynamically generated and updated during the video conference.

In this embodiment, the target content may be processed to determine the corresponding hot word vocabulary as follows. Word segmentation is performed on the target content to obtain at least one segmented word. The word vector of each segmented word is determined based on a word vector dictionary, and an average word vector is determined from the word vectors of the segmented words. A target segmented word in the target content is then determined from the distance value between each word vector and the average word vector, and the determined target segmented word is used as a hot word vocabulary.

That is, by processing the target content, the hot word vocabularies of the target video to which the target video frame belongs can be determined, so that audio can be converted into text based on these hot word vocabularies, which improves the accuracy and convenience of speech-to-text conversion.
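The selection described above can be written compactly. The notation is introduced here for illustration only: the segmented words are w_1, ..., w_n with word vectors v_1, ..., v_n, and the Euclidean norm is an assumed choice of distance, since the disclosure does not name a specific metric. Choosing the smallest distance follows the second embodiment.

```latex
\bar{v} = \frac{1}{n} \sum_{i=1}^{n} v_i, \qquad
d_i = \lVert v_i - \bar{v} \rVert_2, \qquad
w^{*} = w_{k}, \quad k = \arg\min_{1 \le i \le n} d_i
```

For example, with v_1 = (1, 0), v_2 = (0, 1) and v_3 = (1, 1), the average word vector is (2/3, 2/3); the distances are d_1 = d_2 = sqrt(5)/3 ≈ 0.745 and d_3 = sqrt(2)/3 ≈ 0.471, so w_3 would be selected.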

According to the technical solution of this embodiment, the hot word vocabulary of the target video to which the target video frame belongs can be determined by processing each target video frame. In the process of converting speech into text, the text corresponding to the voice information is then determined based on the determined hot word vocabulary, which improves the accuracy of speech-to-text conversion and, in turn, the interaction efficiency of users who interact based on the converted text.

On the basis of the above technical solution, the method further includes: generating a target video based on a real-time interactive interface, so as to determine the target video frame from the target video.

The technical solution of this embodiment can be applied to real-time interactive scenarios, such as video conferences and live broadcasts. The real-time interactive interface is any interactive interface in a real-time interactive application scenario. The real-time interactive application scenario may be implemented through the internet and computer means, for example as an interactive application implemented by a native program or a web program. The target video is generated based on the real-time interactive interface, and may be the video corresponding to a video conference or a live video. The target video is composed of a plurality of video frames, from which the target video frame can be determined: a video frame in the target video that includes a target identifier is taken as a target video frame. Therefore, before the hot word vocabulary corresponding to the target video is determined, the target video frames in the target video may be determined, so that the hot word vocabulary is determined from them.

On the basis of the above technical solution, before determining the target identifier in the target video frame, the method further includes: when triggering of a target control is detected, sequentially acquiring video frames in the target video, and determining the target video frame from the acquired video frames.

Optionally, when triggering of a sharing control is detected, a video frame to be processed in the target video is acquired, and the target video frame is determined according to the similarity value between the video frame to be processed and at least one historical target video frame in the target video.

If the application scenario is a real-time interactive scenario, the target control may be the control corresponding to screen sharing or document sharing; if the application scenario is speech-to-text processing of a screen recording, the target control may be the play control used when the video is played. The video frame to be processed may be a video frame that includes a target identifier in a preset area. The historical target video frames are video frames that have already been determined to include a target identifier. After the video frame to be processed is determined, the target video frame may be determined according to the similarity value between the video frame to be processed and each historical target video frame. The target video frames are a subset of the video frames in the target video, namely the video frames selected for processing.

It should be noted that, in any application scenario, adjacent video frames may present the same content. To avoid wasting resources on repeatedly processing video frames with the same content, the target video frames can be determined before any frame is processed.

In this embodiment, the target video frame may be determined as follows. When triggering of the target control by the user is detected, if the current video frame is the first video frame of the target video, the current video frame may be used as a target video frame. Otherwise, the video frames are acquired in sequence, and for each current video frame the previous target video frame determined before it is obtained. Whether the current video frame is a target video frame is determined from the similarity value between the current video frame and the previous target video frame. Optionally, if the similarity value is lower than a preset similarity threshold, the current video frame is a target video frame; if the similarity value is higher than the preset similarity threshold, the current video frame is not a target video frame.
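The frame selection rule above can be sketched as follows. This is a minimal illustration under stated assumptions: each frame is represented as a flat sequence of pixel values, `frame_similarity` is a stand-in metric (the fraction of matching positions) because the disclosure does not specify how the similarity value is computed, and the names and the default threshold are hypothetical.

```python
from typing import List, Sequence


def frame_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Stand-in metric: fraction of positions whose pixel values match."""
    if len(a) != len(b) or not a:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / len(a)


def select_target_frames(frames: Sequence[Sequence[int]], threshold: float = 0.9) -> List[int]:
    """Return the indices of the frames kept as target video frames."""
    targets = []
    previous_target = None
    for index, frame in enumerate(frames):
        if previous_target is None:
            # The first video frame of the target video is always a target frame.
            targets.append(index)
            previous_target = frame
            continue
        # A frame is kept only if it differs enough from the previous target frame.
        # A similarity exactly equal to the threshold is treated as "not a target"
        # here; the disclosure leaves that case open.
        if frame_similarity(frame, previous_target) < threshold:
            targets.append(index)
            previous_target = frame
    return targets
```

Comparing against the previous target frame rather than the immediately preceding frame means that slow, gradual changes still eventually produce a new target frame.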

Specifically, when triggering of a control for screen sharing and/or document sharing, or of a play control for playing a screen recording, is detected, each target video frame in the target video may be determined in sequence, and the target identifier in each target video frame may be obtained, so that each target page associated with the target identifier can be crawled.

On the basis of the above technical solution, the method further includes: sending the at least one hot word vocabulary to a hot word cache module, so that when triggering of a speech-to-text operation is detected, the corresponding hot word vocabulary is called from the hot word cache module according to the voice information. The hot word cache module may be a module for storing hot words in the client or the server; that is, it stores the hot word vocabularies determined in real time during the video conference.

It can be understood that, after the hot word vocabulary corresponding to the target video is determined, it may be stored in the corresponding hot word cache module, so that when triggering of a speech-to-text control is detected, the hot word vocabulary corresponding to the voice information can be obtained from the hot word cache module, which improves the accuracy and convenience of speech-to-text conversion.
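A minimal sketch of a hot word cache module, assuming hot words are stored per video (for example per conference). The class name `HotWordCache`, its methods, and the use of a case-insensitive text match as a stand-in for matching by pronunciation are all assumptions made for illustration; the disclosure does not describe the matching mechanism in detail.

```python
class HotWordCache:
    """Stores the hot words determined for each video and serves them on lookup."""

    def __init__(self):
        self._words = {}  # video id -> set of hot words

    def add(self, video_id, hot_words):
        """Add newly determined hot words for a video; duplicates are ignored."""
        self._words.setdefault(video_id, set()).update(hot_words)

    def lookup(self, video_id, recognized_tokens):
        """Return the cached hot words matching the recognized tokens, in token order."""
        # Stand-in for pronunciation matching: compare lowercase forms.
        index = {word.lower(): word for word in self._words.get(video_id, ())}
        return [index[t.lower()] for t in recognized_tokens if t.lower() in index]
```

In use, the hot word extraction step would call `add` as new hot words are determined during the conference, and the speech-to-text step would call `lookup` with its preliminary recognition result.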

Example two

Fig. 2 is a flowchart illustrating a method for extracting hotwords according to a second embodiment of the disclosure. On the basis of the foregoing embodiment, this embodiment refines the step of "determining a target identifier in a target video frame and acquiring a target page corresponding to the target identifier". Terms that are the same as or correspond to those in the above embodiment are not described again here.

As shown in fig. 2, the method includes:

S210, acquiring at least one web page address in the target video frame, and determining the target web page address from the at least one web page address.

It should be noted that each video frame may or may not include a target identifier, and a target video frame may include one or more target identifiers. Therefore, before a web page is crawled based on a target identifier, it may be determined whether the corresponding web page has already been crawled. In this embodiment, the target identifier may be a web page address in the target video frame; after the web page addresses in the target video frame are determined, it may be determined whether each of them is a target web page address.

To save resources, before the web page corresponding to each web page address is crawled, it may first be determined whether that page has already been crawled; a web page address whose page has not yet been obtained is used as a target web page address.

That is, the target video frame may include a plurality of identifiers, each of which may be a web page address. Because the web pages corresponding to some identifiers may already have been crawled, only the identifiers whose web pages have not been obtained are used as target identifiers. Accordingly, the target web page address is the address corresponding to a target identifier.

Specifically, each identifier, namely each web page address, in the target video frame is obtained. It is then determined whether the historical page acquisition records contain the web page corresponding to each identifier, and an identifier with no corresponding web page in the records is taken as a target identifier, namely a target web page address.

On the basis of the above technical solution, acquiring at least one web page address in the target video frame includes: identifying the at least one web page address within a preset area of the target video frame.

It can be understood that the address bar area and the text area in each video frame are predetermined, and the web page address in the address bar area and the web page address in the text area can be extracted based on the image-text recognition technology. By adopting the method, the efficiency of extracting the webpage address can be improved.

In this embodiment, determining the target web page address from the at least one web page address includes: determining, based on the crawled web page address set, the web page addresses among the at least one web page address that have not been crawled; and generating the target web page address based on the non-crawled web page addresses.

It should be noted that, after the target identifier in a target video frame is determined and the corresponding target page is obtained, the web page address corresponding to the target identifier may be stored in the crawled web page address set. When the identifiers included in each target video frame are detected, the addresses that have already been crawled can be identified by comparison with the crawled web page address set, and the remaining addresses, namely those that have not been crawled, are used as the target web page addresses. This avoids the resource consumption and waste caused by repeatedly crawling the web pages corresponding to the same identifiers and repeatedly processing their content.
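The filtering against the crawled web page address set can be sketched as follows. The function names are hypothetical, and the set is kept in memory here for simplicity; where the set is actually stored is not specified by the disclosure.

```python
def select_target_addresses(addresses, crawled):
    """Return the addresses not yet crawled, without duplicates, in original order."""
    targets = []
    for address in addresses:
        if address not in crawled and address not in targets:
            targets.append(address)
    return targets


def mark_crawled(addresses, crawled):
    """Record addresses whose pages have been obtained."""
    crawled.update(addresses)
```

A typical sequence is to call `select_target_addresses` on the addresses recognized in a frame, crawl the result, and then call `mark_crawled` so that later frames showing the same addresses trigger no further crawling.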

S220, crawling the webpage related to the target webpage address, and generating the target webpage based on the webpage.

The target web page address is the target identifier in the target video frame; that is, the web page address corresponding to the target identifier determined from the target video frame may be used as the target web page address. The web pages associated with the target web page address may be obtained by a web crawler. The crawler discovers pages by following links: starting from one page of the website, it reads the page content, finds the other link addresses in the page, follows those addresses to the next pages, and repeats the process until all pages of the website have been fetched. Each page crawled from the target web page address may be treated as a web page associated with that address, that is, a target page.

Crawling in this way obtains as many web pages as possible, so that when the pages are processed based on their content, the content can be determined more accurately and the resulting hot word vocabulary matches the target video closely.

Determining the target pages in this way yields as many target pages corresponding to the target identifier as possible, so that processing their content improves the accuracy of determining the hot word vocabulary and, in turn, the accuracy of converting speech into text.
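The link-following crawl described for S220 can be sketched as a breadth-first traversal. This is an illustrative sketch, not the crawler of the disclosure: the `fetch` callable is injected by the caller (so that any HTTP client, or a stub, can be used), and the restriction to the starting host and the `max_pages` limit are added assumptions that keep the crawl bounded.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href values of <a> tags in one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_site(start_url, fetch, max_pages=50):
    """Follow links breadth-first from start_url; return {url: html} for fetched pages.

    fetch(url) is supplied by the caller and returns the page HTML, or None on failure.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    visited = set()
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            absolute = urljoin(url, href)  # resolve relative links
            # Assumption: stay on the same website as the target web page address.
            if urlparse(absolute).netloc == host and absolute not in visited:
                queue.append(absolute)
    return pages
```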

S230, obtaining target content corresponding to the target page by analyzing the target page.

In this embodiment, analyzing the target page to obtain the target content corresponding to the target page may include: analyzing the target page to obtain the content to be processed in the target page; and removing preset characters from the content to be processed, taking the remaining content as the effective content corresponding to the content to be processed, and generating the target content based on the effective content.

A plurality of target pages may be crawled based on the target identifier, namely the target web page address. Each target page may be analyzed to obtain all of its content, which is used as the content to be processed. To further improve the efficiency of determining hot words, invalid information in the content to be processed, such as watermarks, advertisements and punctuation, may be removed, and the content remaining after removal may be used as the target content.
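A minimal sketch of removing preset characters from the content to be processed. The set `PRESET_CHARACTERS` (ASCII punctuation plus some common Chinese punctuation marks) is an assumed example, and the sketch covers punctuation only; detecting watermarks or advertisements would need page-specific rules that the disclosure does not detail.

```python
import re
import string

# Assumed preset characters: ASCII punctuation plus common Chinese punctuation.
PRESET_CHARACTERS = string.punctuation + "，。、；：？！“”‘’（）《》"


def remove_preset_characters(content: str, preset: str = PRESET_CHARACTERS) -> str:
    """Replace each preset character with a space, then collapse whitespace."""
    table = str.maketrans({ch: " " for ch in preset})
    cleaned = content.translate(table)
    return re.sub(r"\s+", " ", cleaned).strip()
```

Replacing preset characters with spaces rather than deleting them keeps words that were separated only by punctuation from being joined together.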

S240, based on the target content, at least one hot word vocabulary of the target video to which the target video frame belongs is determined.

In this embodiment, optionally, determining, based on the target content, at least one hot word vocabulary of the target video to which the target video frame belongs includes: extracting at least one hot word vocabulary from the target content by means of natural language processing.

Specifically, the target content of each target page may be segmented to obtain at least one segmented word; the word vector of each segmented word is determined, and the average word vector of all these word vectors is computed. The distance value between each word vector and the average word vector is then calculated, and the segmented word with the smallest distance value is taken as the to-be-processed (candidate) hot word vocabulary of that page. After the candidate hot word vocabulary of each target page is obtained, the average word vector of all candidates (the hot word average word vector) may be determined, the distance value between the word vector of each candidate and the hot word average word vector is calculated, and at least one candidate with a smaller distance value is taken as the hot word vocabulary corresponding to the target video.
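The two-stage selection above can be sketched as follows. The sketch assumes that word segmentation has already been performed (each page is a list of tokens), that `vectors` is a word vector dictionary mapping words to equal-length lists of floats, and that distance is Euclidean; the function names and the parameter `k` are hypothetical, and words missing from the dictionary are simply skipped.

```python
import math


def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def closest_words(words, vectors, k=1):
    """Return up to k distinct words whose vectors lie nearest the average vector."""
    known = [w for w in words if w in vectors]
    if not known:
        return []
    center = mean_vector([vectors[w] for w in known])
    distinct = list(dict.fromkeys(known))  # remove repeats, keep first-seen order
    distinct.sort(key=lambda w: math.dist(vectors[w], center))
    return distinct[:k]


def hot_words_for_video(pages, vectors, k=2):
    """Stage 1: one candidate per page. Stage 2: the k candidates nearest their average."""
    candidates = []
    for tokens in pages:
        candidates.extend(closest_words(tokens, vectors, k=1))
    return closest_words(candidates, vectors, k=k)
```

Because repeated words are kept when the average is computed, a word that occurs often in a page, or that is chosen as the candidate of several pages, pulls the average toward itself; whether repeats should be weighted this way is a design choice the disclosure leaves open.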

According to the technical solution of this embodiment, after the target identifier is determined, the web pages associated with the target identifier are crawled, so that as many associated target pages as possible are obtained. The hot word vocabulary of the target video to which the target video frame belongs is then determined from the page content of the target pages, which improves the accuracy of determining hot words. In turn, when voice information is converted into text, the hot word vocabulary corresponding to the voice information can be determined, which improves the accuracy and convenience of speech-to-text conversion.

Example three

As an optional implementation of the above embodiments, the method for extracting hotwords can be implemented by a system for extracting hotwords. The system comprises an image-text recognition subsystem, a crawler subsystem, a page parsing subsystem and a hot word extraction subsystem.

When a trigger operation for sharing a screen or a document in the video conference system is detected, the image-text recognition subsystem may acquire the video stream of the shared screen and process adjacent video frames to determine the target video frames. It extracts the target identifier, namely the target web page address, from each target video frame and sends it to the crawler subsystem.

After receiving the target identifier, the crawler subsystem may first determine whether the corresponding target web page address exists in the crawled web page address set. If so, the web page is not crawled again; if not, the crawler subsystem crawls the corresponding target page based on the target web page address and stores the address in the crawled web page address set. After acquiring the target page corresponding to the target web page address, the crawler subsystem may send it to the hot word extraction subsystem.

The hot word extraction subsystem receives the target page, analyzes its page content, removes useless information such as tags in the page, extracts the hot words from the page content using natural language processing (NLP), and sends the hot words to the voice-to-text subsystem. After receiving the hot words, the voice-to-text subsystem may add each hot word to the hot word library corresponding to the current video conference, so that the stored hot words can be called when speech is converted into text, which improves the accuracy of speech-to-text conversion.

In this embodiment, the system for extracting hotwords includes: an image-text recognition subsystem 310, a crawler subsystem 320, a page parsing subsystem 330, and a hotword extraction subsystem 340.

The image-text recognition subsystem 310 determines a target identifier in a target video frame and sends the target identifier to the crawler subsystem 320; the crawler subsystem 320 receives the target identifier, acquires a target page corresponding to the target identifier, and sends the target page to the page parsing subsystem 330; the page parsing subsystem 330 receives the target page, parses the target page to obtain target content corresponding to the target page, and sends the target content to the hotword extraction subsystem 340; the hotword extraction subsystem 340 receives the target content and determines at least one hotword vocabulary of a target video to which the target video frame belongs.
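The data flow between the four subsystems can be sketched as a simple chain of callables. This is only an illustrative sketch: the class name `HotwordPipeline`, its field names and the stub lambdas are assumptions introduced here and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HotwordPipeline:
    """Chains the four subsystems (310 -> 320 -> 330 -> 340) as plain callables."""
    recognize: Callable[[str], str]      # 310: frame text -> target identifier (URL)
    crawl: Callable[[str], str]          # 320: URL -> raw target page
    parse: Callable[[str], str]          # 330: raw page -> target content
    extract: Callable[[str], List[str]]  # 340: target content -> hot words

    def run(self, frame_text: str) -> List[str]:
        identifier = self.recognize(frame_text)
        page = self.crawl(identifier)
        content = self.parse(page)
        return self.extract(content)


# Hypothetical stub subsystems, for illustration only (no OCR or network access).
pipeline = HotwordPipeline(
    recognize=lambda text: text.strip(),
    crawl=lambda url: f"<p>page for {url}</p>",
    parse=lambda page: page.replace("<p>", "").replace("</p>", ""),
    extract=lambda content: content.split(),
)
result = pipeline.run(" https://example.com/doc ")
```

Keeping each subsystem behind a narrow callable interface mirrors the functional division described above, so any one stage could be replaced without touching the others.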

On the basis of the above technical solutions, before determining the target identifier in the target video frame, the image-text recognition subsystem is further configured to: when triggering of the target control is detected, sequentially acquire video frames in the target video and determine the target video frame from the acquired video frames.

On the basis of the above technical solutions, the crawler subsystem is further configured to: acquire at least one webpage address in the target video frame and determine a target webpage address from the at least one webpage address; and crawl a webpage associated with the target webpage address and generate the target page based on the webpage.

On the basis of the above technical solutions, the crawler subsystem is further configured to: identify the at least one webpage address from a preset area of the target video frame.
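As a minimal sketch of identifying webpage addresses within a preset area, the following assumes that text recognition has already produced text lines with pixel coordinates. The `OcrLine` type, the `urls_in_region` helper, the URL pattern and the address-bar rectangle are all hypothetical and serve only to illustrate the idea.

```python
import re
from typing import List, NamedTuple, Tuple


class OcrLine(NamedTuple):
    """One recognized text line and the position of its top-left corner."""
    text: str
    x: int
    y: int


# Simplified pattern for http(s) addresses; real address detection may differ.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def urls_in_region(lines: List[OcrLine], region: Tuple[int, int, int, int]) -> List[str]:
    """Return URLs found in lines whose corner lies inside (left, top, right, bottom)."""
    left, top, right, bottom = region
    found = []
    for line in lines:
        if left <= line.x <= right and top <= line.y <= bottom:
            found.extend(URL_PATTERN.findall(line.text))
    return found


ocr_lines = [
    OcrLine("https://example.com/spec", x=120, y=40),
    OcrLine("see https://example.org/other for details", x=80, y=600),
]
address_bar = (0, 0, 1920, 80)  # assumed preset area: top strip of the frame
found = urls_in_region(ocr_lines, address_bar)
```

Restricting recognition to a preset area such as the address bar avoids picking up unrelated links that happen to appear in the body of the shared page.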

On the basis of the above technical solutions, the crawler subsystem is further configured to: determine, based on the crawled webpage address set, a webpage address among the at least one webpage address that has not yet been crawled; and generate the target webpage address based on the un-crawled webpage address.
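The check against the crawled webpage address set can be sketched with an ordinary set. The function name `select_uncrawled` is an assumption, and as a simplification the sketch marks an address as crawled at selection time rather than after the crawl completes.

```python
from typing import Iterable, List, Set


def select_uncrawled(candidates: Iterable[str], crawled: Set[str]) -> List[str]:
    """Return addresses not yet in the crawled set and record them as crawled."""
    targets = []
    for url in candidates:
        if url not in crawled:
            crawled.add(url)  # simplification: recorded before the actual crawl
            targets.append(url)
    return targets


crawled_set = {"https://example.com/a"}
first = select_uncrawled(["https://example.com/a", "https://example.com/b"], crawled_set)
second = select_uncrawled(["https://example.com/b"], crawled_set)
```

Because the same address typically stays visible across many video frames, this check keeps the crawler from fetching the same page repeatedly.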

On the basis of the above technical solutions, the page parsing subsystem is further configured to: parse the target page to obtain the content to be processed in the target page; and remove preset characters from the content to be processed, take the remaining content as effective content corresponding to the content to be processed, and generate the target content based on the effective content.
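One possible way to remove tags and keep the effective content is shown below using Python's standard `html.parser` module. The disclosure does not specify a parser; the class `TextExtractor`, the helper `page_to_content` and the choice to skip `script` and `style` elements are assumptions made for this sketch.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and drops tags plus script/style bodies."""
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_to_content(page: str) -> str:
    parser = TextExtractor()
    parser.feed(page)
    parser.close()
    return " ".join(parser.chunks)


html = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Release plan</h1><p>Kubernetes rollout</p>"
    "<script>var x=1;</script></body></html>"
)
content = page_to_content(html)
```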

On the basis of the above technical solutions, the hotword extraction subsystem is further configured to: extract at least one hot word vocabulary from the target content by means of natural language processing.
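The disclosure does not name a specific natural language processing technique. As a stand-in only, the following sketch ranks words by term frequency after removing a small stop-word list; the function `extract_hotwords`, the stop-word list and the `top_k` parameter are assumptions, and a production system would likely use a proper keyword-extraction model.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stop-word list; not taken from the disclosure.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def extract_hotwords(content: str, top_k: int = 3) -> List[str]:
    """Return the top_k most frequent non-stop-word tokens in the content."""
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9_-]+", content.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]


content = "Kubernetes rollout plan: the Kubernetes cluster and the rollout for Kubernetes"
hotwords = extract_hotwords(content, top_k=2)
```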

On the basis of the above technical solutions, the system further includes: a target video generation module, configured to generate a target video based on a real-time interactive interface, so that the target video frame is determined from the target video.

On the basis of the above technical solutions, sequentially acquiring video frames in the target video when triggering of the target control is detected, and determining the target video frame from the acquired video frames, includes: when triggering of the sharing control is detected, acquiring a video frame to be processed in the target video; and determining the target video frame according to the similarity value between the video frame to be processed and at least one historical target video frame in the target video.
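The similarity-based selection can be illustrated as follows. The disclosure does not define the similarity measure or the decision rule, so this sketch makes two assumptions: similarity is one minus the normalized mean absolute pixel difference of grayscale frames, and a frame becomes a new target video frame only when its similarity to every historical target video frame falls below a threshold. The names `similarity`, `is_new_target_frame` and the threshold value 0.95 are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixels in the range 0..255


def similarity(a: Frame, b: Frame) -> float:
    """Return 1.0 for identical frames and 0.0 for maximally different ones."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and equally sized")
    mean_abs_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 - mean_abs_diff / 255.0


def is_new_target_frame(frame: Frame, history: List[Frame], threshold: float = 0.95) -> bool:
    """Assumed rule: keep a frame only if it differs enough from all past target frames."""
    return all(similarity(frame, past) < threshold for past in history)


history = [[0, 0, 0, 0]]
same_slide = [0, 0, 0, 10]     # nearly identical to the historical frame
new_slide = [255, 255, 0, 0]   # half of the pixels changed completely
```

Under these assumptions, `same_slide` is skipped and `new_slide` is selected, so identifier extraction only runs when the shared content has visibly changed.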

On the basis of the above technical solutions, the system further includes: a hot word cache module, configured to receive the at least one hot word vocabulary, so that when a voice-to-text conversion operation is detected, the corresponding hot word vocabulary is called from the hot word cache module according to the voice information.
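A hot word cache of this kind might be sketched as a per-session store that a voice-to-text component consults when a recognized word is uncertain. The disclosure does not describe how hot words are matched to voice information; the fuzzy string match via the standard `difflib.get_close_matches` function, the class name `HotwordCache`, the session identifier and the cutoff of 0.8 are assumptions for illustration.

```python
import difflib
from collections import defaultdict
from typing import Dict, Iterable, Set


class HotwordCache:
    """Stores hot words per session and maps a heard word to its closest hot word."""

    def __init__(self) -> None:
        self._by_session: Dict[str, Set[str]] = defaultdict(set)

    def add(self, session_id: str, hotwords: Iterable[str]) -> None:
        self._by_session[session_id].update(hotwords)

    def lookup(self, session_id: str, heard: str, cutoff: float = 0.8) -> str:
        # Sorted for deterministic results; .get avoids creating empty sessions.
        candidates = sorted(self._by_session.get(session_id, ()))
        matches = difflib.get_close_matches(heard, candidates, n=1, cutoff=cutoff)
        return matches[0] if matches else heard


cache = HotwordCache()
cache.add("meeting-42", ["kubernetes", "rollout"])
corrected = cache.lookup("meeting-42", "kubernetis")  # near miss of a cached hot word
unchanged = cache.lookup("meeting-42", "budget")      # no close hot word available
```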

The system for extracting hotwords provided by the embodiment of the disclosure can execute the method for extracting hotwords provided by any embodiment of the disclosure, and has the functional modules and beneficial effects corresponding to the executed method.

It should be noted that, the units and modules included in the system are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be realized; in addition, specific names of the functional units are also only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the embodiments of the present disclosure.

Example four

Referring now to fig. 4, a schematic diagram of an electronic device (e.g., the terminal device or the server in fig. 4) 400 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 4, electronic device 400 may include a processing device (e.g., central processing unit, graphics processor, etc.) 401 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage device 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the electronic device 400 are also stored. The processing device 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.

Generally, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 407 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 408 including, for example, tape, hard disk, etc.; and a communication device 409. The communication device 409 may allow the electronic device 400 to communicate with other devices, either wirelessly or by wire, to exchange data. While fig. 4 illustrates an electronic device 400 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 409, or installed from the storage device 408, or installed from the ROM 402. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 401.

The electronic device provided by the embodiment of the disclosure and the method for extracting hotwords provided by the above embodiment belong to the same inventive concept, and technical details that are not described in detail in the embodiment can be referred to the above embodiment, and the embodiment has the same beneficial effects as the above embodiment.

Example five

The disclosed embodiments provide a computer storage medium on which a computer program is stored, which when executed by a processor implements the method for extracting hotwords provided by the above embodiments.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:

parse the target page to obtain target content in the target page;

Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a unit/module does not constitute a limitation on the unit itself; for example, the hotword extraction subsystem may also be described as a "hotword extraction subsystem".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

According to one or more embodiments of the present disclosure, [ example one ] there is provided a method of extracting hotwords, the method comprising:

According to one or more embodiments of the present disclosure, [ example two ] there is provided a method of extracting hotwords, further comprising:

optionally, before the determining the target identifier in the target video frame, the method further includes:

when triggering of the target control is detected, video frames in the target video are sequentially acquired, and the target video frame is determined from the acquired video frames.

According to one or more embodiments of the present disclosure, [ example three ] there is provided a method of extracting hotwords, further comprising:

optionally, the determining a target identifier in a target video frame and acquiring a target page corresponding to the target identifier includes:

acquiring at least one webpage address in a target video frame, and determining a target webpage address from the at least one webpage address;

and crawling a webpage associated with the target webpage address, and generating the target page based on the webpage.

According to one or more embodiments of the present disclosure, [ example four ] there is provided a method of extracting hotwords, further comprising:

optionally, the acquiring at least one webpage address in the target video frame includes:

identifying the at least one webpage address from a preset area of the target video frame.

According to one or more embodiments of the present disclosure, [ example five ] there is provided a method of extracting hotwords, further comprising:

optionally, the determining a target web page address from the at least one web page address includes:

determining, based on the crawled webpage address set, a webpage address among the at least one webpage address that has not yet been crawled;

and generating the target webpage address based on the un-crawled webpage address.

According to one or more embodiments of the present disclosure, [ example six ] there is provided a method of extracting hotwords, further comprising:

optionally, the obtaining the target content in the target page by performing parsing processing on the target page includes:

analyzing the target page to obtain the content to be processed in the target page;

and eliminating preset characters in the content to be processed, taking the remaining content as effective content corresponding to the content to be processed, and generating the target content based on the effective content.

According to one or more embodiments of the present disclosure, [ example seven ] there is provided a method of extracting hotwords, further comprising:

optionally, the determining, based on the target content, at least one hotword vocabulary of the target video to which the target video frame belongs includes:

and extracting at least one hot word vocabulary in the target content in a natural language processing mode.

According to one or more embodiments of the present disclosure, [ example eight ] there is provided a method of extracting hotwords, further comprising:

and generating a target video based on a real-time interactive interface so as to determine the target video frame from the target video.

According to one or more embodiments of the present disclosure, [ example nine ] there is provided a method of extracting hotwords, further comprising:

optionally, sequentially acquiring video frames in the target video when triggering of the target control is detected, and determining the target video frame from the acquired video frames, includes:

when triggering of the sharing control is detected, acquiring a video frame to be processed in the target video;

and determining the target video frame according to the similarity value between the video frame to be processed and at least one historical target video frame in the target video.

According to one or more embodiments of the present disclosure, [ example ten ] there is provided a method of extracting hotwords, further comprising:

optionally, the at least one hot word vocabulary is sent to a hot word cache module, so that when a voice-to-text conversion operation is detected, the corresponding hot word vocabulary is called from the hot word cache module according to the voice information.

According to one or more embodiments of the present disclosure, [ example eleven ] there is provided a system for extracting hotwords, the system comprising:

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other combinations of the features described above or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions disclosed in (but not limited to) this disclosure.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. A method of extracting hotwords, comprising:

analyzing the target page to obtain target content in the target page;

determining at least one hot word vocabulary of a target video to which the target video frame belongs based on the target content;

sending the at least one hot word vocabulary to a hot word cache module, so as to call a corresponding hot word vocabulary from the hot word cache module according to the voice information when a voice-to-text triggering operation is detected;

before the determining the target identification in the target video frame, further comprising:

when triggering of the target control is detected, sequentially acquiring video frames in a target video, and determining the target video frame from the acquired video frames;

wherein the target control comprises at least one of: a control corresponding to screen sharing, a control corresponding to document sharing, and a playback control for playing a screen-recorded video;

the method for extracting the hotwords further comprises the following steps:

generating a target video based on a real-time interactive interface so as to determine a target video frame from the target video;

the target video is a video corresponding to the video conference or a live video.

2. The method of claim 1, wherein the target identifier comprises a target web address, and wherein determining the target identifier in the target video frame and retrieving the target page corresponding to the target identifier comprises:

acquiring at least one webpage address in the target video frame, and determining a target webpage address from the at least one webpage address;

3. The method of claim 2, wherein the obtaining at least one web page address in the target video frame comprises:

4. The method of claim 2, wherein determining the target webpage address from the at least one webpage address comprises:

and generating the target webpage address based on the non-crawled webpage address.

5. The method according to claim 1, wherein the obtaining the target content in the target page by performing parsing processing on the target page comprises:

6. The method of claim 1, wherein determining at least one hotword vocabulary for a target video to which the target video frame belongs based on the target content comprises:

7. The method according to claim 1, wherein sequentially acquiring video frames in a target video when triggering of the target control is detected, and determining the target video frame from the acquired video frames, comprises:

8. A system for extracting hotwords, comprising:

the hot word extraction subsystem receives the target content and determines at least one hot word of a target video to which the target video frame belongs;

a hot word cache module, configured to receive the at least one hot word vocabulary, so that the corresponding hot word vocabulary is called from the hot word cache module according to the voice information when a voice-to-text conversion operation is detected;

before the image-text recognition subsystem determines the target identification in the target video frame, the system for extracting the hotword is further used for:

wherein the target control comprises at least one of: a control corresponding to screen sharing, a control corresponding to document sharing, and a playback control for playing a screen-recorded video;

the system for extracting hotwords further comprises: the target video generation module is used for generating a target video based on a real-time interactive interface so as to determine a target video frame from the target video;

9. An electronic device, characterized in that the electronic device comprises:

one or more processors;

a storage device for storing one or more programs,

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of extracting hotwords as recited in any of claims 1-7.

10. A storage medium containing computer-executable instructions for performing the method of extracting hotwords of any one of claims 1-7 when executed by a computer processor.