CN114564628A - Efficient knowledge base deep retrieval method based on enterprise training - Google Patents
Efficient knowledge base deep retrieval method based on enterprise training
- Publication number
- CN114564628A
- Authority
- CN
- China
- Prior art keywords
- ppt
- knowledge base
- content
- micro
- files
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9035—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9032—Query formulation
- G06F16/90332—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9038—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention provides an efficient knowledge base deep retrieval method based on enterprise training, and relates to the field of knowledge base deep retrieval for enterprise training. The method comprises a micro-class, PPT analysis, audio and video text conversion, ElasticSearch technology, and a search entry. Files contained in the micro-class are uploaded to a server and analyzed; the contents of files in different formats are converted into text information, and the text information is marked with time stamps. The PPT analysis converts a PPT file into an ordered json array in a specified format, in which the attributes of each object include information such as the PPT page number, the PPT cover, and the PPT content. An instruction file based on PPT page-number operations is generated when the micro-class is recorded or is created from a live broadcast playback. The method solves the problem that detailed content inside media files cannot be searched in the field of enterprise training and, after retrieval is finished, positions the target content at the frame level, thereby improving searching and viewing efficiency.
Description
Technical Field
The invention relates to the field of deep retrieval of an efficient knowledge base for enterprise training, in particular to a deep retrieval method of an efficient knowledge base based on enterprise training.
Background
The term "knowledge base" has two meanings. The first is the set of rules used in the design of an expert system, together with the facts and data associated with those rules; such a knowledge base is tied to a specific expert system, so the question of sharing it does not arise. The second is a knowledge base of a consultative nature, which is shared rather than exclusive to one owner. Looking ahead, very large knowledge bases are expected to appear, depending on the development of hardware and software conditions. The design of the knowledge base, serving as a publicly managed background knowledge system, is one of the important problems to be considered for next-generation computers.
Traditional methods cannot identify the PPT content shown in a video. They can only retrieve the corresponding file and cannot directly locate the position, within that file, of the content related to a keyword, so the whole file must be browsed to find the desired content, which is inefficient.
Disclosure of Invention
(I) Technical problem to be solved
To address the defects of the prior art, the invention provides an efficient knowledge base deep retrieval method based on enterprise training. It solves the problems that traditional methods cannot identify the PPT content in a video and can only retrieve the corresponding file without directly locating the position of the keyword-related content within it, so that the whole file must be browsed to find the desired content and efficiency is low.
(II) Technical scheme
In order to achieve the above purpose, the invention is realized by the following technical scheme: an efficient knowledge base deep retrieval method based on enterprise training comprises a micro-class, PPT (PowerPoint) analysis, audio and video text conversion, ElasticSearch technology, and a search entry. Files contained in the micro-class are uploaded to a server and analyzed; the contents of files in different formats are converted into text information, and the text information is marked with time stamps. The PPT analysis converts the PPT file into an ordered json array in a specified format, in which the attributes of each object include information such as the PPT page number, the PPT cover, and the PPT content. When a micro-class is recorded, or is created from a live broadcast playback, an instruction file based on PPT page-number operations is generated, and the corresponding operation time points are recorded in it. The instruction file is analyzed, the PPT content information is looked up according to the PPT page numbers recorded in the instruction file, and the time points, the matched PPT content, and other information are packaged into an object array and stored in an embedded field of an ElasticSearch index.
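The matching step described above can be illustrated with a minimal sketch. The patent does not give the layout of the json array or of the instruction file, so the keys used here (`page`, `cover`, `content`, `time`) and the function name `build_ppt_segments` are assumptions made only for illustration.

```python
import json


def build_ppt_segments(ppt_pages, instructions):
    """Pair each page-turn instruction with the text of the page it displays.

    Both input layouts are hypothetical: ppt_pages stands in for the parsed
    PPT json array, instructions for the parsed instruction file.
    """
    pages_by_number = {page["page"]: page for page in ppt_pages}
    segments = []
    # Process instructions in playback order so the object array is ordered.
    for op in sorted(instructions, key=lambda op: op["time"]):
        page = pages_by_number.get(op["page"])
        if page is None:
            continue  # the instruction refers to a page missing from the parse
        segments.append(
            {"time": op["time"], "page": op["page"], "content": page["content"]}
        )
    return segments


ppt_pages = [
    {"page": 1, "cover": "cover-1.png", "content": "Onboarding overview"},
    {"page": 2, "cover": "cover-2.png", "content": "Expense reimbursement process"},
]
instructions = [
    {"time": 95.0, "page": 2},
    {"time": 0.0, "page": 1},
    {"time": 140.5, "page": 3},  # no such page in the parsed PPT
]

segments = build_ppt_segments(ppt_pages, instructions)
print(json.dumps(segments, indent=2))
```

The resulting object array is what would be written into the embedded field of the index document; how unmatched page numbers should be handled is not stated in the source, and skipping them is a choice made here.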
Preferably, the audio and video text conversion converts the video into audio and recognizes the subtitles by means of the Alibaba Cloud media processing capability; the subtitles are converted into an ordered json array in a specified format, in which the attributes of each object include information such as the subtitle time and the subtitle content.
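As a rough sketch of the conversion into an ordered json array, assume the speech recognition step returns a list of sentences with a start offset in milliseconds. That input layout (`begin_ms`, `text`) and the output keys (`time`, `content`) are hypothetical; the actual response format of the cloud service is not described in the source and is not reproduced here.

```python
import json


def subtitles_to_json(sentences):
    """Turn recognized sentences into an ordered array of subtitle objects."""
    ordered = sorted(sentences, key=lambda s: s["begin_ms"])
    return [
        # Store the time in seconds so it can be used directly for seeking.
        {"time": s["begin_ms"] / 1000, "content": s["text"].strip()}
        for s in ordered
        if s["text"].strip()  # drop empty recognition results
    ]


raw_sentences = [
    {"begin_ms": 4200, "text": " Submit the form to finance. "},
    {"begin_ms": 0, "text": "Welcome to the course."},
    {"begin_ms": 9000, "text": "   "},
]

subtitles = subtitles_to_json(raw_sentences)
print(json.dumps(subtitles, indent=2))
```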
Preferably, the ElasticSearch technology constructs an index from the text information; the index format comprises the basic information for searching, and the embedded field technology is used to store the text information from PPT analysis and the subtitle text information from voice conversion.
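One possible index layout is sketched below as a plain mapping body. It assumes that the "embedded field" corresponds to Elasticsearch's `nested` field type and that text fields are analyzed with the IK plugin's `ik_max_word` analyzer; the field names (`title`, `created_at`, `ppt_segments`, `subtitle_segments`) are illustrative and do not come from the patent.

```python
import json


def build_index_mapping(analyzer="ik_max_word"):
    """Return an index-creation body with one nested field per text source."""

    def segment_field():
        # Each nested object keeps a time point together with its text, so a
        # match on the text can be traced back to the time point it belongs to.
        return {
            "type": "nested",
            "properties": {
                "time": {"type": "float"},
                "content": {"type": "text", "analyzer": analyzer},
            },
        }

    return {
        "mappings": {
            "properties": {
                "title": {"type": "text", "analyzer": analyzer},
                "created_at": {"type": "date"},
                "ppt_segments": segment_field(),
                "subtitle_segments": segment_field(),
            }
        }
    }


mapping = build_index_mapping()
print(json.dumps(mapping, indent=2))
```

The body is only built and printed here; sending it to a cluster is left out because the source does not say which client or version is used.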
Preferably, the search entry performs word segmentation search using the ElasticSearch Chinese word segmentation plugin IK, queries the index for matching micro-class content and displays it, and, in combination with a configuration file, dynamically configures the weights of fields such as time, PPT content, and subtitle content, so that the search results are ranked and displayed dynamically.
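The configurable weighting could look roughly like the following sketch, which reads per-field weights from a JSON configuration string and turns them into `boost` values in a query body. The field names, the configuration keys, and the default weights are assumptions; a weight on time would need something beyond a simple boost (for example a `function_score` query) and is not shown.

```python
import json

# Hypothetical defaults; a deployment would load overrides from its config file.
DEFAULT_WEIGHTS = {"title": 1.0, "ppt_content": 2.0, "subtitle_content": 1.5}


def load_weights(config_text):
    """Merge weights from a JSON config string over the defaults."""
    weights = dict(DEFAULT_WEIGHTS)
    weights.update(json.loads(config_text))
    return weights


def nested_clause(path, keyword, boost):
    """Match the keyword inside one nested field and return the matching objects."""
    return {
        "nested": {
            "path": path,
            "query": {
                "match": {f"{path}.content": {"query": keyword, "boost": boost}}
            },
            "inner_hits": {},  # ask for the individual matching segments
        }
    }


def build_search_body(keyword, weights):
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"title": {"query": keyword, "boost": weights["title"]}}},
                    nested_clause("ppt_segments", keyword, weights["ppt_content"]),
                    nested_clause("subtitle_segments", keyword, weights["subtitle_content"]),
                ],
                "minimum_should_match": 1,
            }
        }
    }


weights = load_weights('{"ppt_content": 3.0}')
body = build_search_body("报销流程", weights)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Changing the configuration string changes the relative ranking of title, PPT, and subtitle matches without touching the query-building code.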
Preferably, the content segments matching the keyword are displayed in the micro-class result: the keyword searched by the user and the corresponding time points in the result data are passed to the result page, where the content segments are displayed sorted along the time dimension. Each content segment carries the keyword and its corresponding time point; when the user clicks a segment, a corresponding method is called to set the currentTime of the browser's native video, so that playback jumps directly to that time point.
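A server-side sketch of how the result segments might be assembled before they reach the result page is given below. The segment layout (`time`, `content`), the `source` tag, and the simple substring match are assumptions for illustration; the click handler that sets `currentTime` runs in the browser and is not shown.

```python
def build_result_segments(keyword, ppt_segments, subtitle_segments):
    """Collect matching segments from both sources, sorted by time point."""
    tagged = [dict(seg, source="ppt") for seg in ppt_segments]
    tagged += [dict(seg, source="subtitle") for seg in subtitle_segments]
    # Plain substring matching stands in for the search engine's own matching.
    matched = [seg for seg in tagged if keyword in seg["content"]]
    matched.sort(key=lambda seg: seg["time"])
    return [
        {
            "keyword": keyword,
            "time": seg["time"],  # value the page would assign to currentTime
            "source": seg["source"],
            "content": seg["content"],
        }
        for seg in matched
    ]


results = build_result_segments(
    "reimbursement",
    [{"time": 95.0, "content": "Expense reimbursement process"}],
    [
        {"time": 12.5, "content": "Today we cover reimbursement."},
        {"time": 40.0, "content": "First, the travel policy."},
    ],
)
for seg in results:
    print(f'{seg["time"]:>7.1f}s  [{seg["source"]}]  {seg["content"]}')
```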
(III) Advantageous effects
The invention provides an efficient knowledge base deep retrieval method based on enterprise training. The method has the following beneficial effects:
1. The method solves the problem that detailed content inside media files cannot be searched in the field of enterprise training and, after retrieval is completed, positions the target content at the frame level, thereby improving searching and viewing efficiency.
Drawings
FIG. 1 is a schematic diagram of the deep retrieval lane structure according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Example one:
As shown in FIG. 1, an embodiment of the present invention provides an efficient knowledge base deep retrieval method based on enterprise training, which comprises a micro-class, PPT analysis, audio and video text conversion, ElasticSearch technology, and a search entry. Files contained in the micro-class are uploaded to a server and analyzed; the contents of files in different formats are converted into text information, and the text information is marked with time stamps. The PPT analysis converts the PPT file into an ordered json array in a specified format, in which the attributes of each object include information such as the PPT page number, the PPT cover, and the PPT content. When the micro-class is recorded, or is created from a live broadcast playback, an instruction file based on PPT page-number operations is generated, and the corresponding operation time points are recorded in it. The instruction file is analyzed, the PPT content information is looked up according to the PPT page numbers recorded in the instruction file, and the time points, the matched PPT content, and other information are packaged into an object array and stored in an embedded field of an ElasticSearch index. The audio and video text conversion converts the video into audio and recognizes the subtitles by means of the Alibaba Cloud media processing capability; the subtitles are converted into an ordered json array in a specified format, in which the attributes of each object include information such as the subtitle time and the subtitle content. The ElasticSearch technology constructs an index from the text information; the index format comprises the basic information for searching, and the embedded field technology is used to store the text information from PPT analysis and the subtitle text information from voice conversion. This solves the problem that detailed content inside media files cannot be searched in the field of enterprise training and, after retrieval is completed, positions the target content at the frame level, thereby improving searching and viewing efficiency.
Example two:
As shown in FIG. 1, an embodiment of the present invention provides an efficient knowledge base deep retrieval method based on enterprise training, in which the search entry performs word segmentation search using the ElasticSearch Chinese word segmentation plugin IK, queries the index for matching micro-class content and displays it, and, in combination with a configuration file, dynamically configures the weights of fields such as time, PPT content, and subtitle content, so that the search results are ranked and displayed dynamically. The content segments matching the keyword are displayed in the micro-class result: the keyword searched by the user and the corresponding time points in the result data are passed to the result page, where the content segments are displayed sorted along the time dimension. Each content segment carries the keyword and its corresponding time point; when the user clicks a segment, a corresponding method is called to set the currentTime of the browser's native video, so that playback jumps directly to that time point. This enhances the effect of the method of the present invention and increases the efficiency of searching and viewing.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (5)
1. An efficient knowledge base deep retrieval method based on enterprise training, comprising a micro-class, PPT (PowerPoint) analysis, audio and video text conversion, ElasticSearch technology, and a search entry, characterized in that: files contained in the micro-class are uploaded to a server and analyzed, the contents of files in different formats are converted into text information, and the text information is marked with time stamps; the PPT analysis converts the PPT file into an ordered json array in a specified format, the attributes of objects in the json array including information such as the PPT page number, the PPT cover, and the PPT content; an instruction file based on PPT page-number operations is generated when the micro-class is recorded or is created from a live broadcast playback, and the corresponding operation time points are recorded in the instruction file; the instruction file is analyzed, the PPT content information is looked up according to the PPT page numbers recorded in the instruction file, and the time points, the matched PPT content, and other information are packaged into an object array and stored in an embedded field of an ElasticSearch index.
2. The efficient knowledge base deep retrieval method based on enterprise training as claimed in claim 1, wherein: the audio and video text conversion converts the video into audio and recognizes the subtitles by means of the Alibaba Cloud media processing capability, the subtitles are converted into an ordered json array in a specified format, and the attributes of objects in the json array include information such as the subtitle time and the subtitle content.
3. The efficient knowledge base deep retrieval method based on enterprise training as claimed in claim 1, wherein: the ElasticSearch technology constructs an index from the text information, the index format comprises the basic information for searching, and the embedded field technology is used to store the text information from PPT analysis and the subtitle text information from voice conversion.
4. The efficient knowledge base deep retrieval method based on enterprise training as claimed in claim 1, wherein: the search entry performs word segmentation search using the ElasticSearch Chinese word segmentation plugin IK, queries the index for matching micro-class content and displays it, and, in combination with a configuration file, dynamically configures the weights of fields such as time, PPT content, and subtitle content, so that the search results are ranked and displayed dynamically.
5. The efficient knowledge base deep retrieval method based on enterprise training as claimed in claim 1, wherein: the content segments matching the keyword are displayed in the micro-class result, the keyword searched by the user and the corresponding time points in the result data are passed to the result page, and the content segments are displayed in the result page sorted along the time dimension; each content segment carries the keyword and its corresponding time point, and when the user clicks a segment, a corresponding method is called to set the currentTime of the browser's native video, so that playback jumps directly to that time point.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210226601.9A CN114564628A (en) | 2022-03-09 | 2022-03-09 | Efficient knowledge base deep retrieval method based on enterprise training |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210226601.9A CN114564628A (en) | 2022-03-09 | 2022-03-09 | Efficient knowledge base deep retrieval method based on enterprise training |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114564628A true CN114564628A (en) | 2022-05-31 |
Family
ID=81718267
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210226601.9A Pending CN114564628A (en) | 2022-03-09 | 2022-03-09 | Efficient knowledge base deep retrieval method based on enterprise training |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114564628A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090162822A1 (en) * | 2007-12-21 | 2009-06-25 | M-Lectture, Llc | Internet-based mobile learning system and method therefor |
US20170293618A1 (en) * | 2016-04-07 | 2017-10-12 | Uday Gorrepati | System and method for interactive searching of transcripts and associated audio/visual/textual/other data files |
CN107896310A (en) * | 2017-12-19 | 2018-04-10 | 广州敬信药草园信息科技有限公司 | A kind of recorded broadcast of session method and device |
CN107920280A (en) * | 2017-03-23 | 2018-04-17 | 广州思涵信息科技有限公司 | The accurate matched method and system of video, teaching materials PPT and voice content |
WO2018172669A1 (en) * | 2017-03-21 | 2018-09-27 | Orange | Method and device for managing the storage of digital documents |
CN109376121A (en) * | 2018-08-10 | 2019-02-22 | 南京华讯方舟通信设备有限公司 | A kind of document indexing system and method based on ElasticSearch full-text search |
CN109634575A (en) * | 2018-12-24 | 2019-04-16 | 安徽经邦软件技术有限公司 | Intelligence generates PPT analysis report method |
CN109710844A (en) * | 2018-12-20 | 2019-05-03 | 中国银行业监督管理委员会福建监管局 | The method and apparatus for quick and precisely positioning file based on search engine |
-
2022
- 2022-03-09 CN CN202210226601.9A patent/CN114564628A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||