CN114564628A - Efficient knowledge base deep retrieval method based on enterprise training - Google Patents

Publication number
CN114564628A
Authority
CN
China
Prior art keywords
ppt
knowledge base
content
micro
files
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210226601.9A
Other languages
Chinese (zh)
Inventor
卢小燕
崔峻
盛银江
李祥驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunxuetang Information Technology Jiangsu Co ltd
Original Assignee
Yunxuetang Information Technology Jiangsu Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles
    • G06F16/9032 Query formulation
    • G06F16/90332 Natural language query formulation or dialogue systems
    • G06F16/9038 Presentation of query results
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides an efficient knowledge base deep retrieval method based on enterprise training and relates to the field of knowledge base retrieval for enterprise training. The method comprises micro-lessons, PPT parsing, audio/video-to-text conversion, ElasticSearch, and a search entry. The files contained in a micro-lesson are uploaded to a server and parsed; the contents of files in different formats are converted into text information, and the text information is marked with timestamps. PPT parsing converts a PPT file into an ordered json array in a specified format, where the attributes of the objects in the array include the PPT page number, the PPT cover, the PPT content, and similar information. When a micro-lesson is recorded, or created from a live-broadcast playback, an instruction file based on PPT page-number operations is generated. The method solves the problem that detailed content inside media files cannot be searched in the field of enterprise training and, once retrieval is complete, locates the target content at the frame level, which improves search and viewing efficiency.

Description

Efficient knowledge base deep retrieval method based on enterprise training
Technical Field
The invention relates to the field of efficient deep retrieval of knowledge bases for enterprise training, and in particular to an efficient knowledge base deep retrieval method based on enterprise training.
Background
The term knowledge base has two meanings. The first is the rule set used in the design of an expert system, together with the facts and data related to those rules; such a knowledge base is tied to a specific expert system, so the question of sharing it does not arise. The second is a knowledge base of a consultative nature, which is shared rather than exclusive to one party. Looking ahead, very large knowledge bases are expected to appear, depending on the development of hardware and software conditions. The design of the knowledge base is one of the important problems to be considered for next-generation computers, with the knowledge base designed as a publicly managed system for background knowledge.
Traditional methods cannot recognize the PPT content shown in a video. They can only retrieve the file that matches a query and cannot directly locate the position, inside the file, of the content related to the keyword. The user has to browse the whole file to find the desired content, which is inefficient.
Disclosure of Invention
(I) Technical problem to be solved
Aiming at the defects of the prior art, the invention provides an efficient knowledge base deep retrieval method based on enterprise training. It solves the problems that traditional methods cannot recognize the PPT content in a video, can only retrieve the corresponding file, and cannot directly locate the position in the file of the content related to the keyword, so that the whole file must be browsed to find the desired content and efficiency is low.
(II) technical scheme
To achieve the above purpose, the invention is realized by the following technical scheme: an efficient knowledge base deep retrieval method based on enterprise training, comprising micro-lessons, PPT (PowerPoint) parsing, audio/video-to-text conversion, ElasticSearch, and a search entry. The files contained in a micro-lesson are uploaded to a server and parsed; the contents of files in different formats are converted into text information, and the text information is marked with timestamps. PPT parsing converts a PPT file into an ordered json array in a specified format; the attributes of the objects in the array include the PPT page number, the PPT cover, the PPT content, and similar information. When a micro-lesson is recorded, or created from a live-broadcast playback, an instruction file based on PPT page-number operations is generated, and the time point of each operation is recorded in it. The instruction file is parsed, the PPT content is looked up by the page number recorded in the instruction file, and each time point together with its matched PPT content is packaged into an object array and stored in an embedded (nested) field of an ElasticSearch index.
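The matching step described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the field names `page`, `cover`, `content`, and `time` and the function name are assumptions, since the patent does not specify the json layout of either the parsed PPT or the instruction file.

```python
import json


def match_instructions_to_slides(slides, instructions):
    """Pair each page-turn instruction with the text of the slide it shows."""
    # Index the parsed slide objects by page number for direct lookup.
    by_page = {slide["page"]: slide for slide in slides}
    matched = []
    # Walk the instructions in time order so the resulting array is ordered.
    for ins in sorted(instructions, key=lambda i: i["time"]):
        slide = by_page.get(ins["page"])
        if slide is None:
            continue  # the instruction refers to a page that was not parsed
        matched.append(
            {"time": ins["time"], "page": ins["page"], "content": slide["content"]}
        )
    return matched


# Hypothetical parsed PPT (ordered json array) and instruction file records.
slides = json.loads(
    '[{"page": 1, "cover": "p1.png", "content": "Onboarding overview"},'
    ' {"page": 2, "cover": "p2.png", "content": "Expense policy"}]'
)
instructions = [{"time": 95.0, "page": 2}, {"time": 0.0, "page": 1}]
ppt_segments = match_instructions_to_slides(slides, instructions)
```

The resulting `ppt_segments` list is the kind of object array that would be written into the index field for PPT text.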
Preferably, the audio/video-to-text conversion uses the media processing capability of Alibaba Cloud to convert the video into audio and recognize the subtitles. The subtitles are converted into an ordered json array in a specified format, and the attributes of the objects in the array include the subtitle time, the subtitle content, and similar information.
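As a rough sketch of the subtitle conversion, assume the speech recognition step yields `(start_ms, text)` pairs. That input shape is hypothetical, because the patent does not describe the cloud service's response format, and the output keys `time` and `content` are likewise illustrative.

```python
import json


def subtitles_to_json(recognized):
    """Convert (start_ms, text) pairs into an ordered json array of subtitle objects."""
    items = [
        {"time": start_ms / 1000.0, "content": text.strip()}
        # Sort by start time so the array is ordered along the timeline.
        for start_ms, text in sorted(recognized, key=lambda pair: pair[0])
        if text.strip()  # drop empty recognition results
    ]
    # ensure_ascii=False keeps Chinese subtitle text readable in the output.
    return json.dumps(items, ensure_ascii=False)


subtitle_json = subtitles_to_json(
    [(4200, " expense reports are due monthly "), (0, "welcome to the course"), (9000, "  ")]
)
```

Storing the time in seconds keeps it directly usable as a playback position later on.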
Preferably, ElasticSearch builds an index from the text information. The index format contains the basic information used for search and uses embedded (nested) fields: one stores the text information obtained from PPT parsing, and the other stores the subtitle text information obtained from speech conversion.
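One way such an index could be declared is shown below as a Python dictionary holding an ElasticSearch mapping body. It assumes that the "embedded field" corresponds to ElasticSearch's `nested` field type and that the IK plug-in's `ik_max_word` and `ik_smart` analyzers are installed; the field names (`title`, `created_at`, `ppt_segments`, `subtitle_segments`) are hypothetical.

```python
import json


def build_index_mapping():
    """Return a mapping body with two nested fields: PPT text and subtitle text."""
    # Text fields are analyzed with the IK Chinese analyzers (assumed installed).
    ik_text = {"type": "text", "analyzer": "ik_max_word", "search_analyzer": "ik_smart"}
    return {
        "mappings": {
            "properties": {
                # Basic search information about the micro-lesson.
                "title": dict(ik_text),
                "created_at": {"type": "date"},
                # One object per PPT page turn: time point plus slide text.
                "ppt_segments": {
                    "type": "nested",
                    "properties": {
                        "time": {"type": "float"},
                        "page": {"type": "integer"},
                        "content": dict(ik_text),
                    },
                },
                # One object per recognized subtitle line.
                "subtitle_segments": {
                    "type": "nested",
                    "properties": {
                        "time": {"type": "float"},
                        "content": dict(ik_text),
                    },
                },
            }
        }
    }


mapping_body = json.dumps(build_index_mapping(), indent=2)
```

Using `nested` rather than a plain object array keeps each segment's time and content together, so a match on the text can be traced back to its own time point.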
Preferably, the search entry performs a word-segmentation search using IK, the Chinese word-segmentation plug-in for ElasticSearch. Matching micro-lesson content is queried in the index and displayed, and the weights of fields such as time, PPT content, and subtitle content are configured dynamically through a configuration file, so that the search results are ranked and displayed dynamically.
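The configurable weighting could look roughly like the sketch below, which builds an ElasticSearch query body whose per-field `boost` values are read from a json configuration. The config keys and field names are hypothetical, and the time weight is left out because the patent does not say how it enters the ranking.

```python
import json


def build_search_query(keyword, config_text):
    """Build a query body whose field boosts come from a configuration file."""
    weights = json.loads(config_text)  # contents of a hypothetical weights file
    should = []
    for path in ("ppt_segments", "subtitle_segments"):
        should.append(
            {
                "nested": {
                    "path": path,
                    "query": {
                        "match": {
                            f"{path}.content": {
                                "query": keyword,
                                # Fall back to a neutral weight if not configured.
                                "boost": weights.get(path, 1.0),
                            }
                        }
                    },
                    # Ask for the matching segments themselves, not only the lesson.
                    "inner_hits": {},
                }
            }
        )
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


query = build_search_query("报销流程", '{"ppt_segments": 3.0, "subtitle_segments": 1.5}')
```

Because the weights are read at query-build time, changing the configuration changes the ranking without reindexing.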
Preferably, the micro-lesson results display the content segments that match the keyword. The keyword searched by the user and the corresponding time points in the result data are passed to the result page, where the content segments are displayed sorted by time; each segment carries the keyword and its corresponding time point. When the user clicks a segment, a method is called that sets the currentTime of the browser's native video element, so that playback jumps directly to that time point.
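A small sketch of how the result page's segment list might be assembled on the server side. The segment layout (`time`, `content`) and the simple case-insensitive substring match are assumptions made for illustration; the actual matching would come from the search engine.

```python
def collect_segments(keyword, ppt_segments, subtitle_segments):
    """Gather segments containing the keyword and order them by time point."""
    needle = keyword.lower()
    hits = []
    for source, segments in (("ppt", ppt_segments), ("subtitle", subtitle_segments)):
        for seg in segments:
            if needle in seg["content"].lower():
                hits.append(
                    {
                        "keyword": keyword,
                        # Seconds from the start of the video; on click, the page
                        # would assign this value to the video element's currentTime.
                        "time": seg["time"],
                        "source": source,
                        "snippet": seg["content"],
                    }
                )
    # Display order on the result page follows the time dimension.
    return sorted(hits, key=lambda h: h["time"])


segments = collect_segments(
    "expense",
    [{"time": 95.0, "content": "Expense policy and approval"}],
    [
        {"time": 12.5, "content": "today we cover the expense policy"},
        {"time": 40.0, "content": "first, onboarding"},
    ],
)
```

Each returned item carries everything the click handler needs: the keyword for highlighting and the time point for the jump.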
(III) advantageous effects
The invention provides an efficient knowledge base deep retrieval method based on enterprise training. The method has the following beneficial effect:
1. It solves the problem that detailed content inside media files cannot be searched in the field of enterprise training and, once retrieval is complete, locates the target content at the frame level, which improves search and viewing efficiency.
Drawings
FIG. 1 is a schematic diagram of the deep retrieval structure of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Embodiment 1:
As shown in FIG. 1, an embodiment of the present invention provides an efficient knowledge base deep retrieval method based on enterprise training, comprising micro-lessons, PPT parsing, audio/video-to-text conversion, ElasticSearch, and a search entry. The files contained in a micro-lesson are uploaded to a server and parsed; the contents of files in different formats are converted into text information, and the text information is marked with timestamps. PPT parsing converts a PPT file into an ordered json array in a specified format; the attributes of the objects in the array include the PPT page number, the PPT cover, the PPT content, and similar information. When a micro-lesson is recorded, or created from a live-broadcast playback, an instruction file based on PPT page-number operations is generated, and the time point of each operation is recorded in it. The instruction file is parsed, the PPT content is looked up by the page number recorded in the instruction file, and each time point together with its matched PPT content is packaged into an object array and stored in an embedded (nested) field of an ElasticSearch index. The audio/video-to-text conversion uses the media processing capability of Alibaba Cloud to convert the video into audio and recognize the subtitles; the subtitles are converted into an ordered json array in a specified format, and the attributes of the objects in the array include the subtitle time, the subtitle content, and similar information. ElasticSearch builds an index from the text information; the index format contains the basic information used for search and uses embedded (nested) fields, one storing the text information obtained from PPT parsing and the other storing the subtitle text information obtained from speech conversion. This solves the problem that detailed content inside media files cannot be searched in the field of enterprise training and, once retrieval is complete, locates the target content at the frame level, which improves search and viewing efficiency.
Embodiment 2:
As shown in FIG. 1, an embodiment of the present invention provides an efficient knowledge base deep retrieval method based on enterprise training. The search entry performs a word-segmentation search using IK, the Chinese word-segmentation plug-in for ElasticSearch; matching micro-lesson content is queried in the index and displayed, and the weights of fields such as time, PPT content, and subtitle content are configured dynamically through a configuration file, so that the search results are ranked and displayed dynamically. The micro-lesson results display the content segments that match the keyword. The keyword searched by the user and the corresponding time points in the result data are passed to the result page, where the content segments are displayed sorted by time; each segment carries the keyword and its corresponding time point. When the user clicks a segment, a method is called that sets the currentTime of the browser's native video element, so that playback jumps directly to that time point. This enhances the effect of the method and improves search and viewing efficiency.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. An efficient knowledge base deep retrieval method based on enterprise training, comprising micro-lessons, PPT (PowerPoint) parsing, audio/video-to-text conversion, ElasticSearch, and a search entry, characterized in that: the files contained in a micro-lesson are uploaded to a server and parsed, the contents of files in different formats are converted into text information, and the text information is marked with timestamps; the PPT parsing converts a PPT file into an ordered json array in a specified format, the attributes of the objects in the json array including the PPT page number, the PPT cover, the PPT content, and similar information; when a micro-lesson is recorded, or created from a live-broadcast playback, an instruction file based on PPT page-number operations is generated, and the time point of each operation is recorded in the instruction file; the instruction file is parsed, the PPT content is looked up by the PPT page number recorded in the instruction file, and each time point together with its matched PPT content is packaged into an object array and stored in an embedded (nested) field of an ElasticSearch index.
2. The efficient knowledge base deep retrieval method based on enterprise training as claimed in claim 1, wherein: the audio/video-to-text conversion uses the media processing capability of Alibaba Cloud to convert the video into audio and recognize the subtitles, the subtitles are converted into an ordered json array in a specified format, and the attributes of the objects in the json array include the subtitle time, the subtitle content, and similar information.
3. The efficient knowledge base deep retrieval method based on enterprise training as claimed in claim 1, wherein: ElasticSearch builds an index from the text information, the index format contains the basic information used for search, and embedded (nested) fields are used, one storing the text information obtained from PPT parsing and the other storing the subtitle text information obtained from speech conversion.
4. The efficient knowledge base deep retrieval method based on enterprise training as claimed in claim 1, wherein: the search entry performs a word-segmentation search using IK, the Chinese word-segmentation plug-in for ElasticSearch, matching micro-lesson content is queried in the index and displayed, and the weights of fields such as time, PPT content, and subtitle content are configured dynamically through a configuration file, so that the search results are ranked and displayed dynamically.
5. The efficient knowledge base deep retrieval method based on enterprise training as claimed in claim 1, wherein: the micro-lesson results display the content segments that match the keyword; the keyword searched by the user and the corresponding time points in the result data are passed to the result page, where the content segments are displayed sorted by time, each segment carrying the keyword and its corresponding time point; when the user clicks a segment, a method is called that sets the currentTime of the browser's native video element, so that playback jumps directly to that time point.
CN202210226601.9A 2022-03-09 2022-03-09 Efficient knowledge base deep retrieval method based on enterprise training Pending CN114564628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210226601.9A CN114564628A (en) 2022-03-09 2022-03-09 Efficient knowledge base deep retrieval method based on enterprise training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210226601.9A CN114564628A (en) 2022-03-09 2022-03-09 Efficient knowledge base deep retrieval method based on enterprise training

Publications (1)

Publication Number Publication Date
CN114564628A (en) 2022-05-31

Family

ID=81718267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210226601.9A Pending CN114564628A (en) 2022-03-09 2022-03-09 Efficient knowledge base deep retrieval method based on enterprise training

Country Status (1)

Country Link
CN (1) CN114564628A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090162822A1 (en) * 2007-12-21 2009-06-25 M-Lectture, Llc Internet-based mobile learning system and method therefor
US20170293618A1 (en) * 2016-04-07 2017-10-12 Uday Gorrepati System and method for interactive searching of transcripts and associated audio/visual/textual/other data files
CN107896310A (en) * 2017-12-19 2018-04-10 广州敬信药草园信息科技有限公司 A kind of recorded broadcast of session method and device
CN107920280A (en) * 2017-03-23 2018-04-17 广州思涵信息科技有限公司 The accurate matched method and system of video, teaching materials PPT and voice content
WO2018172669A1 (en) * 2017-03-21 2018-09-27 Orange Method and device for managing the storage of digital documents
CN109376121A (en) * 2018-08-10 2019-02-22 南京华讯方舟通信设备有限公司 A kind of document indexing system and method based on ElasticSearch full-text search
CN109634575A (en) * 2018-12-24 2019-04-16 安徽经邦软件技术有限公司 Intelligence generates PPT analysis report method
CN109710844A (en) * 2018-12-20 2019-05-03 中国银行业监督管理委员会福建监管局 The method and apparatus for quick and precisely positioning file based on search engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination