CN117688159A - Information searching method, device, computer equipment and storage medium

Information searching method, device, computer equipment and storage medium

Info

Publication number
CN117688159A
Authority
CN
China
Prior art keywords
content
picture
target
scene
data
Prior art date
Legal status
Pending
Application number
CN202311829001.2A
Other languages
Chinese (zh)
Inventor
李飞
黄爽
龙明康
潘青华
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN202311829001.2A
Publication of CN117688159A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an information searching method, an information searching device, computer equipment and a storage medium. First, problem content is acquired. The problem content corresponds to a target picture identifier in a knowledge base, and the target picture identifier corresponds to a target shot picture. Next, answer content is determined based on the target shot picture and the target scene data corresponding to the target picture identifier in a scene library. Finally, the answer content corresponding to the problem content is output. By fusing the target shot picture and the target scene data, which are of different types, multi-modal search is realized and answers corresponding to the problem content are generated in a multi-modal manner, so that the precision and accuracy of information search are improved. Further, based on the target shot picture and the target scene data, various kinds of answer content can be returned, not only text but also pictures, navigation maps and the like, providing richer and more intuitive answer content for the user and improving the user experience.

Description

Information searching method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to an information searching method, an information searching device, a computer device, and a storage medium.
Background
In the information age, search engines have become a major approach for people to obtain the desired information. By entering enough keywords, users can conveniently retrieve the information they want. Wearable devices in the related art, such as smart glasses products, are mainly limited to providing text search functions, and the search mode thereof needs to be improved.
Disclosure of Invention
The embodiments of the present specification aim to solve at least one of the technical problems in the related art to some extent. For this reason, the present embodiments provide an information search method, apparatus, computer device, and storage medium.
The embodiment of the specification provides an information searching method, which comprises the following steps:
acquiring problem content; the problem content has a corresponding target picture identifier in a knowledge base, and the knowledge base stores a correspondence between picture identifiers and content representation data; the target picture identifier corresponds to a target shot picture;
outputting answer content corresponding to the question content; the answer content is determined based on the target shot picture and target scene data corresponding to the target picture identifier in a scene library, and the scene library stores a correspondence between picture identifiers and scene data.
In one embodiment, the target picture identification is determined by any one of the following means:
if the problem content comprises voice data, acquiring first representation data corresponding to the problem content, and searching in the knowledge base based on the first representation data to obtain the target picture identification corresponding to the problem content;
if the problem content comprises picture data, determining content data corresponding to the picture data, and searching in the knowledge base based on second representation data corresponding to the content data to obtain the target picture identification corresponding to the problem content.
In one embodiment, the knowledge base is constructed by:
acquiring a shooting picture shot by the wearable equipment;
performing character recognition based on the shot picture to obtain character content corresponding to the shot picture; or extracting the content based on the shot picture to obtain the key content corresponding to the shot picture;
embedding the text content and the key content to obtain corresponding content representation data;
and constructing the knowledge base based on the corresponding relation between the content representation data and the picture identification of the shot picture.
In one embodiment, the scene library is constructed by:
acquiring a shot picture shot by a wearable device and scene data acquired by the wearable device when the shot picture is shot;
and constructing the scene library based on the corresponding relation between the scene data and the picture identification of the shot picture.
In one embodiment, the wearable device performs the photographing action in any of the following cases:
detecting a preset shooting voice signal;
detecting that the stay time of the wearable device in the current scene reaches a preset shooting time;
and detecting a preset gesture.
In one embodiment, the answer content is obtained in the following manner:
if the target scene data indicate that the problem content belongs to a learning scene, acquiring the target shot picture corresponding to the target picture identifier from a picture library, and predicting the target shot picture through a pre-configured large model to obtain the answer content; the picture library stores the corresponding relation between the picture identification and the shot picture;
and if the target scene data indicate that the problem content belongs to a living scene, acquiring the target shot picture corresponding to the target picture identifier from a picture library, and predicting the target shot picture and the target scene data through a pre-configured large model to obtain the answer content.
In one embodiment, the method further comprises:
and if the target picture identification does not exist in the knowledge base, calling a preset search tool to perform supplementary search on the question content to obtain answer content corresponding to the question content.
In one embodiment, the method further comprises:
recommending a digital twin scene matched with the portrait data of the wearing object; wherein the digital twin scene is determined in the scene library based on at least one of a current time, the portrait data of the wearing object, a geographic location, and user state data; the wearing object is an object wearing a wearable device, and the geographic location is obtained by positioning of the wearable device; the user state data is perceived through the wearable device.
In one embodiment, outputting the answer content includes:
outputting the answer content in a voice playing mode; and/or
Displaying the answer content in a display area of the wearable device; wherein the wearable device comprises AR glasses; and/or
Displaying a digital twin scene; the digital twin scene comprises a target object corresponding to the answer content, and the display mode of the target object is different from that of other objects in the digital twin scene.
The embodiment of the present specification provides an information search apparatus, including:
the problem content acquisition module is used for acquiring problem content; the problem content has a corresponding target picture identifier in a knowledge base, and the knowledge base stores a correspondence between picture identifiers and content representation data; the target picture identifier corresponds to a target shot picture;
the answer content output module is used for outputting answer content corresponding to the question content; the answer content is determined based on the target shot picture and target scene data corresponding to the target picture identifier in a scene library, and the scene library stores a correspondence between picture identifiers and scene data.
The embodiment of the present specification provides a computer device comprising a memory storing a computer program and a processor, wherein the processor implements the steps of the method according to any of the above embodiments when executing the computer program.
The present description provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the method according to any of the above embodiments.
The present description provides a computer program product comprising instructions which, when executed by a processor of a computer device, enable the computer device to perform the steps of the method of any one of the embodiments described above.
In the above-described embodiments, the problem content is first acquired. The problem content corresponds to a target picture identifier in the knowledge base, and the target picture identifier corresponds to a target shot picture. Next, answer content is determined based on the target shot picture and the target scene data corresponding to the target picture identifier in the scene library. Finally, the answer content corresponding to the problem content is output. By fusing the target shot picture and the target scene data, which are of different types, multi-modal search is realized and answers corresponding to the problem content are generated in a multi-modal manner, so that the precision and accuracy of information search are improved. Further, based on the target shot picture and the target scene data, various kinds of answer content can be returned, not only text but also pictures, navigation maps and the like, providing richer and more intuitive answer content for the user and improving the user experience.
Drawings
Fig. 1a is a schematic diagram of a learning application scenario provided in an embodiment of the present disclosure;
fig. 1b is a schematic diagram of an example of a life scenario provided in an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of an information searching method according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of determining a target picture identifier according to an embodiment of the present disclosure;
fig. 4 is a schematic diagram of determining a target picture identifier according to an embodiment of the present disclosure;
FIG. 5a is a schematic flow chart of knowledge base construction according to the embodiment of the present disclosure;
FIG. 5b is a schematic diagram of a building knowledge base provided in an embodiment of the present disclosure;
FIG. 6a is a schematic flow chart of a scene library construction provided in an embodiment of the present disclosure;
FIG. 6b is a schematic diagram of a build scene library provided by an embodiment of the present disclosure;
FIG. 7 is a diagram showing answer content provided in the embodiment of the present disclosure;
FIG. 8 is a diagram of answer content provided in an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of an intelligent recommendation mode provided in an embodiment of the present disclosure;
fig. 10 is a schematic diagram of an information search apparatus provided in an embodiment of the present disclosure;
fig. 11 is a schematic diagram of an internal structure of a computer device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
In learning scenarios, people often face the problem of applying knowledge newly learned from a book in later situations, which may cause trouble and inconvenience. For example, when a person reads an excellent article and later wants to find a particular chapter in it, the person has to leaf through the book again, which is tedious and time-consuming. Likewise, in living scenes, people are often troubled by forgetting where an article was placed. Therefore, in order to solve the above problems, an information searching method needs to be proposed.
In the related art, a wearable device such as a smart glasses product supports artificial intelligence capabilities such as voice interaction and voice recognition to a certain extent, which enables the wearable device to realize some basic information searching functions. However, wearable devices have some limitations in terms of large model or multi-modal related interaction capabilities.
In the related art, the wearable device is mainly limited to providing a text search function, and cannot meet the requirement of a user on multi-mode search. In addition, the wearable device also lacks personalized intelligent recommendation functions, and cannot provide customized recommendation services according to learning and living demands of users.
Based on the above analysis, the present embodiments provide an information searching method aimed at wearable technology and products using embedded artificial intelligence, taking smart glasses as an example. The method constructs a knowledge base and a scene library through photographing and OCR (Optical Character Recognition) technology, and combines a large model, the knowledge base, the scene library and external search to realize multi-modal interactive search, so that the user achieves a "never forget what has been seen" effect. In addition, the search results are presented to the user in AR (Augmented Reality) form using augmented reality techniques to enhance the user experience. Specifically, the smart glasses are communicatively connected to a cloud server, on which a large model, a knowledge base, a picture library and a scene library are deployed. First, problem content is acquired through the smart glasses. The smart glasses send the problem content to the cloud server; the cloud server receives the problem content and searches the knowledge base based on it to obtain a target picture identifier corresponding to the problem content. The picture library is then searched with the target picture identifier to obtain the corresponding target shot picture, and the scene library is searched with the target picture identifier to obtain the corresponding target scene data; answer content is determined based on the target shot picture and the target scene data. Finally, the cloud server sends the answer content to the smart glasses, which provide the answer content corresponding to the question content to the user. Multi-modal search is realized by fusing the target shot picture and the target scene data, which are of different types, so that the precision and accuracy of information search are improved. Further, based on the target shot picture and the target scene data, various kinds of answer content can be returned, not only text but also pictures, navigation maps and the like, providing richer and more intuitive answer content for the user and improving the user experience.
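To make the above flow concrete, the following Python sketch outlines the cloud-side pipeline under stated assumptions: every component (the embedding function, the knowledge-base search, the picture library, the scene library, the large model and the external search tool) is passed in as a plain callable or mapping, and all names are illustrative rather than the patent's actual interfaces.

```python
# Illustrative sketch of the cloud-side flow described above; every dependency
# is injected so the flow itself is runnable, and all names are hypothetical.

def cloud_search(problem_content, embed, search_knowledge_base,
                 picture_library, scene_library, large_model, external_search):
    # 1. Embed the problem content and match it against the knowledge base.
    problem_vec = embed(problem_content)
    target_picture_id = search_knowledge_base(problem_vec)

    # 2. Knowledge-base miss: fall back to a preset external search tool.
    if target_picture_id is None:
        return external_search(problem_content)

    # 3. Look up the target shot picture and target scene data by identifier.
    target_picture = picture_library[target_picture_id]
    target_scene = scene_library[target_picture_id]

    # 4. The pre-configured large model generates the answer content.
    return large_model(question=problem_content,
                       picture=target_picture,
                       scene=target_scene)
```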
The information searching method provided by the embodiment of the present specification can be applied to an example learning scenario. User A wears AR glasses. The AR glasses are communicatively connected to a cloud server, on which a large model, a knowledge base, a picture library and a scene library are deployed. Referring to fig. 1a, when user A studies knowledge in a book B for the first time, the AR glasses, triggered by a preset gesture, shoot the content C in the current book B to obtain a shot picture D. Subsequently, the AR glasses upload the shot picture D to the cloud server, and the cloud server associates the shot picture D with a unique identifier, which is determined as the picture identifier s1 of the shot picture D. The cloud server recognizes and extracts the characters in the shot picture D by OCR, obtaining the text content corresponding to the shot picture D. The text content corresponding to the shot picture D is embedded to generate content representation data d1. The correspondence between the content representation data d1 and the picture identifier s1 of the shot picture D is stored in the knowledge base, and the correspondence between the shot picture D and its picture identifier s1 is stored in the picture library. The shot picture D is further recognized by the large model deployed on the cloud server to obtain scene data indicating that the scene type of the shot picture D is a learning scene, and the correspondence between the scene data and the picture identifier s1 of the shot picture D is stored in the scene library.
In the review phase of user A, if knowledge of content C is involved but the impression of it is not deep, the question-answer mode may be turned on. User A may ask a question E related to content C in voice form. The AR glasses convert the voice data of question E into text data through speech recognition and upload the text data to the cloud server. The cloud server embeds the text data corresponding to question E to generate an embedding vector corresponding to the text content. Then, matching is carried out in the knowledge base based on this embedding vector, and content representation data d1 similar to it is found. The picture identifier s1 corresponding to the content representation data d1 in the knowledge base is then taken as the target picture identifier. The scene library is searched with the target picture identifier s1 to obtain the target scene data corresponding to s1. The target scene data indicates that question E belongs to a learning scene, so the shot picture D corresponding to the target picture identifier s1 is obtained from the picture library as the target shot picture. The target shot picture, namely shot picture D, is predicted by the large model deployed on the cloud server to obtain the answer content of question E. The cloud server sends the answer content of question E to the AR glasses, and the AR glasses can output it by voice playing.
The information searching method provided by the embodiment of the present specification can also be applied to an example life scenario. User F wears AR glasses. The AR glasses are communicatively connected to a cloud server, on which a large model, a knowledge base, a picture library and a scene library are deployed. Referring to fig. 1b, when user F is at the location of a coffee shop G, the AR glasses, triggered by a preset gesture action, shoot the coffee shop G to obtain a shot picture H, and collect scene data at the time the shot picture H is taken. Subsequently, the AR glasses upload the shot picture H to the cloud server, and the cloud server associates the shot picture H with a unique identifier, which is determined as the picture identifier s2 of the shot picture H. The shot picture H is recognized by the large model deployed on the cloud server, obtaining scene data indicating that the scene category of the shot picture H is a living scene, and the correspondence between the obtained scene data and the picture identifier s2 of the shot picture H is stored in the scene library. Content extraction is performed on the shot picture H by the large model deployed on the cloud server to obtain the key content corresponding to the shot picture H, and the key content is embedded to generate content representation data d2. The correspondence between the content representation data d2 and the picture identifier s2 of the shot picture H is stored in the knowledge base, and the correspondence between the shot picture H and its picture identifier s2 is stored in the picture library.
Later, when user F is near the coffee shop G and has the idea of drinking coffee, user F can make the AR glasses shoot the current location through a preset gesture action to obtain picture data I, and then ask, in voice form, question K of whether there is a coffee shop nearby. The AR glasses upload the picture data I to the cloud server, and content extraction is carried out on the picture data I through the large model deployed on the cloud server to obtain the content data corresponding to the picture data I. The content data corresponding to the picture data I is embedded to generate an embedding vector corresponding to the content data. Then, matching is performed in the knowledge base based on this embedding vector, and content representation data d2 similar to it is found. The picture identifier s2 corresponding to the content representation data d2 in the knowledge base is then taken as the target picture identifier. The scene library is searched with the target picture identifier s2, and the obtained target scene data indicates that question K belongs to a living scene. The shot picture H corresponding to the target picture identifier s2 is obtained from the picture library as the target shot picture, and the scene data corresponding to s2 is obtained from the scene library as the target scene data. The target shot picture, namely shot picture H, the embedding vector corresponding to question K, and the target scene data are predicted together by the large model deployed on the cloud server to obtain the answer content of question K. The cloud server sends the answer content of question K to the AR glasses, and the AR glasses can display it in the display area.
The embodiment of the present disclosure provides an information searching method, referring to fig. 2, the information searching method may include the following steps:
s210, acquiring the problem content.
The problem content has a corresponding target picture identifier in the knowledge base, the knowledge base stores a correspondence between picture identifiers and content representation data, and the target picture identifier corresponds to a target shot picture. The problem content may be a query or search request made by a user. The picture identifier may be an identifier or marking used to uniquely identify or represent a picture; it may be a number, a string, a hash value, or another form of identifier. The content representation data may be either an embedding vector or a text feature. The knowledge base may be a database or knowledge storage system that stores information related to the problem content. A series of picture identifiers may be stored in the knowledge base, each picture identifier being associated with corresponding content representation data. The target picture identifier may be an identifier or tag used to uniquely identify or represent the picture to which the problem content corresponds. The target shot picture may be a specific picture related to the problem content.
Specifically, in the question-answering mode, the user can ask questions in various ways, including inputting text, inputting voice, taking pictures, etc., to obtain the problem content. Searching and matching are carried out in the knowledge base based on the problem content, so as to determine the picture identifier corresponding to the problem content as the target picture identifier. For example, after the problem content is obtained, since the knowledge base stores the correspondence between picture identifiers and content representation data, the problem content is embedded to obtain problem representation data. Similarity calculation is carried out between the problem representation data and the content representation data to obtain the content representation data matching the problem content, and the picture identifier corresponding to that content representation data is the target picture identifier. After the target picture identifier is determined, the shot picture corresponding to it is searched for in the picture library and taken as the target shot picture.
It should be noted that, in the embodiment of the present disclosure, the information searching method may be applied to a wearable device, and illustrated by using smart glasses. A voice wake system may be integrated in the smart glasses. The voice wake-up system can enable the intelligent glasses to automatically monitor the voice input of the user in the standby mode, so that the user experience is improved. When the user speaks a specific wake-up word, the voice wake-up system activates the smart glasses to enable the smart glasses to enter a working mode (such as a question-answering mode), so that the user can interact with the smart glasses. High precision sensors and motion recognition techniques may also be integrated into the smart glasses to recognize user-specific gestures, such as tap, pinch, etc. When a user performs a specific gesture operation, the sensor immediately captures the action and interprets the action as an activation instruction, so that the intelligent glasses respond and start the working mode. The intelligent glasses can be provided with a special physical key or touch area, and the user only needs to press the key or touch the touch area, so that the intelligent glasses can be activated to enter the working state. Under the question-answering mode of the intelligent glasses, voice questions are acquired in a voice acquisition mode, the voice questions can be directly sent to the cloud server as question contents, and text data corresponding to the voice questions can be sent to the cloud server as question contents. Or, acquiring a photo through a photographing function, collecting a voice problem, and sending the photo and the voice problem to a cloud server as problem contents.
S220, outputting answer content corresponding to the question content.
The answer content is determined based on target scene data corresponding to the target shot picture and the target picture identification in a scene library, and the corresponding relation between the picture identification and the scene data is stored in the scene library. The scene data may be information associated with the taken picture corresponding to the picture identification. For example, the scene data may include any one of photographing time, information of photographing place (such as city, street name, latitude and longitude), photographing pose. The target scene data may be information associated with a target captured picture.
Specifically, after the target picture identification is determined, the corresponding scene data is searched in the scene library by using the target picture identification, and the scene data is used as target scene data. By analyzing the target shooting picture and the target scene data, comprehensively considering information of shooting places, shooting time, shooting postures and the like in the scene data, and simultaneously combining contents such as objects, events and the like in the target shooting picture, data related to the problem contents can be more comprehensively analyzed, and answer contents related to the problem contents can be generated.
It should be noted that if a question requires multiple rounds of interaction to arrive at an answer, multiple rounds of question and answer may be extended based on the preceding question and answer content. This means that previous context information needs to be referenced and utilized in subsequent interactions to better understand the question and provide consistent answers.
In the above embodiment, the problem content is first acquired. The problem content corresponds to a target picture identifier in the knowledge base, and the target picture identifier corresponds to a target shot picture. Next, answer content is determined based on the target shot picture and the target scene data corresponding to the target picture identifier in the scene library. Finally, the answer content corresponding to the problem content is output. By fusing the target shot picture and the target scene data, which are of different types, multi-modal search is realized and answers corresponding to the problem content are generated in a multi-modal manner, so that the precision and accuracy of information search are improved. Further, based on the target shot picture and the target scene data, various kinds of answer content can be returned, not only text but also pictures, navigation maps and the like, providing richer and more intuitive answer content for the user and improving the user experience.
In some implementations, the target picture identification is determined by: if the problem content comprises voice data, acquiring first representation data corresponding to the problem content, and searching in a knowledge base based on the first representation data to obtain a target picture identifier corresponding to the problem content.
Wherein the first representation data may be obtained by converting speech data into a machine processable and analyzable form, e.g. the first representation data may be an embedded vector.
In particular, when the problem content includes voice data, in order to use the problem content for searching and matching of the knowledge base, the voice data needs to be converted into a form that can be understood and processed by a machine, so as to obtain first representation data corresponding to the problem content. Then, the similarity between the first representation data and the content representation data in the knowledge base is calculated so as to find the content representation data matched with the problem content in the knowledge base. And finally, taking the picture identifier corresponding to the content representation data in the knowledge base as a target picture identifier corresponding to the problem content.
In some implementations, voice data may be converted to text data using speech recognition technology. Then, the text data corresponding to the problem content is embedded to obtain the first representation data corresponding to the problem content. The first representation data is matched with the content representation data in the knowledge base using a similarity measure (such as cosine similarity or Euclidean distance), and the maximum value among the results is taken as the matching result corresponding to the problem content. If the matching result is not less than a preset similarity threshold, it can be considered that content representation data similar to the first representation data exists in the knowledge base, so the picture identifier corresponding to that content representation data in the knowledge base can be used as the target picture identifier corresponding to the problem content. The preset similarity threshold is a critical value set for the matching process and is used for judging whether the similarity between two pieces of representation data meets the requirement. When the similarity of the matching result is not smaller than the preset similarity threshold, the two pieces of representation data are considered sufficiently similar, and the matching succeeds.
For example, referring to fig. 3, when the question-answering mode is turned on, the wearable device supports voice questions, and the user can ask questions to the wearable device in voice form. The wearable device converts the voice data into text data through speech recognition technology and uploads the text data to the cloud server. The cloud server embeds the text data to generate an embedding vector corresponding to the text content, namely the first representation data. Then, matching is carried out in the knowledge base based on this embedding vector to find content representation data similar to it, and the picture identifier corresponding to that content representation data in the knowledge base is taken as the target picture identifier.
In the above embodiment, if the problem content includes voice data, the first representation data corresponding to the problem content is obtained, and the target picture identifier corresponding to the problem content is found in the knowledge base based on the first representation data, providing a data basis for subsequently matching the target scene data and the target shot picture.
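As a minimal illustration of the matching step just described, the sketch below assumes the content representation data and the first representation data are already embedding vectors; the toy vectors, the knowledge-base contents, and the 0.8 threshold are made-up examples, not values from the patent.

```python
# Minimal sketch of knowledge-base matching by cosine similarity with a
# preset similarity threshold. All concrete values are illustrative.
import numpy as np

knowledge_base = {            # picture identifier -> content representation data
    "s1": np.array([0.12, 0.88, 0.45]),
    "s2": np.array([0.95, 0.10, 0.30]),
}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_target_picture_id(problem_vec, threshold=0.8):
    # Take the picture identifier whose content representation data is most
    # similar to the problem representation data; return None below threshold.
    best_id, best_sim = None, -1.0
    for picture_id, content_vec in knowledge_base.items():
        sim = cosine_similarity(problem_vec, content_vec)
        if sim > best_sim:
            best_id, best_sim = picture_id, sim
    return best_id if best_sim >= threshold else None

print(match_target_picture_id(np.array([0.10, 0.90, 0.40])))  # -> "s1"
```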
In some implementations, the target picture identification is determined by: if the problem content comprises the picture data, determining content data corresponding to the picture data, and searching in a knowledge base based on second representation data corresponding to the content data to obtain a target picture identifier corresponding to the problem content.
Wherein the content data may be information or features extracted from the picture data about the picture content. The second representation data may be in a form that converts the picture data into a processable and analyzable form, e.g. the second representation data may be an embedded vector.
Specifically, when the problem content includes the picture data, the content in the picture data can be identified through an image feature extraction algorithm, and the content data corresponding to the picture data is obtained. Text information can be extracted from the picture data through OCR technology, so that content data corresponding to the picture data can be obtained. In order to use the content data for searching and matching of the knowledge base, the content data needs to be converted into a form that can be understood and processed by the machine, so as to obtain second representation data corresponding to the content data. The second representation data is then analyzed and processed with the content representation data in the knowledge base using natural language processing techniques to find content representation data in the knowledge base that matches the problem content. And finally, taking the picture identifier corresponding to the content representation data in the knowledge base as a target picture identifier corresponding to the problem content.
In some implementations, OCR recognition technology can be deployed on a wearable device. In other embodiments, OCR recognition technology may be deployed at a cloud server.
For example, referring to fig. 4, when the question-answering mode is turned on, the wearable device supports asking questions by taking a photograph. The wearable device uploads the picture data to the cloud server. The content in the picture data is understood by the large model deployed on the cloud server to obtain the content data corresponding to the picture data. Then, the content data is embedded to generate an embedding vector corresponding to the content data, namely the second representation data. Matching is then carried out in the knowledge base based on this embedding vector to find content representation data similar to it, and the picture identifier corresponding to that content representation data in the knowledge base is taken as the target picture identifier.
In the above embodiment, if the problem content includes the picture data, the content data corresponding to the picture data is determined, and the target picture identifier corresponding to the problem content is found in the knowledge base based on the second representation data corresponding to the content data, so as to provide a data basis for matching the target scene data and the target shot picture subsequently.
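Where the problem content is a picture, one possible way to obtain the content data is OCR, as sketched below; pytesseract and Pillow are example dependencies chosen only for illustration (a vision large model could equally be used when the picture contains no text), and nothing here is mandated by the patent.

```python
# Sketch of turning picture data into content data via OCR before the
# embedding lookup; assumes a local Tesseract installation is available.
from PIL import Image
import pytesseract

def picture_to_content_data(picture_path: str) -> str:
    # Extract any text visible in the picture; downstream, this text is
    # embedded and matched against the knowledge base like a text question.
    return pytesseract.image_to_string(Image.open(picture_path))
```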
In some embodiments, referring to fig. 5a, a knowledge base is constructed by:
s510, acquiring a shooting picture shot through the wearable equipment.
S520, performing character recognition based on the shot picture to obtain character content corresponding to the shot picture. Or extracting the content based on the shot picture to obtain the key content corresponding to the shot picture.
S530, embedding the text content and the key content to obtain corresponding content representation data.
S540, constructing a knowledge base based on the corresponding relation between the content representation data and the picture identification of the shot picture.
The wearable device may be an electronic device worn on the body, such as a smart watch, smart glasses, smart bracelets, etc. The key content may be information or features extracted from the picture data about the picture content. The embedding process may be converting the text content or key content into a vector representation in continuous space for further processing and analysis by a computer.
Specifically, in the memory mode, shooting is performed through the wearable device, and a shooting picture is obtained. To better manage and index the pictures, each shot picture is associated with a unique identifier and is determined as a picture identification. And extracting and recognizing the characters in the shot picture by using a character recognition technology, such as an OCR (optical character recognition) technology, so as to obtain the corresponding character content of the shot picture. Or analyzing and processing the shot picture by using image processing and computer vision technologies, such as a deep learning model (such as a convolutional neural network) and an image feature extraction algorithm, extracting content in the shot picture, such as elements of an object, a scene and the like, and obtaining key content corresponding to the shot picture. Then, the text content and the key content are embedded and converted into vector representation, and corresponding content representation data is obtained. And finally, constructing a corresponding relation based on the content representation data and the picture identification of the shot picture to form a knowledge base, and storing the corresponding relation between the content representation data and the picture identification of the shot picture in the knowledge base.
For example, referring to fig. 5b, when the memory mode is turned on, the wearable device can take photographs to record what the user sees and obtain shot pictures. The wearable device uploads the shot picture to the cloud server. The cloud server associates each shot picture with a unique identifier, which is determined as the picture identifier. Character recognition is performed on the shot picture through OCR. If text information can be extracted from the shot picture, the text content corresponding to the shot picture is obtained. If text information cannot be extracted from the shot picture, the content in the picture data can be understood through the large model deployed on the cloud server to obtain the key content corresponding to the shot picture. Then, the text content and/or the key content are embedded to generate the corresponding embedding vectors, namely the content representation data. Finally, a correspondence is established between the content representation data and the picture identifier of the shot picture to form the knowledge base, and the correspondence between the content representation data and the picture identifier of the shot picture is stored in the knowledge base.
In some implementations, OCR recognition technology can be deployed on a wearable device. In other embodiments, OCR recognition technology may be deployed at a cloud server.
It should be noted that, by collecting and analyzing feedback and behavior data of the user, the knowledge base can be continuously improved and optimized, so as to improve the intelligence and practicality of the knowledge base.
In the above embodiment, a shot picture taken by the wearable device is obtained, and character recognition is performed based on the shot picture to obtain the text content corresponding to the shot picture; or content extraction is performed based on the shot picture to obtain the key content corresponding to the shot picture. The text content and the key content are embedded to obtain the corresponding content representation data, and the knowledge base is constructed based on the correspondence between the content representation data and the picture identifier of the shot picture, so that more accurate searching can be provided, offering better experience and service for the user.
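A minimal sketch of this construction step is given below. The embed() helper is a toy, deterministic stand-in for a real text-embedding model, and the dictionary-based knowledge base and picture library are simplifications of whatever storage the cloud server actually uses.

```python
# Sketch of knowledge-base construction, assuming OCR or content extraction
# has already produced text for each shot picture; embed() is a toy stand-in
# for a real text-embedding model and is not part of the patent.
import hashlib
import numpy as np

def embed(text, dim=8):
    # Deterministic pseudo-embedding used only to make the example runnable.
    seed = int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).random(dim)

knowledge_base = {}   # picture identifier -> content representation data
picture_library = {}  # picture identifier -> shot picture (file path here)

def ingest_shot_picture(picture_id, picture_path, recognized_text):
    # Store the shot picture and the embedding of its text/key content.
    picture_library[picture_id] = picture_path
    knowledge_base[picture_id] = embed(recognized_text)

ingest_shot_picture("s1", "book_B_page.jpg", "Content C: notes on chapter 3")
```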
In some embodiments, referring to fig. 6a, a scene library is constructed by:
s610, acquiring a shot picture shot by the wearable device and scene data acquired by the wearable device when the shot picture is shot.
S620, constructing a scene library based on the corresponding relation between the scene data and the picture identification of the shot picture.
Specifically, in the memory mode, shooting is performed through the wearable device, and a shooting picture is obtained. To better manage and index the pictures, each shot picture is associated with a unique identifier and is determined as a picture identification. When a user shoots through the wearable device, the device can record shooting pictures and related scene data at the same time. The scene data may include any one of photographing time, photographing place information (such as city, street name, latitude and longitude), photographing pose, and the like. And carrying out identification processing on the shot picture by using an image identification algorithm to obtain a scene type corresponding to the shot picture. The scene type is also part of the scene data. According to the corresponding relation between the scene data and the picture identification of the shot picture, a scene library can be constructed, and the corresponding relation between the scene data and the picture identification of the shot picture is stored in the scene library.
For example, referring to fig. 6b, when the memory mode is turned on, the wearable device may collect scene data (such as pose information and geographical position information) of the current shooting position through built-in sensors (such as a gyroscope and GPS). The shot picture is recognized by an image recognition algorithm to obtain the scene type corresponding to the shot picture, and the scene type is also part of the scene data. According to the correspondence between the scene data and the picture identifier of the shot picture, a scene library can be constructed, and the correspondence between the scene data and the picture identifier of the shot picture is stored in the scene library.
In the above embodiment, the shot picture taken by the wearable device and the scene data collected by the wearable device when the shot picture is taken are acquired, and the scene library is constructed based on the correspondence between the scene data and the picture identifier of the shot picture. This provides accurate scene information for subsequent steps, improves user experience, enables personalized recommendation services, and lays a data foundation for intelligent application development.
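The following sketch mirrors the scene-library construction described above; the SceneData fields (shooting time, location, pose, scene type) follow the examples of scene data given earlier, and all concrete values and names are illustrative.

```python
# Sketch of scene-library construction keyed by picture identifier.
from dataclasses import dataclass, asdict
from datetime import datetime

@dataclass
class SceneData:
    shot_time: str
    location: tuple          # (latitude, longitude)
    pose: str                # e.g. gyroscope-derived orientation
    scene_type: str          # "learning" or "life"

scene_library = {}           # picture identifier -> scene data

def record_scene(picture_id, location, pose, scene_type):
    scene_library[picture_id] = SceneData(
        shot_time=datetime.now().isoformat(timespec="seconds"),
        location=location, pose=pose, scene_type=scene_type)

record_scene("s2", (31.82, 117.23), "facing north-east", "life")
print(asdict(scene_library["s2"]))
```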
In some implementations, the wearable device performs the shooting action in the following scenario: a preset photographed voice signal is detected.
The shooting voice signal may be a voice signal that the user instructs the wearable device to perform a shooting operation through a voice command or a specific voice control word.
Specifically, the wearable device can monitor sounds in the environment in real time and detect voice signals. When the wearable device detects a preset shooting voice signal, the wearable device needs to execute shooting functions according to specific hardware and operating system characteristics.
In the above embodiment, the preset shooting voice signal is detected, so that the wearable device executes the shooting action; this provides data for constructing the knowledge base and the scene library, and also provides data support for information search.
In some implementations, the wearable device performs the shooting action in the following scenario: and detecting that the stay time of the wearable device in the current scene reaches the preset shooting time.
The preset shooting duration can be a shooting time threshold preset in the wearable device; when the user stays in a specific scene for the preset shooting duration, the wearable device automatically performs the shooting operation. The preset shooting duration can be set by the user according to actual needs; for example, when sightseeing, different preset shooting durations can be set according to the dwell time at different scenic spots.
In particular, the wearable device utilizes built-in sensors (such as GPS, gyroscopes) to identify and locate the current scene. Meanwhile, a clock or timer on the wearable device realizes a timing function. When the current scene in which the user is located is determined, the wearable device starts to monitor the stay time of the user in the current scene. The device compares the stay time with a preset shooting time. When the stay time of the wearable device in the current scene reaches the preset shooting time, the wearable device executes shooting actions.
In the above embodiment, it is detected that the stay time of the wearable device in the current scene reaches the preset shooting duration, so that the wearable device executes the shooting action; this provides data for constructing the knowledge base and the scene library, and also provides data support for information search.
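A simple way to realize this dwell-time trigger is sketched below; get_current_scene and shoot are placeholders for the device's scene-recognition and camera interfaces, the 5-second threshold and polling interval are illustrative, and the loop runs until interrupted.

```python
# Sketch of the dwell-time trigger: shoot automatically once the wearer has
# stayed in the same scene for a preset duration. All values are examples.
import time

PRESET_SHOOTING_DURATION = 5.0   # seconds, user-configurable

def monitor_dwell_and_shoot(get_current_scene, shoot, poll_interval=0.5):
    last_scene = get_current_scene()
    entered_at = time.monotonic()
    while True:
        scene = get_current_scene()
        if scene != last_scene:
            last_scene, entered_at = scene, time.monotonic()  # scene changed, reset timer
        elif time.monotonic() - entered_at >= PRESET_SHOOTING_DURATION:
            shoot()                                           # dwell threshold reached
            entered_at = time.monotonic()                     # avoid repeated shots
        time.sleep(poll_interval)
```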
In some implementations, the wearable device performs the shooting action in the following scenario: and detecting a preset gesture.
Specifically, the wearable device is internally provided with a corresponding gesture sensor, so that the hand actions of a user can be monitored in real time. The wearable device can transmit gesture data acquired by the sensor to an internal gesture recognition algorithm for processing. The algorithm analyzes and analyzes the sensor data according to a preset gesture definition to determine whether a user performs a preset gesture. When the gesture recognition algorithm confirms that the gesture action of the user is matched with the preset gesture, the wearable device can execute shooting action.
In the above embodiment, the preset gesture motion is detected, so that the wearable device executes the shooting action; this provides data for constructing the knowledge base and the scene library, and also provides data support for information search.
It should be noted that, when the memory mode is turned on, the wearable device can take photographs to record what the user sees and obtain shot pictures. The wearable device uploads the shot pictures to the cloud server. The cloud server associates each shot picture with a unique identifier, which is determined as the picture identifier. A correspondence is established between the picture identifier of the shot picture and the shot picture itself to form a picture library, and the correspondence between the shot picture and its picture identifier is stored in the picture library.
In some embodiments, obtaining answer content by: and if the target scene data indicate that the problem content belongs to the learning scene, acquiring a target shot picture corresponding to the target picture identification from a picture library, and predicting the target shot picture through a pre-configured large model to obtain answer content.
Wherein a learning scenario may be a scenario where a problem relates to knowledge, academic, education, etc. For example, questions related to school courses, scientific theory, historical events, and the like may be categorized as learning scenarios. The preconfigured large model is a model which is trained and adjusted in advance and has higher accuracy and generalization capability in the field of machine learning or artificial intelligence. Preconfigured large models are typically trained using large amounts of data and are optimized and parameter tuned to perform well on a particular task. The pre-configured large model can be used in various applications such as natural language processing, image recognition, voice recognition and the like, and corresponding output results are obtained by inputting specific data and running a pre-trained model. The corresponding relation between the picture identification and the shot picture is stored in the picture library.
Specifically, referring to fig. 7, after it is determined that a target picture identifier corresponding to the problem content exists in the knowledge base, the scene data corresponding to the target picture identifier is queried in the scene library and used as the target scene data. If the target scene data indicates that the problem content belongs to a learning scene, the target shot picture corresponding to the target picture identifier can be queried and extracted from the picture library. The target shot picture is input into the pre-configured large model, which analyzes and predicts it, extracting information from the target shot picture and converting it into answer content.
In the above embodiment, if the target scene data indicates that the problem content belongs to the learning scene, the target shot picture corresponding to the target picture identifier is obtained from the picture library, and the target shot picture is predicted by the pre-configured large model, so as to obtain the answer content. The pre-configured large model is trained and adjusted, so that the method has high accuracy and generalization capability. By predicting the target shot picture, relatively accurate answer content can be obtained. The automatic prediction is performed by using the preconfigured large model, so that the manual processing time and the cost can be greatly saved.
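A compact sketch of this learning-scene branch follows; multimodal_model stands in for the pre-configured large model (its call signature here is an assumption), and picture_library is any mapping from picture identifiers to shot pictures.

```python
# Sketch of the learning-scene branch: look up the target shot picture and let
# a pre-configured multimodal large model answer from it. Names are assumed.

def answer_learning_question(question, target_picture_id,
                             picture_library, multimodal_model):
    target_picture = picture_library[target_picture_id]   # target shot picture
    prompt = ("Answer the question based on the attached picture.\n"
              f"Question: {question}")
    return multimodal_model(prompt=prompt, image=target_picture)
```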
In some embodiments, obtaining answer content by: and if the target scene data indicate that the problem content belongs to a living scene, acquiring a target shot picture corresponding to the target picture identifier from a picture library, and predicting the target shot picture and the target scene data through a pre-configured large model to obtain answer content.
Wherein, the living scene can be a context or theme related to daily life, such as the living scene can be at least one of shopping scene, medical scene, travel scene.
Specifically, referring to fig. 8, after it is determined that a target picture identifier corresponding to the problem content exists in the knowledge base, the scene data corresponding to the target picture identifier is queried in the scene library and used as the target scene data. If the target scene data indicates that the problem content belongs to a living scene, the target shot picture corresponding to the target picture identifier can be queried and extracted from the picture library. The target scene data can assist the prediction of answer content from the target shot picture. Therefore, the target shot picture and the target scene data are input into the pre-configured large model, which analyzes and predicts them, extracting more information from the target shot picture and the target scene data and converting it into answer content.
For example, when the user expresses an intention such as "I want to eat barbecue", "nearby barbecue store" may be regarded as the problem content. Recognizing the problem content "nearby barbecue store" yields the content data corresponding to the problem content, and embedding that content data yields the representation data corresponding to the problem content. If content representation data similar to the representation data corresponding to "nearby barbecue store" is matched in the knowledge base, the target picture identifier corresponding to the problem content can be determined. The target scene data corresponding to the target picture identifier is then queried in the scene library. If the target scene data indicates that the problem content "nearby barbecue store" belongs to a living scene, the target shot picture corresponding to the target picture identifier can be queried and extracted from the picture library. The target shot picture and the target scene data are input into the pre-configured large model, which analyzes and predicts them; for example, a navigation route to the barbecue store can be used as the answer content.
In the above embodiment, in a living scene a picture can generally convey information more directly than text. If the target scene data indicates that the problem content belongs to a living scene, the target shot picture corresponding to the target picture identifier is obtained from the picture library, and the target shot picture and the target scene data are predicted by the pre-configured large model to obtain the answer content, providing a more accurate and convenient search service for the user.
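The life-scene branch can be sketched analogously; here the target scene data (assumed to look like the SceneData record sketched earlier, with location and shooting-time fields) is folded into the model input so that answers such as a navigation route become possible. The model call signature is again an assumption.

```python
# Sketch of the life-scene branch: both the target shot picture and the target
# scene data are handed to the large model. Field and parameter names assumed.

def answer_life_question(question, target_picture_id,
                         picture_library, scene_library, multimodal_model):
    target_picture = picture_library[target_picture_id]
    target_scene = scene_library[target_picture_id]
    prompt = (f"Question: {question}\n"
              f"Shooting location: {target_scene.location}\n"
              f"Shooting time: {target_scene.shot_time}\n"
              "Answer based on the attached picture and the scene information; "
              "a navigation route may be returned if the user asks for a place.")
    return multimodal_model(prompt=prompt, image=target_picture)
```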
In some embodiments, the method may further comprise: if no target picture identifier corresponding to the question content exists in the knowledge base, calling a preset search tool to perform a supplementary search on the question content to obtain answer content corresponding to the question content.
The search tool may be a technical tool or platform for searching and retrieving related information on the internet. Using a search algorithm and an index system, the search tool screens related content out of massive network data according to the question content and presents it to the user. For example, the search tool may be any of a web page, an APP, or a search engine.
Specifically, recognizing or extracting the question content yields the content data corresponding to the question content, and embedding that content data yields the representation data corresponding to the question content. If no content representation data similar to the representation data corresponding to the question content is matched in the knowledge base, it can be inferred that no target picture identifier corresponding to the question content exists in the knowledge base. Therefore, to make up for this gap in the knowledge base, a suitable preset search tool, such as a search engine or an artificial intelligence algorithm, can be selected according to the characteristics of the question content. The question content is input into the preset search tool, and a search operation is executed to obtain the answer content corresponding to the question content, thereby realizing a supplementary search for the question content. In some embodiments, the answer content may also be supplemented by the preset search tool with information such as map information and weather information.
For example, when the user expresses an intention such as "I want to eat barbecue", "nearby barbecue store" may be taken as the question content. Embedding the content data corresponding to the question content yields the representation data corresponding to the question content. If no content representation data similar to the representation data corresponding to "nearby barbecue store" is matched in the knowledge base, it can be inferred that no target picture identifier corresponding to the question content exists in the knowledge base, i.e. there is no matching barbecue store in the knowledge base. Nearby barbecue stores can then be queried by means of the preset search tool, and the found barbecue stores are output for the user to select from.
In the above embodiment, if no target picture identifier exists in the knowledge base, a preset search tool is called to perform a supplementary search on the question content to obtain the answer content corresponding to the question content. Calling a preset search tool expands the range of information available for answering the question, compensating for the limitations of the knowledge base.
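A rough sketch of this fallback logic is given below. The embed and preset_search callables are stand-ins for whichever embedding model and preset search tool the system is configured with, and the 0.8 similarity threshold is an arbitrary value chosen purely for illustration.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.8  # illustrative value; tuned in practice

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search_with_fallback(question_content, knowledge_base, embed, preset_search):
    """knowledge_base: iterable of (picture_id, content_representation_vector)."""
    query_vec = embed(question_content)

    # Look for a picture identifier whose content representation data is
    # similar enough to the representation data of the question content.
    best_id, best_score = None, 0.0
    for picture_id, content_vec in knowledge_base:
        score = cosine_similarity(query_vec, content_vec)
        if score > best_score:
            best_id, best_score = picture_id, score

    if best_id is not None and best_score >= SIMILARITY_THRESHOLD:
        return {"target_picture_id": best_id}

    # No target picture identifier in the knowledge base: fall back to the
    # preset search tool for a supplementary search.
    return {"answer_content": preset_search(question_content)}
```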
In some embodiments, the method may further comprise: recommending a digital twin scene that matches the portrait data of the wearing object.
The digital twin scene is determined in the scene library based on at least one of the current time, the portrait data of the wearing object, the geographic location, and the user state data. The wearing object is the object wearing the wearable device, the geographic location is obtained through positioning by the wearable device, and the user state data is perceived by the wearable device. The digital twin scene may be a virtual model in the digital world that represents a physical scene or process of the real world by modeling and simulating data of the actual scene; it can be understood as a virtual copy of the actual scene, used to simulate and predict the behavior and changes of the actual scene. The geographic location may be geographic location information about where the user is, obtained from a positioning service or from location information provided by the user. The user state data may be information describing the current state or behavior of the user, such as the user's emotion, health status, or activity trajectory.
Specifically, when the memory mode is turned on, scene data of the shooting position, including posture information and geographic location information, can be recorded through sensors built into the wearable device (such as a gyroscope and GPS). The scene data can be combined with the content information of the shot pictures to jointly construct the scene library. Portrait data of the wearing object, including personal characteristics, preferences, and behavior habits, is also collected; a deeper understanding of the interests, favorites, and daily habits of the wearing object provides a more accurate basis for matching a suitable digital twin scene. Meanwhile, a positioning system built into the wearable device provides the geographic location of the wearing object, from which characteristics of the wearing object's surroundings, such as the city, the climate conditions, and nearby facilities, can be further determined, providing an additional reference for matching the digital twin scene. In addition, user state data, such as heart rate, eye movement data, and sleep quality, is perceived by the wearable device. A matching algorithm is then designed to match at least one of the current time, the portrait data of the wearing object, the geographic location, and the user state data against the data in the scene library; according to different weights and rules, the algorithm can find a digital twin scene suitable for the wearing object. Finally, the matched digital twin scene is recommended to the wearing object according to the matching result, and the recommendation may be presented on a display screen of the wearable device.
In one example, when the memory mode is turned on, the wearable device may use its positioning unit, gyroscope, and other posture units to acquire specific scene information, such as roads, malls, street edges, and advertisements. When the user takes a photo, the wearable device combines the acquired scene information with the content information of the photo and stores them in the scene library. As scene data accumulates over time, a rich and diversified scene library is formed.
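One way to picture the entries accumulated in the scene library is the minimal sketch below; the field names (picture_id, pose, location, labels) are illustrative assumptions rather than a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class SceneEntry:
    picture_id: str               # identifier of the shot picture
    timestamp: datetime           # when the picture was taken
    pose: dict                    # gyroscope / posture readings at shooting time
    location: tuple               # (latitude, longitude) from the positioning unit
    labels: list = field(default_factory=list)  # e.g. ["road", "mall", "advertisement"]

def record_scene(scene_library: dict, entry: SceneEntry) -> None:
    # The scene library keeps the correspondence between picture identifiers
    # and the scene data captured alongside each picture.
    scene_library[entry.picture_id] = entry
```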
In another example, referring to fig. 9, recording historical behavior data of the user, such as browsing records and the distribution of the user's questions, reveals information about the user's interest preferences and areas of concern. A user portrait is then constructed from information such as the user's living habits, knowledge mastery, and character preferences. Combining the user portrait with the characteristics of the scene library, a collaborative filtering algorithm (for example, one based on user behavior) is used for intelligent recommendation: by analyzing the similarity between users and combining it with the content in the scene library, the algorithm recommends knowledge points or living content suited to the user's interests and needs. In this way, personalized recommendation can be achieved based on the user's historical behavior, living habits, knowledge mastery, character preferences, and similar information, thereby improving the user experience.
As a further example, by analyzing the user portrait with a large model, regular behavior of the user can be identified and recommendations made during similar time periods. For example, if a user habitually drinks coffee at ten o'clock in the morning, then during a similar time period a coffee shop that is in the recommended scene library and located near the user will be popped up. Nearby targets can also be recommended automatically in combination with the user state sensed by the sensors; these recommendations may be based on configured thresholds or rules (produced, for example, by a large model or by adaptive adjustment) and determined in conjunction with the perceived user state. For instance, restaurants can be recommended according to the user's preferences, and the priority of the recommendation results can be adjusted according to the user's current state (such as degree of hunger), with closer results given higher priority, so as to better meet the user's needs.
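A deliberately simplified scoring rule in the spirit of this example is sketched below; the weights, the hunger field, and the habit_hours field are invented purely for illustration.

```python
import math

def recommendation_score(candidate, user_state, user_location, current_hour):
    """candidate: dict with illustrative 'category', 'habit_hours', 'location' keys."""
    score = 0.0

    # Boost candidates that match the user's habitual time slots
    # (e.g. the ten-o'clock coffee habit).
    if current_hour in candidate.get("habit_hours", []):
        score += 1.0

    # Boost food-related candidates when the sensed state suggests hunger.
    if user_state.get("hunger", 0.0) > 0.5 and candidate.get("category") == "restaurant":
        score += user_state["hunger"]

    # Closer targets get higher priority (Euclidean distance here; a real
    # system would use geodesic distance in meters).
    distance = math.dist(user_location, candidate["location"])
    score += 1.0 / (1.0 + distance)

    return score
```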
It should be noted that the recommendation algorithm may be continuously optimized by analyzing the user's feedback and behavior data, so as to improve the intelligence and practicality of the system.
In the above embodiment, recommending a digital twin scene that matches the portrait data of the wearing object provides personalized recommended content for the wearing object and improves the user experience.
In some embodiments, outputting the answer content may include: outputting the answer content in the form of voice playback.
Specifically, the text of the answer is first converted into an audio file using a speech synthesis technique; for example, a third-party speech synthesis API or a speech synthesis engine may be used for the conversion. The generated audio file is then transmitted to a voice playback device, for example by wireless transmission. Finally, the voice playback device converts the audio file into sound, outputs the answer content by voice playback, and feeds it back to the user. The voice playback device may be a wearable device with a built-in speaker, such as a smart watch or smart glasses.
In the above embodiment, outputting the answer content by voice playback lets the user receive the information more conveniently. In scenes that occupy both hands, such as driving or sports, voice playback also avoids distracting the user and helps ensure the user's safety.
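As one possible realization of the voice output path, the sketch below uses the pyttsx3 offline speech-synthesis library; any third-party speech synthesis API could be substituted, and routing the audio to a wearable's speaker is abstracted away.

```python
import pyttsx3

def speak_answer(answer_text: str) -> None:
    # Convert the answer text into speech and play it on the local audio device;
    # on a wearable, the audio would instead be routed to the built-in speaker.
    engine = pyttsx3.init()
    engine.say(answer_text)
    engine.runAndWait()

speak_answer("The nearest barbecue store is 300 meters ahead on your right.")
```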
In some embodiments, outputting the answer content may include: displaying the answer content in a display area of the wearable device.
The wearable device includes AR glasses. The display area may be an area on the wearable device used to display images, text, or other information. In AR glasses, the display area may be the region where virtual information is superimposed on the user's field of view by optical techniques, creating an augmented reality effect. The user can see the real world through the AR glasses while virtual information is projected into the field of view; the virtual information is superimposed on the real world the user sees, so the user sees both simultaneously through the display area of the AR glasses.
Specifically, the answer content may be transmitted to the wearable device via a wireless connection, such as Bluetooth, Wi-Fi, or a mobile network. The received answer content is processed on the wearable device to ensure that it fits the display area, for example by adjusting it to the device's resolution and screen size. The processed answer content is then displayed in the display area of the wearable device, where the user can view it.
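A trivial sketch of the "fit to the display area" step for a purely textual answer is shown below; the characters-per-line and line-count figures are stand-ins for whatever the device's actual resolution and font metrics dictate.

```python
import textwrap

def fit_to_display(answer_text: str, chars_per_line: int = 28, max_lines: int = 6):
    # Wrap the answer to the display width and truncate to the number of lines
    # the display area can hold, marking any overflow with an ellipsis.
    lines = textwrap.wrap(answer_text, width=chars_per_line)
    if len(lines) > max_lines:
        lines = lines[:max_lines - 1] + ["..."]
    return lines
```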
In some implementations, the wearable device may be AR glasses. First, data about the real scene is collected through the built-in camera and sensors of the AR glasses and passed to a processor, which analyzes and reconstructs the real scene. Then, the spatial position of the user in the real environment is updated in real time through the built-in camera, gyroscope, and other sensors, so as to obtain the relative positions of the virtual scene and the real scene, align their coordinate systems, and perform fusion calculation between them. Finally, a composite image of the virtual scene and the real scene is presented to the user, who sees the fusion of the two on the AR glasses and thereby obtains a richer, more immersive augmented reality experience. For example, the answer content may relate to a specific location, such as the C1 coffee shop. Real-time navigation information, including the route to the C1 coffee shop, the distance, and the expected arrival time, can be seen through the AR glasses, while the current real environment, including street views, pedestrians, and other traffic conditions, remains visible.
In the above embodiment, displaying the answer content in the display area of the wearable device means the user does not need to take out a mobile phone or another device to check the answer, providing a more convenient experience.
In some embodiments, outputting the answer content may include: displaying a digital twin scene.
The digital twin scene comprises a target object corresponding to answer content, and the display mode of the target object is different from that of other objects in the digital twin scene.
Specifically, in the digital twin scene, the target object corresponding to the answer content is identified by image recognition or target detection. At the same time, the position and posture of the target object are determined to ensure that it is accurately presented in the virtual scene. To highlight the importance or distinctiveness of the target object, its display mode is customized according to its characteristics and attributes, for example using a different color, size, or shape. Because the target object is displayed differently from the other objects in the digital twin scene, it appears more prominent. The digital twin scene is displayed through projection, virtual reality, or augmented reality technology. For example, the target object may be highlighted in the digital twin scene while the other objects are darkened.
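This display-mode distinction can be sketched as a simple styling pass over the scene objects; the style dictionaries and object fields below are illustrative assumptions only.

```python
def style_digital_twin_scene(scene_objects, target_object_id):
    """Return per-object render styles: the target is highlighted, other objects are darkened."""
    styled = []
    for obj in scene_objects:
        if obj["id"] == target_object_id:
            style = {"highlight": True, "color": "#FFD700", "opacity": 1.0}
        else:
            style = {"highlight": False, "color": "#808080", "opacity": 0.4}
        styled.append({**obj, "style": style})
    return styled
```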
Illustratively, the intention expressed by the user, such as "I want to eat barbecue", is identified by speech recognition. Based on the user's intention and the scene information, a navigation route can be projected in real time in the digital twin scene, giving the user intuitive guidance to the destination. For example, when the user is identified as having the intention "I want to eat barbecue", a route to a particular barbecue stall can be projected directly in the digital twin scene, providing intuitive guidance. In addition, the surrounding streets may be highlighted along the projected navigation route to further enhance the user's experience.
In the above embodiment, displaying the digital twin scene gives the user a feeling of being on the scene in person, enhancing the user's immersion and participation and improving the user experience.
Referring to fig. 10, an information search apparatus 1000 according to an embodiment of the present disclosure includes: a question content acquisition module 1010 and an answer content output module 1020.
A question content acquisition module 1010, configured to acquire question content; the question content has a corresponding target picture identifier in a knowledge base, and the knowledge base stores a correspondence between picture identifiers and content representation data; the target picture identifier corresponds to a target shot picture;
An answer content output module 1020, configured to output answer content corresponding to the question content; the answer content is determined based on the target shot picture and the target scene data corresponding to the target picture identifier in a scene library, and the scene library stores a correspondence between picture identifiers and scene data.
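The two-module structure of the apparatus can be sketched roughly as follows; the constructor arguments and method names are assumptions made for illustration, not the actual implementation.

```python
class InformationSearchApparatus:
    """Sketch of apparatus 1000: a question content acquisition module plus an answer content output module."""

    def __init__(self, knowledge_base, scene_library, picture_library, answer_backend):
        self.knowledge_base = knowledge_base    # picture id <-> content representation data
        self.scene_library = scene_library      # picture id <-> scene data
        self.picture_library = picture_library  # picture id <-> shot picture
        self.answer_backend = answer_backend    # e.g. the pre-configured large model

    def acquire_question_content(self, raw_input):
        # Module 1010: normalize voice or picture input into question content.
        return raw_input.strip() if isinstance(raw_input, str) else raw_input

    def output_answer_content(self, question_content, target_picture_id):
        # Module 1020: combine the target shot picture with its scene data
        # and return the answer content.
        picture = self.picture_library.get(target_picture_id)
        scene = self.scene_library.get(target_picture_id)
        return self.answer_backend(question_content, picture, scene)
```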
For a specific description of the information searching apparatus, reference may be made to the description of the information searching method hereinabove, and the description thereof will not be repeated here.
In some embodiments, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in fig. 11. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; wireless communication can be realized through WIFI, an operator network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements an information search method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, keys, a trackball, or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 11 is merely a block diagram of a portion of the structure associated with the aspects disclosed herein and does not limit the computer device to which these aspects apply; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.
The present description provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the method steps of the above embodiments when executing the computer program.
The present description embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method of any of the above embodiments.
The present description provides a computer program product comprising instructions which, when executed by a processor of a computer device, enable the computer device to perform the steps of the method of any one of the embodiments described above.
It should be noted that the logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered listing of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium may even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

Claims (12)

1. An information search method, the method comprising:
acquiring problem content; the problem content is provided with a target picture identifier corresponding to the problem content in a knowledge base, and the knowledge base is stored with a corresponding relation between the picture identifier and content representation data; the target picture identifier corresponds to a target shot picture;
outputting answer content corresponding to the question content; the answer content is determined based on target scene data corresponding to the target shot picture and the target picture identification in a scene library, and the corresponding relation between the picture identification and the scene data is stored in the scene library.
2. The method of claim 1, wherein the target picture identification is determined by any of:
if the problem content comprises voice data, acquiring first representation data corresponding to the problem content, and searching in the knowledge base based on the first representation data to obtain the target picture identification corresponding to the problem content;
if the problem content comprises picture data, determining content data corresponding to the picture data, and searching in the knowledge base based on second representation data corresponding to the content data to obtain the target picture identification corresponding to the problem content.
3. The method of claim 1, wherein the knowledge base is constructed by:
acquiring a shooting picture shot by the wearable equipment;
performing character recognition based on the shot picture to obtain character content corresponding to the shot picture; or extracting the content based on the shot picture to obtain the key content corresponding to the shot picture;
embedding the text content and the key content to obtain corresponding content representation data;
and constructing the knowledge base based on the corresponding relation between the content representation data and the picture identification of the shot picture.
4. The method of claim 1, wherein the scene library is constructed by:
acquiring a shot picture shot by a wearable device and scene data acquired by the wearable device when the shot picture is shot;
and constructing the scene library based on the corresponding relation between the scene data and the picture identification of the shot picture.
5. The method of claim 3 or 4, wherein the wearable device performs a shooting action in any of the following situations:
Detecting a preset shooting voice signal;
detecting that the stay time of the wearable device in the current scene reaches a preset shooting time;
and detecting a preset gesture.
6. The method of claim 1, wherein the answer content is obtained by:
if the target scene data indicate that the problem content belongs to a learning scene, acquiring the target shot picture corresponding to the target picture identifier from a picture library, and predicting the target shot picture through a pre-configured large model to obtain the answer content; the picture library stores the corresponding relation between the picture identification and the shot picture;
and if the target scene data indicate that the problem content belongs to a living scene, acquiring the target shot picture corresponding to the target picture identifier from a picture library, and predicting the target shot picture and the target scene data through a pre-configured large model to obtain the answer content.
7. The method according to claim 1, wherein the method further comprises:
and if the target picture identification does not exist in the knowledge base, calling a preset search tool to perform supplementary search on the question content to obtain answer content corresponding to the question content.
8. The method according to claim 1, wherein the method further comprises:
recommending a digital twin scene matched with the portrait data of the wearing object; wherein the digital twin scene is determined in the scene library based on at least one of a current time, portrait data of the wearing object, a geographic location, and user status data; the wearing object is an object wearing a wearable device, and the geographic position is obtained by positioning the wearable device; the user state data is perceived through the wearable device positioning.
9. The method of claim 1, wherein outputting the answer content by:
outputting the answer content in a voice playing mode; and/or
Displaying the answer content in a display area of the wearable device; wherein the wearable device comprises AR glasses; and/or
Displaying a digital twin scene; the digital twin scene comprises a target object corresponding to the answer content, and the display mode of the target object is different from that of other objects in the digital twin scene.
10. An information search apparatus, the apparatus comprising:
the problem content acquisition module is used for acquiring problem content; the problem content is provided with a target picture identifier corresponding to the problem content in a knowledge base, and the knowledge base is stored with a corresponding relation between the picture identifier and content representation data; the target picture identifier corresponds to a target shot picture;
the answer content output module is used for outputting answer content corresponding to the question content; the answer content is determined based on target scene data corresponding to the target shot picture and the target picture identification in a scene library, and the corresponding relation between the picture identification and the scene data is stored in the scene library.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 9 when the computer program is executed.
12. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 9.
CN202311829001.2A 2023-12-26 2023-12-26 Information searching method, device, computer equipment and storage medium Pending CN117688159A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311829001.2A CN117688159A (en) 2023-12-26 2023-12-26 Information searching method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311829001.2A CN117688159A (en) 2023-12-26 2023-12-26 Information searching method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117688159A true CN117688159A (en) 2024-03-12

Family

ID=90126417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311829001.2A Pending CN117688159A (en) 2023-12-26 2023-12-26 Information searching method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117688159A (en)

Similar Documents

Publication Publication Date Title
US10839605B2 (en) Sharing links in an augmented reality environment
CN109643158B (en) Command processing using multi-modal signal analysis
CN110780707B (en) Information processing apparatus, information processing method, and computer readable medium
US9269011B1 (en) Graphical refinement for points of interest
CN110263213B (en) Video pushing method, device, computer equipment and storage medium
US20190333478A1 (en) Adaptive fiducials for image match recognition and tracking
CN112805743A (en) System and method for providing content based on knowledge graph
KR20180055708A (en) Device and method for image processing
CN107533685A (en) Personalized context suggestion engine
WO2019214453A1 (en) Content sharing system, method, labeling method, server and terminal device
US10043069B1 (en) Item recognition using context data
KR20100002756A (en) Matrix blogging system and service support method thereof
KR20210156283A (en) Prompt information processing apparatus and method
KR102628042B1 (en) Device and method for recommeding contact information
KR20190117837A (en) Device and method for providing response message to user input
US20200090656A1 (en) Sensor Based Semantic Object Generation
US20200005784A1 (en) Electronic device and operating method thereof for outputting response to user input, by using application
KR20180072534A (en) Electronic device and method for providing image associated with text
KR20190096752A (en) Method and electronic device for generating text comment for content
US10606886B2 (en) Method and system for remote management of virtual message for a moving object
JP7316695B2 (en) Method and system for recommending location-based digital content
CN114930319A (en) Music recommendation method and device
KR20200084428A (en) Method for generating video and device thereof
CN117688159A (en) Information searching method, device, computer equipment and storage medium
JP7090779B2 (en) Information processing equipment, information processing methods and information processing systems

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination