CN114464178A - Evaluation data acquisition method, device and system, electronic equipment and storage medium


Info

Publication number
CN114464178A
CN114464178A
Authority
CN
China
Prior art keywords
voice
mobile terminal
screen
navigation
voice interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210102707.8A
Other languages
Chinese (zh)
Inventor
肖哲珊 (Xiao Zheshan)
刘旭 (Liu Xu)
王芬 (Wang Fen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210102707.8A
Publication of CN114464178A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command
    • G10L 2015/225: Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Navigation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides an evaluation data acquisition method, apparatus and system, an electronic device and a storage medium, and relates to fields of artificial intelligence such as intelligent speech, computer vision and natural language processing. The method includes: injecting acquired navigation track data into a mobile terminal, and reproducing the navigation scene corresponding to the navigation track data in the mobile terminal using a map navigation product in the mobile terminal; playing acquired user voice to perform voice interaction with the map navigation product, the user voice being the user voice from the voice interaction process corresponding to the navigation scene; and acquiring voice interaction information returned by the mobile terminal, and determining the evaluation data according to the voice interaction information. By applying the disclosed scheme, labor and time costs can be saved and processing efficiency improved.

Description

Evaluation data acquisition method, device and system, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a system, an electronic device, and a storage medium for obtaining evaluation data in the fields of intelligent speech, computer vision, and natural language processing.
Background
Voice interaction is an important function of a map navigation product: it lets a driver conveniently obtain desired information and guidance while keeping his or her eyes on the road. Voice interaction is highly diverse, and evaluating the voice interaction effect is an important direction for optimizing and iterating a navigation voice interaction product.
To evaluate that effect, evaluation data must first be obtained. Currently, evaluation data is generally obtained as follows: a planned navigation route for a specific scene is selected for an on-site road test, specific user voice is played to interact with the map navigation product, and the required evaluation data is collected manually. Because an actual road test and manual operation are required, this approach incurs high labor and time costs and is inefficient.
Disclosure of Invention
The disclosure provides an evaluation data acquisition method, an evaluation data acquisition device, an evaluation data acquisition system, electronic equipment and a storage medium.
An evaluation data acquisition method includes:
injecting the acquired navigation track data into a mobile terminal, and reproducing a navigation scene corresponding to the navigation track data in the mobile terminal by using a map navigation product in the mobile terminal;
playing the acquired user voice, and performing voice interaction with the map navigation product, wherein the user voice is the user voice in the voice interaction process corresponding to the navigation scene;
and acquiring voice interaction information returned by the mobile terminal, and determining the evaluation data according to the voice interaction information.
An evaluation data acquisition method includes:
acquiring navigation track data injected by a control center, and reproducing a navigation scene corresponding to the navigation track data by using a map navigation product;
performing voice interaction with the voice of the user played by the control center by using the map navigation product, wherein the voice of the user is the voice of the user in the voice interaction process corresponding to the navigation scene;
and returning voice interaction information to the control center, so that the control center can determine the evaluation data according to the voice interaction information.
An evaluation data acquisition apparatus includes: a first reproduction module, a first interaction module and a first information processing module;
the first reproduction module is used for injecting the acquired navigation track data into the mobile terminal, and reproducing a navigation scene corresponding to the navigation track data in the mobile terminal by using a map navigation product in the mobile terminal;
the first interaction module is used for playing the acquired user voice and carrying out voice interaction with the map navigation product, wherein the user voice is the user voice in the voice interaction process corresponding to the navigation scene;
the first information processing module is used for acquiring voice interaction information returned by the mobile terminal and determining the evaluation data according to the voice interaction information.
An evaluation data acquisition apparatus includes: a second reproduction module, a second interaction module and a second information processing module;
the second reproduction module is used for acquiring navigation track data injected by the control center and reproducing a navigation scene corresponding to the navigation track data by using a map navigation product;
the second interaction module is used for performing voice interaction with the user voice played by the control center by using the map navigation product, wherein the user voice is the user voice in the voice interaction process corresponding to the navigation scene;
and the second information processing module is used for returning voice interaction information to the control center, so that the control center determines the evaluation data according to the voice interaction information.
An evaluation data acquisition system comprising: two evaluation data acquisition devices as described above.
An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described above.
A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method as described above.
A computer program product comprising computer programs/instructions which, when executed by a processor, implement a method as described above.
One embodiment of the above disclosure has the following advantages or benefits: the reproduction of the navigation scene can be realized automatically, the voice interaction process with a map navigation product can be simulated, and the required voice interaction information can be obtained automatically, from which the required evaluation data can then be determined; compared with the on-site road test approach, this saves labor and time costs and improves processing efficiency.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a first embodiment of a method for obtaining evaluation data according to the present disclosure;
FIG. 2 is a schematic illustration of an image resulting from a screenshot described in this disclosure;
FIG. 3 is a flowchart of a second embodiment of the evaluation data acquisition method according to the present disclosure;
fig. 4 is a schematic structural diagram illustrating a first embodiment 400 of an evaluation data obtaining apparatus according to the present disclosure;
fig. 5 is a schematic structural diagram illustrating a second embodiment 500 of the evaluation data obtaining apparatus according to the present disclosure;
FIG. 6 is a schematic structural diagram of an embodiment 600 of the evaluation data acquisition system according to the present disclosure;
FIG. 7 illustrates a schematic block diagram of an electronic device 700 that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects.
Fig. 1 is a flowchart of a first embodiment of an evaluation data acquisition method according to the present disclosure. As shown in fig. 1, the following detailed implementation is included.
In step 101, the acquired navigation track data is injected into the mobile terminal, and a navigation scene corresponding to the navigation track data is reproduced in the mobile terminal by using a map navigation product in the mobile terminal.
In step 102, the acquired user voice is played and is subjected to voice interaction with a map navigation product, and the user voice is the user voice in the voice interaction process corresponding to the navigation scene.
In step 103, voice interaction information returned by the mobile terminal is obtained, and evaluation data is determined according to the voice interaction information.
It can be seen that, in the scheme of this method embodiment, the reproduction of the navigation scene can be realized automatically and the voice interaction process with the map navigation product can be simulated, so that the required voice interaction information can be obtained automatically and the required evaluation data can then be determined from it.
The execution subject of the embodiment shown in fig. 1 may be a control center, and the mobile terminal may be a mobile phone.
In addition, massive navigation track data actually collected from online users, together with the corresponding user voice, can be randomly sampled; each time, one piece of navigation track data and its corresponding user voice are extracted, the user voice being the voice from the voice interaction process corresponding to the navigation scene of that navigation track data. The navigation track data may include specific route information and the like, and the user voice refers to the voice uttered by the user during the voice interaction process.
In this way, the acquired navigation track data and user voice stay close to the real-world user data distribution and real user needs.
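As a non-limiting illustration, the sampling step might be sketched in python as follows; the directory layout and file names are assumptions made for the sketch, not part of the disclosure:

```python
# Minimal sampling sketch. Assumed layout: each collected case is a directory
# holding one navigation track file plus the user audio clips recorded on
# that trip (all names below are hypothetical).
import random
from pathlib import Path

def sample_case(case_root: str) -> tuple[Path, list[Path]]:
    """Randomly draw one (navigation track, user voice) pair from the pool."""
    cases = [p for p in Path(case_root).iterdir() if p.is_dir()]
    case = random.choice(cases)
    track = case / "track.json"                 # navigation track data to inject
    voices = sorted(case.glob("voice_*.wav"))   # user voice, in dialog order
    return track, voices
```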
The acquired navigation track data can be injected into the mobile terminal, and a navigation scene corresponding to the navigation track data is reproduced in the mobile terminal by using a map navigation product in the mobile terminal.
In one embodiment of the disclosure, the navigation track data can be injected into the mobile terminal according to a predetermined file format, and a map navigation product in the mobile terminal can be started to monitor the navigation track data, so that the reproduction of the navigation scene is realized.
The specific file format is not limited. For example, xposed framework technology can be used to inject the navigation track data into the mobile terminal in the predetermined file format; the xposed framework is an open-source framework service that runs with high privileges on Android. The map navigation product in the mobile terminal can then be started, and the map navigation product monitors the injected navigation track data through a point-logging program in the mobile terminal, thereby reproducing the navigation scene.
How the map navigation product in the mobile terminal is started is not limited; for example, it can be called up by interacting with the mobile terminal via the Android Debug Bridge (ADB).
By the method, the reproduction of the navigation scene can be accurately and efficiently realized, so that a good foundation is laid for subsequent processing.
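As a non-limiting sketch, the control center could drive the injection and start-up over ADB from python as follows; the on-device path and the app component name are placeholders, and the actual hooking of the injected track into the positioning layer is done by the device-side xposed module (not shown):

```python
import subprocess

def adb(*args: str) -> str:
    """Run one adb command against the connected device and return its stdout."""
    return subprocess.run(["adb", *args], check=True,
                          capture_output=True, text=True).stdout

def inject_and_launch(track_file: str) -> None:
    # Push the track file to the location the device-side hook watches
    # (placeholder path).
    adb("push", str(track_file), "/sdcard/eval/track.json")
    # Call up the map navigation product; the component name is hypothetical.
    adb("shell", "am", "start", "-n", "com.example.mapnav/.NaviActivity")
```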
In one embodiment of the present disclosure, before the user voice is played, the voice assistant in the map navigation product may first be awakened by playing a wake-up word. The specific content of the wake-up word is not limited.
For example, the wake-up word audio may be generated by standard machine speech synthesis and played through a connected speaker or the like using the python game programming library (pygame), so as to wake up the voice assistant in the map navigation product. python is a computer programming language that provides efficient high-level data structures and enables simple, effective object-oriented programming.
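A minimal playback sketch using pygame might look as follows; the audio file name is assumed:

```python
import time
import pygame

def play_audio(path: str) -> None:
    """Play one audio file through the connected speaker, blocking until done."""
    if not pygame.mixer.get_init():
        pygame.mixer.init()
    pygame.mixer.music.load(path)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        time.sleep(0.05)

# e.g. wake the assistant with TTS-generated wake-up word audio
play_audio("wake_word.wav")
```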
The voice assistant is a function implemented in the map navigation product, and in one embodiment of the disclosure, voice interaction can be performed with the map navigation product through the voice assistant. For each user voice, the voice assistant generates a corresponding response voice based on the navigation scene, the content of the user voice, and the like.
It can be seen that through the wake-up operation, the voice interaction function of the map navigation product can be started, i.e. the voice interaction process with the user is started.
Correspondingly, after the voice assistant is awakened, the acquired user voice can be played to perform voice interaction with the voice assistant. Generally, the user voice needs to be played according to its correspondence/synchronization with the navigation track data; that is, the real scene at the time of actual collection is reproduced synchronously.
The user voice, too, can be played through a connected speaker or the like using pygame.
In an embodiment of the disclosure, after the voice assistant is awakened, the screenshot and screen recording functions of the mobile terminal can be started, and after the user voice finishes playing, these functions can be closed. Accordingly, the voice interaction information returned by the mobile terminal, namely images and screen recording information, can be obtained. The images are obtained by capturing the screen display content of the mobile terminal at a predetermined frame rate within a predetermined time period, the predetermined time period being the period from when the screenshot and screen recording functions are started to when they are closed; the screen recording information is obtained by recording the screen of the mobile terminal within that same period.
That is, after the voice assistant is awakened, the screenshot and screen recording functions of the mobile terminal are started and kept on until the voice interaction process ends; complete voice interaction information covering the whole interaction process can thus be obtained, improving the accuracy of the subsequently obtained evaluation data.
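One evaluation round could then be orchestrated at the control center roughly as follows; this non-limiting sketch combines the playback and launch helpers above with the capture helpers sketched after this paragraph, and all helper names are assumptions of these sketches:

```python
def run_one_case(track_file, voice_files) -> None:
    """Reproduce the scene, wake the assistant, replay the user's side of the
    dialog, and keep the capture window open for the whole interaction."""
    inject_and_launch(track_file)   # reproduce the navigation scene
    play_audio("wake_word.wav")     # wake the voice assistant
    start_screenshots()             # open the capture window:
    start_screenrecord()            # high-frame-rate screenshots + recording
    try:
        for voice in voice_files:   # play the user voice in dialog order
            play_audio(voice)
    finally:
        stop_screenshots()          # close the capture window
        stop_screenrecord()
```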
In an embodiment of the present disclosure, the images may be obtained by capturing the screen display content in a javacap manner (a Java-based screen capture approach) based on socket transmission.
During the voice interaction process of the map navigation product, information displayed on the User Interface (UI) layer disappears very quickly, so the screenshots must reach a very high frame rate; the socket-based capture approach makes such a frame rate attainable.
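The control-center side of such a capture stream might be sketched as follows; the wire format (a 4-byte big-endian length prefix followed by one PNG frame) and the port are assumptions, and the device-side javacap capturer, together with the `adb forward` that maps its port locally, is not shown:

```python
import socket
import struct
import threading
from pathlib import Path

_stop = threading.Event()

def _recv_exact(conn: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("screenshot stream closed")
        buf += chunk
    return buf

def _receive_frames(port: int = 5678) -> None:
    """Store frames pushed by the device-side capturer as numbered PNG files."""
    Path("frames").mkdir(exist_ok=True)
    with socket.create_connection(("127.0.0.1", port)) as conn:
        idx = 0
        while not _stop.is_set():
            (size,) = struct.unpack(">I", _recv_exact(conn, 4))  # length prefix
            Path(f"frames/{idx:06d}.png").write_bytes(_recv_exact(conn, size))
            idx += 1

def start_screenshots() -> None:
    _stop.clear()
    threading.Thread(target=_receive_frames, daemon=True).start()

def stop_screenshots() -> None:
    _stop.set()
```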
In addition, in an embodiment of the present disclosure, the screen recording function of the mobile terminal may be turned on in an ADB manner, and the screen recording function of the mobile terminal may be turned off in the ADB manner.
Because screen recording and screenshot capture run simultaneously, the screen recording process must not conflict with the socket transmission. For this reason, the scheme of the disclosure opens and closes the screen recording function of the mobile terminal in the ADB manner, for example by simulating click actions to open and close screen recording software in the mobile terminal. Any screen recording software with a recording function may be used, and the recording duration is not limited.
How the screenshot function of the mobile terminal is turned on and off is not limited; for example, an ADB method or another feasible method may be adopted.
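A sketch of opening and closing the recording via ADB follows; it uses adb's built-in screenrecord tool rather than a separate recorder app (the simulated-tap route described above is noted in a comment), and the on-device path, time limit and tap coordinates are placeholders:

```python
import subprocess
import time

_REMOTE_MP4 = "/sdcard/eval/session.mp4"   # placeholder on-device path
_record_proc = None

def start_screenrecord() -> None:
    global _record_proc
    _record_proc = subprocess.Popen(
        ["adb", "shell", "screenrecord", "--time-limit", "180", _REMOTE_MP4])
    # Alternative, as in the disclosure: simulate a tap on a recorder app's
    # start button, e.g.
    # subprocess.run(["adb", "shell", "input", "tap", "540", "1800"])

def stop_screenrecord(local_path: str = "session.mp4") -> None:
    # SIGINT lets screenrecord finalize the mp4 cleanly (assumes a modern
    # Android build whose toybox provides pkill).
    subprocess.run(["adb", "shell", "pkill", "-2", "screenrecord"])
    if _record_proc is not None:
        _record_proc.wait()
    time.sleep(1)   # give the device a moment to flush the file
    subprocess.run(["adb", "pull", _REMOTE_MP4, local_path], check=True)
```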
After the voice interaction information returned by the mobile terminal, namely the images and the screen recording information, is acquired, the required evaluation data can be determined.
In an embodiment of the disclosure, text recognition may be performed on the acquired images, and the recognized text information, together with the screen recording information, may be used as the evaluation data.
For example, an Optical Character Recognition (OCR) method such as PaddleOCR (the OCR toolkit of the PaddlePaddle deep-learning platform) may be used to perform text recognition on the images, helping ensure the accuracy of the recognition result.
Through text recognition, the text information in the images can be obtained, thereby turning the voice interaction process into text.
In an embodiment of the present disclosure, when performing text recognition on the acquired images, the following processing may further be performed for each image: cropping the sub-image corresponding to the voice interaction display area from the image, and performing text recognition on that sub-image.
The base map in an acquired image is map information of the current environment and itself contains a large amount of text, so performing text recognition on the whole image would increase the recognition workload and could introduce dirty data. The position of the voice interaction display area can be determined using existing methods, and the corresponding sub-image cropped accordingly.
Fig. 2 is a schematic diagram of an image obtained from a screenshot according to the present disclosure. As shown in fig. 2, the sub-image indicated by the gray rectangular box may be cropped out and text recognition performed on it.
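A crop-then-recognize sketch is shown below; the dialog-card coordinates are placeholders that would be located once per device and app layout, and the PaddleOCR calls follow the 2.x API:

```python
from PIL import Image
from paddleocr import PaddleOCR   # pip install paddleocr (2.x API assumed)

ocr = PaddleOCR(lang="ch")        # the "ch" model covers Chinese and English

def dialog_text(frame_path: str, box: tuple[int, int, int, int]) -> list[str]:
    """Crop the voice interaction display area and OCR only that sub-image.

    box is (left, top, right, bottom) of the dialog card in pixels."""
    sub_path = frame_path + ".dialog.png"
    Image.open(frame_path).crop(box).save(sub_path)
    result = ocr.ocr(sub_path)
    # each recognized line is [bounding_box, (text, confidence)]
    return [line[1][0] for line in (result[0] or [])]

# placeholder coordinates for the gray box of Fig. 2
texts = dialog_text("frames/000123.png", (40, 1180, 1040, 1720))
```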
The recognized text information and the screen recording information can be used as evaluation data. Navigation track data and corresponding user voice can then be extracted continuously and evaluation data obtained per the disclosed method, so that a large body of evaluation data accumulates, providing rich data support for iterating the voice interaction function of the map navigation product. How the voice interaction effect of a map navigation product is evaluated using the evaluation data can follow existing techniques.
Fig. 3 is a flowchart of a second embodiment of the evaluation data acquisition method according to the present disclosure. As shown in fig. 3, the following detailed implementation is included.
In step 301, navigation track data injected by a control center is obtained, and a navigation scene corresponding to the navigation track data is reproduced by using a map navigation product.
In step 302, performing voice interaction with a user voice played by a control center by using a map navigation product, where the user voice is a user voice in a voice interaction process corresponding to the navigation scene.
In step 303, voice interaction information is returned to the control center, and the control center determines evaluation data according to the voice interaction information.
According to the scheme of this method embodiment, the reproduction of the navigation scene can be realized automatically, the voice interaction process with a map navigation product can be simulated, the required voice interaction information can be obtained automatically, and the required evaluation data can be determined from that information; compared with the on-site road test approach, labor and time costs are saved and processing efficiency is improved.
The execution subject of the embodiment shown in fig. 3 may be a mobile terminal, such as a mobile phone.
In one embodiment of the disclosure, in response to the screenshot and screen recording functions being started by the control center before the user voice is played, the mobile terminal can capture the screen display content at the predetermined frame rate and record the screen.
That is, the voice interaction information returned to the control center may include the images and the screen recording information. Through this processing, complete voice interaction information for the whole voice interaction process can be obtained, which improves the accuracy of the subsequently determined evaluation data.
It is noted that, while the foregoing method embodiments are described as a series of acts for simplicity of explanation, those skilled in the art will appreciate that the present disclosure is not limited by the order of the acts, as some steps may occur in other orders or concurrently in accordance with the present disclosure. Further, the embodiments described in the specification are preferred embodiments, and the acts and modules involved are not necessarily required by the disclosure. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In summary, the scheme of the method embodiments enables full-process automation, reproduces online real users' navigation scenes and voice interaction processes, and accumulates a large amount of evaluation data, saving labor and time costs, improving processing efficiency, and providing rich data support for iterating the voice interaction function of the map navigation product.
The above is a description of embodiments of the method, and the embodiments of the apparatus are further described below.
Fig. 4 is a schematic structural diagram illustrating a first embodiment 400 of an evaluation data obtaining apparatus according to the present disclosure. As shown in fig. 4, includes: a first reproduction module 401, a first interaction module 402, and a first information processing module 403.
The first reproduction module 401 is configured to inject the acquired navigation track data into the mobile terminal, and reproduce a navigation scene corresponding to the navigation track data in the mobile terminal by using a map navigation product in the mobile terminal.
The first interaction module 402 is configured to play the acquired user voice, and perform voice interaction with the map navigation product, where the user voice is the user voice in the voice interaction process corresponding to the navigation scene.
The first information processing module 403 is configured to obtain voice interaction information returned by the mobile terminal, and determine evaluation data according to the voice interaction information.
With this technical scheme, the reproduction of the navigation scene can be realized automatically, the voice interaction process with a map navigation product can be simulated, the required voice interaction information can be obtained automatically, and the required evaluation data can be determined from that information; compared with the on-site road test approach, labor and time costs are saved and processing efficiency is improved.
The navigation track data and the corresponding user voice can be the navigation track data and the corresponding user voice which are actually collected by the online user.
In an embodiment of the present disclosure, the first reproduction module 401 may inject the navigation track data into the mobile terminal according to a predetermined file format, and may start a map navigation product in the mobile terminal to monitor the navigation track data, so as to realize reproduction of the navigation scene.
The specific file format is not limited. For example, xposed framework technology can be used to inject the navigation track data into the mobile terminal in the predetermined file format, and the map navigation product in the mobile terminal can be started so that it monitors the injected navigation track data through a point-logging program in the mobile terminal, thereby reproducing the navigation scene.
In an embodiment of the disclosure, before playing the user voice, the first interaction module 402 may first wake up the voice assistant in the map navigation product by playing a wake-up word; it may then play the acquired user voice to perform voice interaction with the map navigation product, specifically via the voice assistant.
In an embodiment of the present disclosure, after waking up the voice assistant, the first interaction module 402 may start the screenshot and screen recording functions of the mobile terminal, and close them after the user voice finishes playing. Accordingly, the first information processing module 403 may obtain the voice interaction information returned by the mobile terminal, namely images and screen recording information. The images are obtained by the mobile terminal capturing the screen display content at a predetermined frame rate within a predetermined time period, the predetermined time period being the period from when the screenshot and screen recording functions are started to when they are closed; the screen recording information is obtained by the mobile terminal recording the screen within that same period.
In an embodiment of the disclosure, the images may be obtained by capturing the screen display content in a javacap manner (a Java-based screen capture approach) based on socket transmission.
In addition, in an embodiment of the present disclosure, the first interaction module 402 may turn on a screen recording function of the mobile terminal in an ADB manner, and turn off the screen recording function of the mobile terminal in the ADB manner.
After the first information processing module 403 acquires the voice interaction information returned by the mobile terminal, it may determine evaluation data according to the voice interaction information.
In an embodiment of the present disclosure, the first information processing module 403 may perform text recognition on the acquired image, and use the recognized text information and screen recording information as evaluation data.
In an embodiment of the present disclosure, the first information processing module 403 may further perform the following processing for each image: cropping the sub-image corresponding to the voice interaction display area from the image, and performing text recognition on the sub-image.
Fig. 5 is a schematic structural diagram illustrating a second embodiment 500 of the evaluation data obtaining apparatus according to the present disclosure. As shown in fig. 5, includes: a second reproduction module 501, a second interaction module 502 and a second information processing module 503.
The second reproduction module 501 is configured to obtain the navigation track data injected by the control center and reproduce the navigation scene corresponding to the navigation track data using a map navigation product.
The second interaction module 502 is configured to perform voice interaction with a user voice played by the control center by using a map navigation product, where the user voice is a user voice in a voice interaction process corresponding to the navigation scene.
The second information processing module 503 is configured to return voice interaction information to the control center, so that the control center can determine the evaluation data according to the voice interaction information.
With this technical scheme, the reproduction of the navigation scene can be realized automatically, the voice interaction process with a map navigation product can be simulated, the required voice interaction information can be obtained automatically, and the required evaluation data can be determined from that information; compared with the on-site road test approach, labor and time costs are saved and processing efficiency is improved.
In an embodiment of the present disclosure, the second information processing module 503 may, in response to the screenshot and screen recording functions being started by the control center before the user voice is played, capture the screen display content at the predetermined frame rate and record the screen; in response to those functions being closed by the control center after the user voice finishes playing, it may stop capturing and recording and return the captured images and the screen recording information to the control center.
Fig. 6 is a schematic diagram of a composition structure of an embodiment 600 of the evaluation data acquisition system according to the present disclosure. As shown in fig. 6, includes: a first evaluation data acquisition device 601 and a second evaluation data acquisition device 602.
The first evaluation data obtaining device 601 may be the evaluation data obtaining device in the embodiment shown in fig. 4, and the second evaluation data obtaining device 602 may be the evaluation data obtaining device in the embodiment shown in fig. 5.
The specific work flow of the above device and system embodiments may refer to the related description in the foregoing method embodiments, and is not repeated.
In summary, the schemes of the device and system embodiments enable full-process automation, reproduce online real users' navigation scenes and voice interaction processes, and accumulate a large amount of evaluation data, saving labor and time costs, improving processing efficiency, and providing rich data support for iterating the voice interaction function of the map navigation product.
The disclosed scheme can be applied in the field of artificial intelligence, in particular intelligent speech, computer vision and natural language processing. Artificial intelligence is the discipline of making computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking and planning), and spans both hardware and software technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology and the like.
In the embodiments of the disclosure, the navigation track data and voice are not specific to any particular user and cannot reflect any particular user's personal information. Moreover, the execution subject of the evaluation data acquisition method may obtain the navigation track data and voice in various public, legally compliant ways, for example from users after their authorization.
In the technical solution of the disclosure, the collection, storage, use, processing, transmission, provision and disclosure of the personal information involved all comply with relevant laws and regulations and do not violate public order and good morals.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702 and the RAM 703 are connected to one another by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Computing unit 701 may be a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the various methods and processes described above, such as the methods described in this disclosure. For example, in some embodiments, the methods described in this disclosure may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 708. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto device 700 via ROM 702 and/or communications unit 709. When the computer program is loaded into RAM 703 and executed by the computing unit 701, one or more steps of the methods described in the present disclosure may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured by any other suitable means (e.g., by means of firmware) to perform the methods described in the present disclosure.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (24)

1. An evaluation data acquisition method includes:
injecting the acquired navigation track data into a mobile terminal, and reproducing a navigation scene corresponding to the navigation track data in the mobile terminal by using a map navigation product in the mobile terminal;
playing the acquired user voice, and performing voice interaction with the map navigation product, wherein the user voice is the user voice in the voice interaction process corresponding to the navigation scene;
and acquiring voice interaction information returned by the mobile terminal, and determining the evaluation data according to the voice interaction information.
2. The method of claim 1, wherein the reproducing, in the mobile terminal, the navigation scene corresponding to the navigation trajectory data comprises:
injecting the navigation track data into the mobile terminal according to a preset file format, and starting the map navigation product in the mobile terminal to monitor the navigation track data, so as to realize reproduction of the navigation scene.
3. The method of claim 1 or 2, further comprising:
before the voice of the user is played, a voice assistant in the map navigation product is awakened by playing an awakening word;
wherein the voice interacting with the map navigation product comprises: and performing voice interaction with the map navigation product through the voice assistant.
4. The method of claim 3, further comprising:
after the voice assistant is awakened, starting the screen capture and recording functions of the mobile terminal, and after the voice of the user is played, closing the screen capture and recording functions of the mobile terminal;
wherein, the acquiring the voice interaction information returned by the mobile terminal comprises:
acquiring an image and screen recording information returned by the mobile terminal;
the image is obtained by the mobile terminal performing screenshot on the screen display content within a preset time period at a preset frame rate, wherein the preset time period is a time period from the starting of the screenshot and screen recording functions of the mobile terminal to the closing of the screenshot and screen recording functions of the mobile terminal;
and the screen recording information is obtained by recording the screen by the mobile terminal in the preset time period.
5. The method of claim 4, wherein,
the image comprises: an image obtained by capturing the screen display content in a javacap (Java-based screen capture) manner based on socket transmission.
6. The method of claim 4, wherein,
said turning on the screen recording function of the mobile terminal comprises: turning on the screen recording function of the mobile terminal in an Android Debug Bridge manner;
said turning off the screen recording function of the mobile terminal comprises: turning off the screen recording function of the mobile terminal in an Android Debug Bridge manner.
7. The method according to claim 4, wherein the determining the evaluation data according to the voice interaction information comprises:
performing text recognition on the image, and taking the recognized text information and the screen recording information as the evaluation data.
8. The method of claim 7, wherein the text recognizing the image comprises:
for each image, respectively: cropping a sub-image corresponding to the voice interaction display area from the image, and performing text recognition on the sub-image.
9. An evaluation data acquisition method includes:
acquiring navigation track data injected by a control center, and reproducing a navigation scene corresponding to the navigation track data by using a map navigation product;
performing voice interaction with the voice of the user played by the control center by using the map navigation product, wherein the voice of the user is the voice of the user in the voice interaction process corresponding to the navigation scene;
and returning voice interaction information to the control center, so that the control center can determine the evaluation data according to the voice interaction information.
10. The method of claim 9, wherein the returning voice interaction information to the control center comprises:
in response to the screenshot and screen recording functions being started by the control center before the user voice is played, capturing the screen display content at a preset frame rate, and recording the screen;
and in response to the screenshot and screen recording functions being closed by the control center after the user voice is played, stopping the screen capture and recording, and returning the captured images and the screen recording information to the control center.
11. An evaluation data acquisition apparatus, comprising: a first reproduction module, a first interaction module and a first information processing module;
the first reproduction module is used for injecting the acquired navigation track data into the mobile terminal, and reproducing a navigation scene corresponding to the navigation track data in the mobile terminal by using a map navigation product in the mobile terminal;
the first interaction module is used for playing the acquired user voice and performing voice interaction with the map navigation product, wherein the user voice is the user voice in the voice interaction process corresponding to the navigation scene;
the first information processing module is used for acquiring voice interaction information returned by the mobile terminal and determining the evaluation data according to the voice interaction information.
12. The apparatus of claim 11, wherein,
the first reproduction module injects the navigation track data into the mobile terminal according to a preset file format, and starts the map navigation product in the mobile terminal to monitor the navigation track data, so as to realize reproduction of the navigation scene.
13. The apparatus of claim 11 or 12,
the first interaction module is further used for awakening a voice assistant in the map navigation product by playing an awakening word before the voice of the user is played, and performing voice interaction with the map navigation product by the voice assistant.
14. The apparatus of claim 13, wherein,
the first interaction module is further used for starting the screen capture and recording functions of the mobile terminal after the voice assistant is awakened, and closing the screen capture and recording functions of the mobile terminal after the user voice is played;
the first information processing module acquires images and screen recording information returned by the mobile terminal; the image is obtained by the mobile terminal performing screenshot on the screen display content within a preset time period at a preset frame rate, wherein the preset time period is a time period from the starting of the screenshot and screen recording functions of the mobile terminal to the closing of the screenshot and screen recording functions of the mobile terminal; and the screen recording information is obtained by recording the screen by the mobile terminal in the preset time period.
15. The apparatus of claim 14, wherein,
the image comprises: an image obtained by capturing the screen display content in a javacap (Java-based screen capture) manner based on socket transmission.
16. The apparatus of claim 14, wherein,
the first interaction module starts a screen recording function of the mobile terminal in an android debugging bridge mode, and closes the screen recording function of the mobile terminal in the android debugging bridge mode.
17. The apparatus of claim 14, wherein,
the first information processing module performs text recognition on the image, and takes the recognized text information and the screen recording information as the evaluation data.
18. The apparatus of claim 17, wherein,
the first information processing module performs the following processing for each image, respectively: cropping a sub-image corresponding to the voice interaction display area from the image, and performing text recognition on the sub-image.
19. An evaluation data acquisition apparatus, comprising: a second reproduction module, a second interaction module and a second information processing module;
the second reproduction module is used for acquiring navigation track data injected by the control center and reproducing a navigation scene corresponding to the navigation track data by using a map navigation product;
the second interaction module is used for performing voice interaction with the user voice played by the control center by using the map navigation product, wherein the user voice is the user voice in the voice interaction process corresponding to the navigation scene;
the second information processing module is used for returning voice interaction information to the control center, so that the control center determines the evaluation data according to the voice interaction information.
20. The apparatus of claim 19, wherein,
the second information processing module, in response to the screenshot and screen recording functions being started by the control center before the user voice is played, captures the screen display content at a preset frame rate and records the screen; and, in response to those functions being closed by the control center after the user voice is played, stops the screen capture and recording and returns the captured images and the screen recording information to the control center.
21. An evaluation data acquisition system comprising: the apparatus of any one of claims 11-18, and the apparatus of any one of claims 19-20.
22. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
23. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-10.
24. A computer program product comprising a computer program/instructions which, when executed by a processor, implement the method of any one of claims 1-10.
CN202210102707.8A (priority date 2022-01-27; filing date 2022-01-27) Evaluation data acquisition method, device and system, electronic equipment and storage medium. Status: Pending. Published as CN114464178A (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210102707.8A | 2022-01-27 | 2022-01-27 | Evaluation data acquisition method, device and system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210102707.8A | 2022-01-27 | 2022-01-27 | Evaluation data acquisition method, device and system, electronic equipment and storage medium

Publications (1)

Publication Number | Publication Date
CN114464178A | 2022-05-10

Family

ID=81411604

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210102707.8A (Pending; published as CN114464178A) | Evaluation data acquisition method, device and system, electronic equipment and storage medium | 2022-01-27 | 2022-01-27

Country Status (1)

Country Link
CN (1) CN114464178A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination