WO2016095380A1 - Instant messaging method and device - Google Patents

Instant messaging method and device Download PDF

Info

Publication number
WO2016095380A1
WO2016095380A1 PCT/CN2015/076647 CN2015076647W
Authority
WO
WIPO (PCT)
Prior art keywords
user
voice information
instant messaging
unit
text
Prior art date
Application number
PCT/CN2015/076647
Other languages
French (fr)
Chinese (zh)
Inventor
艾超
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2016095380A1 publication Critical patent/WO2016095380A1/en

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting

Definitions

  • This document relates to the field of instant messaging technology, and in particular to an instant messaging method and device.
  • the embodiments of the invention provide an instant messaging method and device to solve the technical problem of lowering the requirements of video communication so as to better satisfy users' video communication needs.
  • the instant messaging method provided by the embodiment of the present invention includes:
  • the acquiring the voice information sent by the first user to the second user includes:
  • the acquiring the voice information sent by the first user to the second user includes:
  • the method further includes: before acquiring the voice information sent by the first user to the second user,
  • generating an image frame according to the predetermined first user avatar model of the first user includes:
  • the displaying the video stream to the second user includes:
  • the embodiment of the invention further provides an instant messaging device, comprising:
  • a first acquiring unit configured to acquire voice information sent by the first user to the second user
  • a generating unit configured to generate an image frame according to a predetermined first user avatar model of the first user
  • An integration unit configured to integrate the image frame with the voice information to obtain a video stream including the image frame and voice information
  • a display unit configured to display the video stream to the second user.
  • the first acquiring unit is configured to receive a voice signal sent by the first user to the second user to obtain the voice information, or to receive text information sent by the first user to the second user and obtain the voice information corresponding to the text information through text-to-speech conversion.
  • the first acquiring unit includes:
  • a detecting unit configured to detect an uplink transmission bandwidth of the first user
  • the prompting unit is configured to prompt the first user to send only text information or a voice signal when the uplink transmission bandwidth is less than a preset first threshold;
  • the receiving unit is configured to receive the voice information sent by the first user to the second user, or receive the text information sent by the first user to the second user, and obtain the voice information corresponding to the text information by text-to-speech conversion.
  • the above device further comprises:
  • a second acquiring unit configured to acquire a user avatar and user feature information of the first user, where the user feature information includes at least a gender and an age of the first user;
  • a determining unit configured to determine, from a pre-established system user model, a user model corresponding to user characteristic information of the first user
  • the binding unit is configured to extract a facial skin texture of the first user from the user avatar, and bind the facial skin texture to a user model corresponding to the first user to obtain the first user avatar model.
  • the generating unit comprises:
  • a parsing unit configured to parse the voice information to determine a facial expression corresponding to the voice information
  • the control processing unit is configured to control the first user avatar model, according to the determined facial expression, to generate a facial motion corresponding to that expression, thereby obtaining the image frame.
  • the display unit is configured to send the video stream to a terminal corresponding to the second user, to play the video stream through the terminal.
  • the embodiment of the invention further provides a computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions are used in the above method.
  • the instant messaging method and apparatus provided by the embodiments of the present invention have at least the following beneficial effects: the first user does not need to collect and transmit local image data in real time, so no local camera is required, which reduces the equipment cost of the terminal, reduces the amount of network data exchanged during video communication, and lowers the network data transmission pressure.
  • moreover, because the first user only uploads the text or voice information he or she wants to send, the network access bandwidth requirement on the first user is low; even under poor bandwidth conditions, the image frames of the first user can still be played at the second user's side, satisfying the user's video communication needs.
  • FIG. 1 is a schematic flowchart of an instant messaging method according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of an instant messaging device according to an embodiment of the present invention.
  • the embodiment of the present invention provides an instant messaging method that pre-establishes a user avatar model for each user; during video communication only the text or voice information the user wishes to transmit needs to be obtained, and a video stream containing image frames and voice is generated from that text/voice information and the user avatar model, achieving the effect of video communication.
  • an instant messaging method provided by an embodiment of the present invention can be applied to a video communication server. As shown in FIG. 1 , the method includes:
  • Step 11 Acquire voice information sent by the first user to the second user.
  • the first user may directly send the voice information or the text information to be transmitted.
  • the video server may receive the voice information sent by the first user to the second user, or receive the text information sent by the first user to the second user, and obtain the voice information corresponding to the text information by text-to-speech conversion.
  • if the uplink bandwidth of the first user is too small to upload video images, the first user's terminal may be prompted to upload only the text or voice information to be sent; that is, the embodiment of the present invention may detect the uplink transmission bandwidth of the first user and, when that bandwidth is less than a preset first threshold (which can be set according to the lower limit required for video stream transmission), prompt the first user to send only text information or a voice signal.
  • the first user then sends the corresponding information according to the prompt, and the video server receives the voice information sent by the first user to the second user, or receives the text information sent by the first user to the second user and obtains the corresponding voice information through text-to-speech conversion.
  • Step 12 Generate an image frame according to the predetermined avatar model of the first user.
  • here, the first user avatar model can be controlled to move in a preset manner to obtain an animated image frame.
  • the voice information may additionally be taken into account, i.e. the image frame is generated from both the voice information and the first user avatar model: the voice information may be parsed to determine the facial expression it corresponds to (for example, the intonation of the voice information is determined, such as an interrogative tone or an ordinary declarative tone, and the facial expression matching that intonation is then selected); the first user avatar model is then controlled, according to the determined facial expression, to generate the corresponding facial motion, yielding the image frame.
  • Step 13 Integrate the image frames with the voice information to obtain a video stream containing the image frames and the voice information.
  • here, the image frames and the voice information can be combined according to the frame rate of the video stream to obtain a video stream of suitable size.
  • Step 14 Display the video stream to the second user.
  • the video stream may be sent to a terminal corresponding to the second user to play the video stream by using the terminal.
  • the above steps take the video server as an example to illustrate how the video communication is implemented in the embodiment of the present invention.
  • during the above communication process the first user does not need to collect and transmit local image data in real time, so no local camera is required, which reduces the equipment cost of the terminal, reduces the amount of network data exchanged during video communication, and lowers the network transmission pressure.
  • in addition, because the first user only uploads the text or voice information he or she wants to send, the network access bandwidth requirement on the first user is low; even under poor bandwidth conditions, the image frames of the first user can still be played at the second user's side, satisfying the user's video communication needs.
  • the embodiment of the present invention may further determine the first user avatar model in the following manner:
  • Step a Obtain a user avatar and user feature information of the first user, where the user feature information includes at least the gender and age of the first user.
  • for example, the user avatar of the first user is obtained by receiving the photo uploaded by the first user, and the user characteristic information of the first user is determined from the user profile uploaded by the first user.
  • Step b Determine a user model corresponding to the user feature information of the first user from the pre-established system user model.
  • a plurality of system user models may be established and maintained in the system in advance; for example, several typical user models are maintained for different genders, ages, ethnicities and the like, and such a user model is usually a three-dimensional model.
  • based on the user characteristic information of the first user, the matching model is selected from the plurality of pre-built system user models and used as the user model of the first user.
  • step c the facial skin texture of the first user is extracted from the user avatar, and the facial skin texture is bound to the user model corresponding to the first user to obtain the first user avatar model.
  • binding the facial skin texture obtained from the user avatar to the user model may be implemented by texture mapping, i.e. laying the texture over the surface of the three-dimensional model, which yields a first user avatar model that closely resembles the real first user.
  • the foregoing method shown in FIG. 1 of the embodiment of the present invention can also be applied to a second terminal corresponding to the second user.
  • in that case, acquiring the voice information sent by the first user to the second user may consist of receiving the voice information sent by the first user to the second user as forwarded by the video communication server.
  • the video stream may then be played directly in order to display it to the second user.
  • when the method shown in FIG. 1 is implemented by the second terminal, the video stream is generated and played locally on the second terminal, which relieves the video image processing load on the server and also reduces the amount of video data the video server has to transmit to the second terminal, lowering the data forwarding pressure on the network.
  • an embodiment of the present invention further provides an instant messaging device for implementing the foregoing method.
  • the device can be applied to a video communication server or to a second terminal corresponding to the second user.
  • the device includes:
  • the first obtaining unit 21 is configured to acquire voice information sent by the first user to the second user.
  • the generating unit 22 is configured to generate an image frame according to a predetermined first user avatar model of the first user;
  • the integration unit 23 is configured to integrate the image frame with the voice information to obtain a video stream containing the image frame and the voice information;
  • a display unit 24 is arranged to display the video stream to the second user.
  • the first acquiring unit 21 is configured to receive a voice signal sent by the first user to the second user to obtain the voice information, or to receive text information sent by the first user to the second user and obtain the voice information corresponding to the text information through text-to-speech conversion.
  • the first acquiring unit 21 may include:
  • a detecting unit configured to detect an uplink transmission bandwidth of the first user
  • the prompting unit is configured to prompt the first user to send only text information or a voice signal when the uplink transmission bandwidth is less than a preset first threshold;
  • the receiving unit is configured to receive the voice information sent by the first user to the second user, or receive the text information sent by the first user to the second user, and obtain the voice information corresponding to the text information by text-to-speech conversion.
  • the foregoing apparatus of the embodiment of the present invention may further include:
  • a second acquiring unit configured to acquire the user avatar and user characteristic information of the first user, where the user characteristic information includes at least the gender and age of the first user;
  • a determining unit configured to determine, from a pre-established system user model, a user model corresponding to user characteristic information of the first user
  • the binding unit is configured to extract a facial skin texture of the first user from the user avatar, and bind the facial skin texture to a user model corresponding to the first user to obtain the first user avatar model.
  • the generating unit 22 described above may include:
  • a parsing unit configured to parse the voice information to determine a facial expression corresponding to the voice information
  • the control processing unit is configured to control the first user avatar model, according to the determined facial expression, to generate a facial motion corresponding to that expression, thereby obtaining the image frame.
  • the display unit 24 is configured to send the video stream to the terminal corresponding to the second user to play the video stream through the terminal.
  • with the instant messaging method and apparatus provided by the embodiments of the present invention, the user only needs to transmit text/voice and a head-shot photo through the client software; the server stores the photo, edits the facial texture of the avatar, and then actively generates the user avatar picture during the chat, which solves the problem of video chat demanding high bandwidth: the user only needs to upload a photo once to enjoy video chat permanently.
  • furthermore, the embodiment of the invention does not require a camera to be installed at the user terminal, which reduces the equipment cost of the terminal.
  • alternatively, all or part of the steps of the above embodiments may be implemented with integrated circuits; the steps may be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module.
  • the devices/function modules/functional units in the above embodiments may be implemented by a general-purpose computing device, which may be centralized on a single computing device or distributed over a network of multiple computing devices.
  • when the devices/function modules/functional units in the above embodiments are implemented in the form of software function modules and sold or used as stand-alone products, they can be stored in a computer readable storage medium.
  • the above mentioned computer readable storage medium may be a read only memory, a magnetic disk or an optical disk or the like.
  • with the above technical solution, the first user who sends data does not need a local camera, which reduces the equipment cost of the terminal, reduces the amount of network data exchanged during video communication, and lowers the network data transmission pressure.
  • in addition, the technical solution places low demands on the first user's network access bandwidth; even under poor bandwidth conditions, the image frames of the first user can be played at the second user who receives the data, satisfying the user's video communication needs.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Transfer Between Computers (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Disclosed are an instant messaging method and device. The method includes: obtaining voice information sent by a first user to a second user; generating an image frame according to a predetermined first user avatar model of the first user; integrating the image frame and the voice information to obtain a video stream containing the image frame and the voice information; and displaying the video stream to the second user. The technical solution reduces the amount of network data exchanged during video communication and lowers the network data transmission pressure, and it can still provide video instant messaging when the user's network bandwidth is poor, thus satisfying the user's video communication needs.

Description

Instant Messaging Method and Device
Technical Field
This document relates to the field of instant messaging technology, and in particular to an instant messaging method and device.
Background
Instant messaging satisfies people's need for information exchange and has been widely adopted. As the demand for richer communication has grown, people are no longer content with plain text exchanges; they want to see the other party directly during an instant messaging session, i.e. to communicate better through video-based instant messaging.
Many instant messaging services and applications that support video communication already exist in the related art. In a network environment with ample bandwidth, these applications and services greatly enhance the sense of presence of instant messaging and satisfy users' needs. However, to implement video communication the related art requires the terminal to be equipped with a camera that can capture the user's image and to upload the captured image data to a server. If the user terminal has no camera, video communication is therefore impossible. In addition, video image data is usually voluminous and demands high network transmission bandwidth; when end-to-end bandwidth cannot be guaranteed, the picture and sound of the video call frequently stutter, which seriously degrades the video communication experience.
Summary of the Invention
The embodiments of the present invention provide an instant messaging method and device to solve the technical problem of lowering the requirements of video communication so as to better satisfy users' video communication needs.
To solve the above technical problem, the instant messaging method provided by an embodiment of the present invention includes:
acquiring voice information sent by a first user to a second user;
generating an image frame according to a predetermined first user avatar model of the first user;
integrating the image frame with the voice information to obtain a video stream containing the image frame and the voice information;
presenting the video stream to the second user.
In the above method, acquiring the voice information sent by the first user to the second user includes:
receiving voice information sent by the first user to the second user, or receiving text information sent by the first user to the second user and obtaining the voice information corresponding to the text information through text-to-speech conversion.
In the above method, acquiring the voice information sent by the first user to the second user includes:
detecting the uplink transmission bandwidth of the first user;
prompting the first user to send only text information or a voice signal when the uplink transmission bandwidth is less than a preset first threshold;
receiving voice information sent by the first user to the second user, or receiving text information sent by the first user to the second user and obtaining the voice information corresponding to the text information through text-to-speech conversion.
In the above method, before the voice information sent by the first user to the second user is acquired, the method further includes:
acquiring a user avatar and user characteristic information of the first user, the user characteristic information including at least the gender and age of the first user;
determining, from pre-established system user models, a user model corresponding to the user characteristic information of the first user;
extracting the facial skin texture of the first user from the user avatar and binding the facial skin texture to the user model corresponding to the first user, to obtain the first user avatar model.
In the above method, generating an image frame according to the predetermined first user avatar model of the first user includes:
parsing the voice information to determine a facial expression corresponding to the voice information;
controlling the first user avatar model, according to the determined facial expression, to generate a facial motion corresponding to that expression, thereby obtaining the image frame.
In the above method, presenting the video stream to the second user includes:
sending the video stream to a terminal corresponding to the second user, so that the video stream is played through that terminal.
An embodiment of the present invention further provides an instant messaging device, including:
a first acquiring unit configured to acquire voice information sent by a first user to a second user;
a generating unit configured to generate an image frame according to a predetermined first user avatar model of the first user;
an integration unit configured to integrate the image frame with the voice information to obtain a video stream containing the image frame and the voice information;
a display unit configured to display the video stream to the second user.
In the above device, the first acquiring unit is configured to receive a voice signal sent by the first user to the second user to obtain the voice information, or to receive text information sent by the first user to the second user and obtain the voice information corresponding to the text information through text-to-speech conversion.
In the above device, the first acquiring unit includes:
a detecting unit configured to detect the uplink transmission bandwidth of the first user;
a prompting unit configured to prompt the first user to send only text information or a voice signal when the uplink transmission bandwidth is less than a preset first threshold;
a receiving unit configured to receive voice information sent by the first user to the second user, or to receive text information sent by the first user to the second user and obtain the voice information corresponding to the text information through text-to-speech conversion.
The above device may further include:
a second acquiring unit configured to acquire the user avatar and user characteristic information of the first user, the user characteristic information including at least the gender and age of the first user;
a determining unit configured to determine, from pre-established system user models, a user model corresponding to the user characteristic information of the first user;
a binding unit configured to extract the facial skin texture of the first user from the user avatar and bind the facial skin texture to the user model corresponding to the first user, to obtain the first user avatar model.
In the above device, the generating unit includes:
a parsing unit configured to parse the voice information and determine a facial expression corresponding to the voice information;
a control processing unit configured to control the first user avatar model, according to the determined facial expression, to generate a facial motion corresponding to that expression, thereby obtaining the image frame.
In the above device, the display unit is configured to send the video stream to a terminal corresponding to the second user, so that the video stream is played through that terminal.
An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the above method.
Compared with the related art, the instant messaging method and device provided by the embodiments of the present invention have at least the following beneficial effects: the first user does not need to collect and transmit local image data in real time, so no local camera is required, which reduces the equipment cost of the terminal, reduces the amount of network data exchanged during video communication, and lowers the network data transmission pressure. Moreover, because the first user only uploads the text or voice information he or she wants to send, the network access bandwidth requirement on the first user is low; even under poor bandwidth conditions, the image frames of the first user can still be played at the second user's side, satisfying the user's video communication needs.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an instant messaging method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an instant messaging device according to an embodiment of the present invention.
Preferred Embodiments of the Invention
The invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Video instant messaging in the related art requires a camera installed at the terminal and places certain requirements on the terminal's network access bandwidth; if no camera is installed, or the network access bandwidth of the terminal cannot be guaranteed, video communication is either unavailable or of poor quality and cannot satisfy users' needs for video communication. To solve this problem, the embodiment of the present invention proposes an instant messaging method that pre-establishes a user avatar model for each user; during video communication only the text or voice information the user wishes to transmit needs to be obtained, and a video stream containing image frames and voice is generated from that text/voice information and the user avatar model, achieving the effect of video communication. The present invention is further described below through specific embodiments with reference to the accompanying drawings.
Referring to FIG. 1, an instant messaging method provided by an embodiment of the present invention can be applied to a video communication server. As shown in FIG. 1, the method includes:
Step 11: Acquire voice information sent by a first user to a second user.
Here, when the first user needs to instant-message the second user, he or she may directly send the voice information or text information to be transmitted. Correspondingly, the video server may receive the voice information sent by the first user to the second user, or receive the text information sent by the first user to the second user and obtain the voice information corresponding to the text information through text-to-speech conversion.
Considering the uplink transmission bandwidth of the first user: if that bandwidth is too small to upload video images, the first user's terminal may be prompted to upload only the text or voice information to be sent. That is, the embodiment of the present invention may detect the uplink transmission bandwidth of the first user in advance and, when the uplink transmission bandwidth is less than a preset first threshold (which can be set according to the lower limit required for video stream transmission), prompt the first user to send only text information or a voice signal. The first user then sends the corresponding information according to the prompt, and the video server receives the voice information sent by the first user to the second user, or receives the text information sent by the first user to the second user and obtains the corresponding voice information through text-to-speech conversion.
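As an illustration only, the following Python sketch shows one way step 11 could be implemented on the video server; the bandwidth threshold, the session object, and the text_to_speech helper are assumptions introduced here and are not part of the patent.

# Sketch of step 11: obtain the voice information for a message from the first
# user to the second user. All names below are illustrative assumptions.

MIN_VIDEO_UPLINK_KBPS = 256  # assumed lower limit for uploading a live video stream


def text_to_speech(text):
    """Placeholder for any text-to-speech engine; returns raw audio bytes."""
    raise NotImplementedError("plug in a real TTS engine here")


def acquire_voice_info(session):
    """Return the voice information (audio bytes) sent by the first user."""
    # If the uplink is too narrow for video, prompt the client to send only text or voice.
    if session.uplink_bandwidth_kbps() < MIN_VIDEO_UPLINK_KBPS:
        session.prompt_first_user("Low bandwidth: please send only text or a voice message.")

    message = session.receive_from_first_user()  # either text or a recorded voice signal
    if message.kind == "voice":
        return message.audio_bytes               # use the voice signal as-is
    return text_to_speech(message.text)          # convert text into the corresponding voice information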
Step 12: Generate an image frame according to the predetermined avatar model of the first user.
Here, the first user avatar model can be controlled to move in a preset manner to obtain an animated image frame.
Optionally, the voice information can also be taken into account, i.e. the image frame is generated from both the voice information and the first user avatar model. For example, the voice information may be parsed to determine the facial expression it corresponds to (the intonation of the voice information is determined, such as an interrogative tone or an ordinary declarative tone, and the facial expression matching that intonation is then selected); the first user avatar model is then controlled, according to the determined facial expression, to generate the corresponding facial motion, yielding the image frame.
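The sketch below illustrates step 12 under the same caveats: the tone classifier, the tone-to-expression table, and the avatar_model.render call are placeholders chosen for illustration, not the patent's own algorithm.

# Sketch of step 12: derive a facial expression from the voice information and
# drive the pre-built first user avatar model to produce image frames.

TONE_TO_EXPRESSION = {
    "question": "raised_eyebrows",   # interrogative intonation
    "statement": "neutral",          # ordinary declarative intonation
    "exclamation": "wide_smile",
}


def classify_tone(voice_bytes):
    """Placeholder speech-analysis step; returns one of the keys above."""
    return "statement"


def generate_frames(avatar_model, voice_bytes, frame_count):
    """Render frame_count image frames showing the expression matching the voice."""
    expression = TONE_TO_EXPRESSION.get(classify_tone(voice_bytes), "neutral")
    # The avatar model is assumed to expose a render(expression, t) method that
    # returns one image frame of the facial motion at animation time t.
    return [avatar_model.render(expression=expression, t=i) for i in range(frame_count)]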
Step 13: Integrate the image frames with the voice information to obtain a video stream containing the image frames and the voice information.
Here, the image frames and the voice information can be combined according to the frame rate of the video stream to obtain a video stream of suitable size.
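One concrete way to perform this integration, assuming the frames have been written out as numbered PNG files and the ffmpeg command-line tool is available, is sketched below; the 25 fps rate and the file names are arbitrary examples rather than values from the patent.

# Sketch of step 13: mux the rendered frames and the voice track into a single
# video stream at a chosen frame rate. Assumes ffmpeg is installed.
import subprocess


def integrate(frame_pattern, voice_wav, out_mp4, fps=25):
    """Combine image frames (e.g. 'frames/%04d.png') and a WAV voice track into an MP4."""
    subprocess.run([
        "ffmpeg", "-y",
        "-framerate", str(fps), "-i", frame_pattern,
        "-i", voice_wav,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",          # stop when the shorter of the two inputs ends
        out_mp4,
    ], check=True)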
Step 14: Present the video stream to the second user.
Optionally, the video stream may be sent to a terminal corresponding to the second user, so that the video stream is played through that terminal.
The above steps take the video server as an example to illustrate how video communication is implemented in the embodiment of the present invention. It can be seen that during this communication process the first user does not need to collect and transmit local image data in real time, so no local camera is required, which reduces the equipment cost of the terminal, reduces the amount of network data exchanged during video communication, and lowers the network transmission pressure. In addition, because the first user only uploads the text or voice information he or she wants to send, the network access bandwidth requirement on the first user is low; even under poor bandwidth conditions, the image frames of the first user can still be played at the second user's side, satisfying the user's video communication needs.
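Tying the steps together, a server-side handler might look roughly like the following; it reuses the illustrative helpers sketched above, and write_frames_as_png, save_wav and the terminal object are additional assumptions, not functions defined by the patent.

# End-to-end sketch of steps 11-14 on the video server, built from the
# illustrative helpers above; every name here is an assumption, not the patent's API.

def handle_instant_message(session, avatar_model, second_user_terminal, fps=25, seconds=3):
    voice = acquire_voice_info(session)                              # step 11
    frames = generate_frames(avatar_model, voice, fps * seconds)     # step 12
    write_frames_as_png(frames, "frames/%04d.png")                   # assumed helper
    integrate("frames/%04d.png", save_wav(voice), "out.mp4", fps)    # step 13 (assumed save_wav helper)
    second_user_terminal.play("out.mp4")                             # step 14: deliver to the second user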
In addition, before step 11 above, the embodiment of the present invention may determine the first user avatar model in advance as follows:
Step a: Acquire the user avatar (a head-shot photo) and user characteristic information of the first user, the user characteristic information including at least the gender and age of the first user.
For example, the user avatar of the first user is obtained by receiving the photo uploaded by the first user, and the user characteristic information of the first user is determined from the user profile uploaded by the first user.
Step b: Determine, from pre-established system user models, a user model corresponding to the user characteristic information of the first user.
In the embodiment of the present invention, a plurality of system user models may be established and maintained in the system in advance; for example, several typical user models are maintained for different genders, ages, ethnicities and the like, and such a user model is usually a three-dimensional model. Based on the user characteristic information of the first user, the matching model is selected from the plurality of pre-built system user models and used as the user model of the first user.
Step c: Extract the facial skin texture of the first user from the user avatar and bind the facial skin texture to the user model corresponding to the first user, to obtain the first user avatar model.
Here, on the basis of the user model determined in step b, the facial skin texture obtained from the user avatar is bound to it; this can be implemented by texture mapping, i.e. laying the texture over the surface of the three-dimensional model, which yields a first user avatar model that closely resembles the real first user.
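A compact sketch of this offline model-building flow (steps a-c) follows; the model catalogue, the age buckets, and the load_model / extract_facial_texture / bind_texture calls are all assumptions used purely for illustration.

# Sketch of steps a-c: choose a pre-built 3D system user model by gender and age,
# then bind the facial skin texture extracted from the uploaded photo to it.

SYSTEM_USER_MODELS = {
    ("female", "young"): "models/female_young.obj",
    ("female", "adult"): "models/female_adult.obj",
    ("male", "young"): "models/male_young.obj",
    ("male", "adult"): "models/male_adult.obj",
}


def age_bucket(age):
    return "young" if age < 30 else "adult"   # arbitrary illustrative split


def build_first_user_avatar_model(photo, gender, age):
    # Step b: pick the typical 3D model matching the user characteristic information.
    base_model = load_model(SYSTEM_USER_MODELS[(gender, age_bucket(age))])  # assumed loader
    # Step c: extract the facial skin texture from the photo and map it onto the model surface.
    skin_texture = extract_facial_texture(photo)   # assumed face-detection/cropping helper
    base_model.bind_texture(skin_texture)          # assumed texture-mapping call
    return base_model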
The above method shown in FIG. 1 of the embodiment of the present invention can also be applied to a second terminal corresponding to the second user. In that case, in step 11 above, acquiring the voice information sent by the first user to the second user may consist of receiving the voice information sent by the first user to the second user as forwarded by the video communication server. In step 14 above, the video stream may then be played directly in order to present it to the second user. When the method shown in FIG. 1 is implemented by the second terminal, the video stream is generated and played locally on the second terminal, which relieves the video image processing load on the server and also reduces the amount of video data the video server has to transmit to the second terminal, lowering the data forwarding pressure on the network.
Based on the method described above, an embodiment of the present invention further provides an instant messaging device for implementing the method. The device can be applied to a video communication server or to a second terminal corresponding to the second user. Referring to FIG. 2, the device includes the following units (a structural sketch follows the list):
a first acquiring unit 21 configured to acquire voice information sent by a first user to a second user;
a generating unit 22 configured to generate an image frame according to a predetermined first user avatar model of the first user;
an integration unit 23 configured to integrate the image frame with the voice information to obtain a video stream containing the image frame and the voice information;
a display unit 24 configured to display the video stream to the second user.
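The composition of the four units could be expressed roughly as follows; the method names on each unit are assumptions chosen only to show how the units cooperate, not interfaces defined by the patent.

# Structural sketch of the device of FIG. 2 as four cooperating units.

class InstantMessagingDevice:
    def __init__(self, first_acquiring_unit, generating_unit, integration_unit, display_unit):
        self.first_acquiring_unit = first_acquiring_unit  # unit 21
        self.generating_unit = generating_unit            # unit 22
        self.integration_unit = integration_unit          # unit 23
        self.display_unit = display_unit                  # unit 24

    def relay(self, session, second_user_terminal):
        voice = self.first_acquiring_unit.acquire(session)
        frames = self.generating_unit.generate(voice)
        stream = self.integration_unit.integrate(frames, voice)
        self.display_unit.show(stream, second_user_terminal)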
Here, in order to reduce the amount of data the first user needs to send, the first acquiring unit 21 is configured to receive a voice signal sent by the first user to the second user to obtain the voice information, or to receive text information sent by the first user to the second user and obtain the voice information corresponding to the text information through text-to-speech conversion.
As another implementation, as shown in FIG. 3, the first acquiring unit 21 may include:
a detecting unit configured to detect the uplink transmission bandwidth of the first user;
a prompting unit configured to prompt the first user to send only text information or a voice signal when the uplink transmission bandwidth is less than a preset first threshold;
a receiving unit configured to receive voice information sent by the first user to the second user, or to receive text information sent by the first user to the second user and obtain the voice information corresponding to the text information through text-to-speech conversion.
To enable the user model to be established, the above device of the embodiment of the present invention may further include:
a second acquiring unit configured to acquire the user avatar and user characteristic information of the first user, the user characteristic information including at least the gender and age of the first user;
a determining unit configured to determine, from pre-established system user models, a user model corresponding to the user characteristic information of the first user;
a binding unit configured to extract the facial skin texture of the first user from the user avatar and bind the facial skin texture to the user model corresponding to the first user, to obtain the first user avatar model.
Here, the above generating unit 22 may include:
a parsing unit configured to parse the voice information and determine a facial expression corresponding to the voice information;
a control processing unit configured to control the first user avatar model, according to the determined facial expression, to generate a facial motion corresponding to that expression, thereby obtaining the image frame.
When the device is applied to a video server, the above display unit 24 is configured to send the video stream to the terminal corresponding to the second user, so that the video stream is played through that terminal.
In summary, with the instant messaging method and device provided by the embodiments of the present invention, the user only needs to transmit text/voice and a head-shot photo through the client software; the server stores the photo, edits the facial texture of the avatar, and then actively generates the user avatar picture during the chat. This solves the problem of video chat demanding high bandwidth: the user only needs to upload a photo once to enjoy video chat permanently. Moreover, the embodiment of the invention does not require a camera to be installed at the user terminal, which reduces the equipment cost of the terminal.
The above are preferred embodiments of the present invention. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented using a computer program flow; the computer program may be stored in a computer readable storage medium and executed on a corresponding hardware platform (such as a system, device, apparatus or component), and when executed it includes one or a combination of the steps of the method embodiments.
Alternatively, all or part of the steps of the above embodiments may be implemented with integrated circuits; the steps may be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module.
The devices/function modules/functional units in the above embodiments may be implemented by general-purpose computing devices; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices.
When the devices/function modules/functional units in the above embodiments are implemented in the form of software function modules and sold or used as stand-alone products, they may be stored in a computer readable storage medium. The aforementioned computer readable storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
Industrial Applicability
With the above technical solution, the first user who sends data does not need a local camera, which reduces the equipment cost of the terminal, reduces the amount of network data exchanged during video communication, and lowers the network data transmission pressure. In addition, the technical solution places low demands on the first user's network access bandwidth; even under poor bandwidth conditions, the image frames of the first user can be played at the second user who receives the data, satisfying the user's video communication needs.

Claims (13)

  1. An instant messaging method, comprising:
    acquiring voice information sent by a first user to a second user;
    generating an image frame according to a predetermined first user avatar model of the first user;
    integrating the image frame with the voice information to obtain a video stream containing the image frame and the voice information;
    presenting the video stream to the second user.
  2. The instant messaging method according to claim 1, wherein
    acquiring the voice information sent by the first user to the second user comprises:
    receiving voice information sent by the first user to the second user, or receiving text information sent by the first user to the second user and obtaining the voice information corresponding to the text information through text-to-speech conversion.
  3. The instant messaging method according to claim 1, wherein
    acquiring the voice information sent by the first user to the second user comprises:
    detecting an uplink transmission bandwidth of the first user;
    prompting the first user to send only text information or a voice signal when the uplink transmission bandwidth is less than a preset first threshold;
    receiving voice information sent by the first user to the second user, or receiving text information sent by the first user to the second user and obtaining the voice information corresponding to the text information through text-to-speech conversion.
  4. The instant messaging method according to claim 1, further comprising:
    before acquiring the voice information sent by the first user to the second user, acquiring a user avatar and user characteristic information of the first user, the user characteristic information comprising at least the gender and age of the first user;
    determining, from pre-established system user models, a user model corresponding to the user characteristic information of the first user;
    extracting a facial skin texture of the first user from the user avatar, and binding the facial skin texture to the user model corresponding to the first user to obtain the first user avatar model.
  5. The instant messaging method according to claim 4, wherein
    generating an image frame according to the predetermined first user avatar model of the first user comprises:
    parsing the voice information to determine a facial expression corresponding to the voice information;
    controlling the first user avatar model, according to the determined facial expression, to generate a facial motion corresponding to the facial expression, to obtain the image frame.
  6. The instant messaging method according to claim 1, wherein
    presenting the video stream to the second user comprises:
    sending the video stream to a terminal corresponding to the second user, to play the video stream through the terminal.
  7. An instant messaging device, comprising:
    a first acquiring unit configured to acquire voice information sent by a first user to a second user;
    a generating unit configured to generate an image frame according to a predetermined first user avatar model of the first user;
    an integration unit configured to integrate the image frame with the voice information to obtain a video stream containing the image frame and the voice information;
    a display unit configured to display the video stream to the second user.
  8. The instant messaging device according to claim 7, wherein
    the first acquiring unit is configured to receive a voice signal sent by the first user to the second user to obtain the voice information, or to receive text information sent by the first user to the second user and obtain the voice information corresponding to the text information through text-to-speech conversion.
  9. The instant messaging device according to claim 7, wherein
    the first acquiring unit comprises:
    a detecting unit configured to detect an uplink transmission bandwidth of the first user;
    a prompting unit configured to prompt the first user to send only text information or a voice signal when the uplink transmission bandwidth is less than a preset first threshold;
    a receiving unit configured to receive voice information sent by the first user to the second user, or to receive text information sent by the first user to the second user and obtain the voice information corresponding to the text information through text-to-speech conversion.
  10. The instant messaging device according to claim 7, further comprising:
    a second acquiring unit configured to acquire a user avatar and user characteristic information of the first user, the user characteristic information comprising at least the gender and age of the first user;
    a determining unit configured to determine, from pre-established system user models, a user model corresponding to the user characteristic information of the first user;
    a binding unit configured to extract a facial skin texture of the first user from the user avatar, and bind the facial skin texture to the user model corresponding to the first user to obtain the first user avatar model.
  11. The instant messaging device according to claim 10, wherein
    the generating unit comprises:
    a parsing unit configured to parse the voice information and determine a facial expression corresponding to the voice information;
    a control processing unit configured to control the first user avatar model, according to the determined facial expression, to generate a facial motion corresponding to the facial expression, to obtain the image frame.
  12. The instant messaging device according to claim 7, wherein
    the display unit is configured to send the video stream to a terminal corresponding to the second user, to play the video stream through the terminal.
  13. A computer storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to perform the method according to any one of claims 1 to 6.
PCT/CN2015/076647 2014-12-18 2015-04-15 Instant messaging method and device WO2016095380A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410796526.5 2014-12-18
CN201410796526.5A CN105763828A (en) 2014-12-18 2014-12-18 Instant communication method and device

Publications (1)

Publication Number Publication Date
WO2016095380A1 true WO2016095380A1 (en) 2016-06-23

Family

ID=56125747

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/076647 WO2016095380A1 (en) 2014-12-18 2015-04-15 Instant messaging method and device

Country Status (2)

Country Link
CN (1) CN105763828A (en)
WO (1) WO2016095380A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112151032A (en) * 2020-09-15 2020-12-29 济南雪景网络技术有限公司 Intelligent processing method for voice message

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107277605A (en) * 2017-08-15 2017-10-20 北京安云世纪科技有限公司 A kind of method and system for being used to carry out infrastructure service data data customization
CN113066497A (en) * 2021-03-18 2021-07-02 Oppo广东移动通信有限公司 Data processing method, device, system, electronic equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100064329A1 (en) * 2008-09-05 2010-03-11 Skype Limited Communication system and method
CN103346955A (en) * 2013-06-18 2013-10-09 腾讯科技(深圳)有限公司 Method, device and terminal for processing picture
CN103581113A (en) * 2012-07-20 2014-02-12 中兴通讯股份有限公司 Sending method, sending system and receiving device of communication data
CN104065913A (en) * 2014-06-27 2014-09-24 上海梦荻网络科技有限公司 Instant messaging client

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5999599A (en) * 1998-07-17 1999-12-07 Siemens Information And Communication Networks, Inc. System and method for enhanced caller name alerting
EP1423978A2 (en) * 2000-12-22 2004-06-02 Anthropics Technology Limited Video warping system
JP4131237B2 (en) * 2003-12-24 2008-08-13 カシオ計算機株式会社 Electronic device and program
CN103269423B (en) * 2013-05-13 2016-07-06 浙江大学 Can expansion type three dimensional display remote video communication method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100064329A1 (en) * 2008-09-05 2010-03-11 Skype Limited Communication system and method
CN103581113A (en) * 2012-07-20 2014-02-12 中兴通讯股份有限公司 Sending method, sending system and receiving device of communication data
CN103346955A (en) * 2013-06-18 2013-10-09 腾讯科技(深圳)有限公司 Method, device and terminal for processing picture
CN104065913A (en) * 2014-06-27 2014-09-24 上海梦荻网络科技有限公司 Instant messaging client

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112151032A (en) * 2020-09-15 2020-12-29 济南雪景网络技术有限公司 Intelligent processing method for voice message

Also Published As

Publication number Publication date
CN105763828A (en) 2016-07-13

Similar Documents

Publication Publication Date Title
US10938725B2 (en) Load balancing multimedia conferencing system, device, and methods
CN110730952B (en) Method and system for processing audio communication on network
US9342516B2 (en) Media presentation playback annotation
US20220174357A1 (en) Simulating audience feedback in remote broadcast events
US9094571B2 (en) Video chatting method and system
US11290573B2 (en) Method and apparatus for synchronizing viewing angles in virtual reality live streaming
US9898850B2 (en) Support and complement device, support and complement method, and recording medium for specifying character motion or animation
US20160134938A1 (en) Display control device, display control method, and computer program
JP6467554B2 (en) Message transmission method, message processing method, and terminal
CN108040061B (en) Cloud conference live broadcasting method
EP3284249A2 (en) Communication system and method
CN112188267B (en) Video playing method, device and equipment and computer storage medium
CN108322474B (en) Virtual reality system based on shared desktop, related device and method
US10685642B2 (en) Information processing method
WO2016095380A1 (en) Instant messaging method and device
US20230388354A1 (en) Systems and methods for establishing a virtual shared experience for media playback
CN108282685A (en) A kind of method and monitoring system of audio-visual synchronization
US20080225110A1 (en) Virtual camera system and instant communication method
JP2024059809A (en) Information processing device, information processing method, information processing system, and information processing program
US20180032237A1 (en) System and method of attaching a linked image and sound to content
CN110415318B (en) Image processing method and device
CN111835988B (en) Subtitle generation method, server, terminal equipment and system
KR20170127354A (en) Apparatus and method for providing video conversation using face conversion based on facial motion capture
US12010161B1 (en) Browser-based video production
CN105378829B (en) It records the note auxiliary system, information delivery device, terminal, householder method of recording the note and computer readable recording medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15868897

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15868897

Country of ref document: EP

Kind code of ref document: A1