
WO2026012240A1 - Information processing method and apparatus, and electronic device - Google Patents

Information processing method and apparatus, and electronic device

Info

Publication number
WO2026012240A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
voice
contact
target
target contact
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2025/106282
Other languages
French (fr)
Chinese (zh)
Inventor
周宇
周鑫
赵阳
周煜啸
何贞毅
黄雪妍
莫洁莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of WO2026012240A1


Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

Embodiments of the present application relate to the field of information technology and provide an information processing method and apparatus, and an electronic device. With the method, when an electronic device receives information sent by a contact, it can obtain contact information from a user instruction or by analyzing the information, and determine the target contact who sent the information. By jointly considering factors such as the sources of the previous and the current piece of information, the familiarity between the user and the target contact, and the type and complexity of the information, the device determines a corresponding playback control policy. After authorization to use the target contact's voice features has been obtained from that contact through permission control, a voice identical or similar to the target contact's voice can be synthesized to play back the information processed according to the playback control policy. The method not only helps the user develop a concrete, auditory sense of familiarity with the contact, but also improves the efficiency and timeliness with which the user obtains information content.

Description

Information processing method and apparatus, and electronic device

This application claims priority to Chinese invention patent application No. 202410926760.9, filed with the China National Intellectual Property Administration on July 10, 2024 and entitled "Method, Apparatus and Electronic Device for Intelligently Transmitting Information Content", and to Chinese invention patent application No. 202411562369.1, filed with the China National Intellectual Property Administration on November 1, 2024 and entitled "Information Processing Method, Apparatus and Electronic Device".

Technical Field

Embodiments of this application relate to the field of information technology, and in particular to an information processing method and apparatus, and an electronic device.

Background

Electronic devices can read received information aloud to users, enriching the ways in which users obtain information. In some scenarios, when it is inconvenient for users to look at their devices, voice broadcasting improves the timeliness with which they receive information. For example, while driving, running, or cycling, users may be unable to visually check system messages or information from various applications (apps) received on their devices, such as messages from contacts in social apps. In such cases, the electronic device can transcribe the received information and read it aloud, making its content easy for the user to follow.

In the prior art, electronic devices usually broadcast information using a system-default or user-defined fixed voice source. Whether the information is a system message or comes from an app, the device reads it with the same voice data. For example, messages from different contacts in a communication or social app are all read out in the same voice, which can confuse users: they cannot associate, by ear, the broadcast content with the contact who sent it, which lowers the efficiency with which they obtain information. In addition, the information received by electronic devices comes from many sources and takes many forms. For example, received information may be long text or may contain links, special characters, and the like. Existing solutions read these elements aloud as well, which lengthens the broadcast and prevents users from quickly grasping the actual content.

Summary

The information processing method and apparatus, and electronic device provided in the embodiments of this application can read received information aloud in a voice identical or similar to the real voice of the target contact, helping users develop a concrete sense of familiarity with the target contact at the auditory level. The embodiments can also process the received information, for example by condensing it while keeping its key or important content; adding the information source and/or contact source according to how the current message differs from the previous one; and converting non-text information into broadcastable text according to its type. This processing improves the efficiency and timeliness with which users obtain information content.

To achieve the above objectives, the embodiments of this application adopt the following technical solutions:

A first aspect of the embodiments of this application provides an information processing method, including:

in response to received first information, determining the target contact who sent the first information;

retrieving voice messages of the target contact, and generating, based on the voice messages, a target voice similar to the voice of the target contact, where the voice messages may include voice messages on a first page;

generating target information corresponding to the first information; and

broadcasting the target information by voice using the target voice.

It should be understood that the first information may be the currently received information, and the target contact is the contact who sent it, for example through an application. With the method provided in the embodiments of this application, the electronic device can read the received first information aloud in a voice identical or similar to that of the target contact, helping the user associate the information with the target contact by ear and obtain its content efficiently and promptly.

In addition, after receiving the first information, the electronic device may process it according to a corresponding broadcasting policy, for example by converting non-text information into text, condensing text, and adding or omitting the information source and/or contact source based on the differences between adjacent messages. The target information that the device reads aloud in the target voice (identical or similar to the target contact's real voice) may be the information processed according to this policy, which further improves the efficiency with which the user obtains information.

In a possible implementation of the first aspect, the method further includes: retrieving voice messages of the target contact; and generating, based on those voice messages, a target voice identical or similar to the real voice of the target contact.

Voice messages of the target contact may be retrieved on the first page, which may include the conversation page between the current user and the target contact, for example a chat page provided by an instant messaging application.

It should be understood that the target voice is a voice identical or similar to the real voice of the target contact who sent the information, and may be generated from that contact's voice features. By retrieving voice messages of the target contact, the embodiments can extract the contact's voice features from them and synthesize a target voice identical or similar to the contact's real voice for subsequent broadcasting. No third-party application interface needs to be developed: the system can perform voice retrieval automatically and without the user noticing.

As an example of the first aspect, voice messages of the target contact may be retrieved by tag-based retrieval. The electronic device determines retrieval elements for the target contact's voice messages, including at least the target contact's contact information, and searches a tag information database with these elements to obtain the target contact's voice messages. The tag information database may be built by tagging each contact's voice messages as they are received and storing the resulting tag information.
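A minimal sketch of tag-based retrieval, assuming an in-memory store: the `VoiceTag` fields (`app`, `conversation`, `duration_s`) and the `TagDatabase` class are hypothetical illustrations, not a schema taken from the application. The contact information is the mandatory retrieval element; any other tagged field can be passed as an optional element to narrow the search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceTag:
    """Tag recorded for one received voice message (hypothetical schema)."""
    message_id: str
    contact_id: str    # contact information of the sender
    app: str           # application the message arrived in
    conversation: str  # personal or group conversation identifier
    duration_s: float


class TagDatabase:
    """In-memory stand-in for the tag information database."""

    def __init__(self) -> None:
        self._tags: list[VoiceTag] = []

    def add(self, tag: VoiceTag) -> None:
        # Called once per received voice message, after it has been tagged.
        self._tags.append(tag)

    def search(self, contact_id: str, **extra: str) -> list[VoiceTag]:
        """Match on the mandatory contact element plus any optional elements."""
        return [
            t for t in self._tags
            if t.contact_id == contact_id
            and all(getattr(t, key) == value for key, value in extra.items())
        ]
```

For example, `db.search("alice", conversation="group-1")` would return only Alice's voice messages tagged in that group chat. A real device would persist the tags rather than keep them in a list.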

As another example of the first aspect, voice messages of the target contact may also be retrieved by page-turning retrieval. The electronic device opens a first page associated with the target contact in the application, such as the conversation page described above, and then retrieves voice messages sent by the target contact from that page.

The first page may include the current conversation page with the target contact that is displayed directly after the application is opened, or a historical conversation page between the current user and the target contact, which can be displayed by swiping the current conversation page.

For example, the first page may include the conversation page between the current user and the target contact. During page-turning retrieval, this conversation page is displayed and searched for voice messages sent by the target contact.

Alternatively, the first page may include a historical conversation page between the current user and the target contact. In response to a first operation, the historical conversation page is displayed and searched for voice messages sent by the target contact. The first operation may be swiping the conversation page or any other operation that achieves the same effect. Taking swiping as an example, in response to the swipe the electronic device displays the historical conversation page and then searches it for voice messages.
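The page-turning retrieval described above can be sketched as a loop: search the screen shown when the conversation opens, then repeat the "first operation" (a swipe to older history) until enough of the target contact's voice messages are found or the history runs out. `ConversationView`, `swipe_to_older`, and the `wanted`/`max_swipes` limits are hypothetical stand-ins for a real chat UI driven by simulated gestures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    kind: str      # "text", "voice", ...
    content: str


class ConversationView:
    """Hypothetical chat page that shows one screen of messages at a time."""

    def __init__(self, history: list[Message], page_size: int = 3) -> None:
        self._history = history  # oldest first
        self._page_size = page_size
        self._shown = 0          # number of newest messages already on screen

    def current_page(self) -> list[Message]:
        """Messages visible right after the conversation page opens."""
        self._shown = min(self._page_size, len(self._history))
        return self._history[len(self._history) - self._shown:]

    def swipe_to_older(self) -> list[Message]:
        """Simulate the swipe: reveal the next screen of older messages."""
        end = len(self._history) - self._shown
        start = max(0, end - self._page_size)
        self._shown += end - start
        return self._history[start:end]


def page_turn_search(view: ConversationView, target: str,
                     wanted: int = 1, max_swipes: int = 10) -> list[Message]:
    def voice_from_target(page: list[Message]) -> list[Message]:
        # Newest first within each screen.
        return [m for m in reversed(page) if m.sender == target and m.kind == "voice"]

    found = voice_from_target(view.current_page())
    swipes = 0
    while len(found) < wanted and swipes < max_swipes:
        older = view.swipe_to_older()
        if not older:  # reached the beginning of the history
            break
        found += voice_from_target(older)
        swipes += 1
    return found
```

Capping the number of swipes keeps the background search bounded on very long chat histories; the cap value here is an arbitrary assumption.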

In a possible implementation of the first aspect, the first page further includes a personal conversation page with the target contact and/or a group conversation page that includes the target contact. Retrieving voice messages sent by the target contact from the first page then includes searching the personal conversation page and/or the group conversation page for such messages. Searching both kinds of pages increases the likelihood of finding the target contact's voice messages.
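A minimal sketch of searching both personal and group conversation pages, assuming each conversation exposes its member set and messages (the `Conversation` structure and the `me` identifier are hypothetical). The sender filter matters in group pages, where other members' voice messages must be skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    kind: str
    content: str


@dataclass
class Conversation:
    conv_id: str
    is_group: bool
    members: frozenset[str]
    messages: list[Message]


def voice_from_personal_and_groups(convs: list[Conversation], target: str,
                                   me: str = "me") -> list[tuple[str, Message]]:
    hits: list[tuple[str, Message]] = []
    for conv in convs:
        if target not in conv.members:
            continue
        # Personal page: exactly the user and the target.
        # Group page: any group whose members include the target.
        if conv.is_group or conv.members == frozenset({me, target}):
            hits += [(conv.conv_id, m) for m in conv.messages
                     if m.sender == target and m.kind == "voice"]
    return hits
```

Returning the conversation identifier with each hit lets a later step prefer, for instance, personal-chat recordings, which are less likely to contain other voices.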

In another possible implementation of the first aspect, the voice messages further include voice messages on a second page, which may be a message-history search page. When retrieving the target contact's voice messages, the electronic device may, in response to a second operation, display the second page associated with the target contact and search it for voice messages sent by the target contact.

In one example, the second operation is performed on a message-history entry. The electronic device operates the entry to display the message-history search page and then searches that page for voice messages sent by the target contact. The operation on the entry may be implemented by calling a corresponding interface or by simulating user actions. Displaying the message-history search page for the target contact through an entry provided by the application, and then searching within it, is referred to as search retrieval.

The embodiments can thus retrieve the target contact's voice messages through several retrieval methods, which widens the sources of such messages and makes it more likely that usable ones are found.
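One way to combine the retrieval methods above is to try them in a fixed order, merge their results without duplicates, and stop once enough material has been collected. The ordering (tag retrieval, then page-turning, then search retrieval), the `wanted` threshold, and the callable interface are assumptions for illustration; the application does not specify how the methods are combined.

```python
from typing import Callable, Iterable

# A retriever maps contact information to message identifiers (hypothetical interface).
Retriever = Callable[[str], Iterable[str]]


def retrieve_voice_messages(contact_id: str,
                            retrievers: list[tuple[str, Retriever]],
                            wanted: int = 3) -> list[tuple[str, str]]:
    found: dict[str, str] = {}  # message id -> method that first found it
    for method, retrieve in retrievers:
        for msg_id in retrieve(contact_id):
            found.setdefault(msg_id, method)
        if len(found) >= wanted:
            break  # enough material; skip the remaining (typically slower) methods
    return list(found.items())[:wanted]
```

Putting the cheapest method first means the slower, UI-driven methods only run when the tag database alone does not yield enough voice material.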

In a possible implementation of the first aspect, when generating, from the target contact's voice messages, a target voice identical or similar to the target contact's real voice, the electronic device may extract the target contact's voice features from the voice messages and perform voice synthesis based on these features and the received first information, obtaining the target voice.

As an example of the first aspect, the target contact's voice features may be extracted by simulating playback of the retrieved voice messages and capturing audio data during that playback. Specifically, the device simulates tapping and playing the retrieved voice messages, captures audio data while they play, and extracts the target contact's voice features from the captured audio.
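The capture step can be pictured as a tap on the playback path: a simulated player pushes decoded audio in chunks, and a capture sink accumulates them for feature extraction. In this sketch, `simulate_playback`, `CaptureTap`, and the 16 kHz rate are assumptions, and `toy_voice_features` (RMS level and a zero-crossing pitch estimate) is only a stand-in; a real voice-cloning pipeline would use learned speaker embeddings rather than these two numbers.

```python
import math
from typing import Callable

SAMPLE_RATE = 16_000  # assumed capture sample rate (Hz)


def simulate_playback(samples: list[float], chunk_size: int,
                      on_audio: Callable[[list[float]], None]) -> None:
    """Hypothetical player: 'plays' a decoded voice message chunk by chunk."""
    for start in range(0, len(samples), chunk_size):
        on_audio(samples[start:start + chunk_size])


class CaptureTap:
    """Collects every chunk emitted during simulated playback."""

    def __init__(self) -> None:
        self.buffer: list[float] = []

    def __call__(self, frames: list[float]) -> None:
        self.buffer.extend(frames)


def toy_voice_features(samples: list[float], rate: int = SAMPLE_RATE) -> dict[str, float]:
    if not samples:
        raise ValueError("no audio captured")
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Sign changes between neighbouring samples; a tone of f Hz crosses zero about 2f times per second.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return {"duration_s": n / rate, "rms": rms, "zcr_hz": crossings * rate / (2 * n)}
```

For a one-second 200 Hz tone with amplitude 0.5, the tap reproduces the input exactly, and the features come out near `rms ≈ 0.354` and `zcr_hz ≈ 200`.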

In this way, the target contact's voice features can be extracted quickly without the user noticing and used for subsequent voice synthesis or cloning.

It should be understood that in some contexts "extract" and "capture" have the same meaning.

In a possible implementation of the first aspect, the voice messages may further include messages pre-recorded by the target contact, or recorded and stored during a call with the target contact. Retrieving the target contact's voice messages therefore further includes searching, based on the target contact's contact information, a database that stores the target contact's voice messages.

The embodiments allow contacts to record reference audio themselves for others to use; alternatively, with the contact's authorization, reference audio may be recorded during a call between the user and the contact. The reference audio may be uploaded to a cloud database or stored locally.

In a possible implementation of the first aspect, if the retrieved voice messages contain the voices of multiple people, factors for voice filtering can be determined, and the voice belonging to the target contact extracted from the messages based on these factors. The factors may include, but are not limited to, at least one of the spectral information, sound-intensity information, or duration information of each voice. For example, the electronic device may filter for the target contact's voice by spectral information, by sound-intensity information, or by both.
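A minimal sketch of the filtering step, assuming the audio has already been split into segments, each annotated with a pitch estimate (a simple spectral cue) and a mean level in dB. The `TargetProfile` ranges and thresholds are hypothetical; the `use` argument selects which of the three factors are applied, matching the "spectral, intensity, or both" options above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start_s: float
    end_s: float
    pitch_hz: float  # spectral cue, e.g. a fundamental-frequency estimate
    level_db: float  # sound-intensity cue

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


@dataclass(frozen=True)
class TargetProfile:
    pitch_range_hz: tuple[float, float]
    min_level_db: float = -40.0
    min_duration_s: float = 0.5


def keep_target_segments(segments: list[Segment], profile: TargetProfile,
                         use: tuple[str, ...] = ("spectral", "intensity", "duration")) -> list[Segment]:
    lo, hi = profile.pitch_range_hz
    checks = {
        "spectral": lambda s: lo <= s.pitch_hz <= hi,
        "intensity": lambda s: s.level_db >= profile.min_level_db,
        "duration": lambda s: s.duration_s >= profile.min_duration_s,
    }
    active = [checks[name] for name in use]
    return [s for s in segments if all(check(s) for check in active)]
```

The intensity check drops faint background speakers, and the duration check discards fragments too short to yield stable voice features.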

Thus, when the acquired voice messages contain multiple voices, identifying and filtering for the target voice ensures that high-quality audio is used for voice feature extraction. The target voice is then synthesized or cloned from the extracted features, which improves the quality of the voice used to broadcast the target information.

In a possible implementation of the first aspect, generating the target information corresponding to the first information includes: determining an information source; and generating the target information based on the information source and the content of the received first information. That is, the electronic device processes the first information according to a corresponding broadcasting policy to obtain the target information to be broadcast. Here, the information source is the source that is announced when the target information is read aloud, as determined by the broadcasting policy: it may be the complete source of the first information, or a version in which part of the source is omitted or simplified.

Determining the information source includes: determining the differences between the first information and the immediately preceding message in terms of reception time interval, platform, session type, group, and contact; and determining, based on these differences, what the announced information source contains.

It should be understood that the reception time interval is the time between receiving the two messages: if the previous message was received at t1 and the current one at t2, the interval is t2 - t1. The platform is the application platform on which a message is received, such as an SMS platform or a social app. If both messages are SMS messages sent and received through the SMS platform, they belong to the same platform; if one is an SMS message and the other an instant message received through a social app, they belong to different platforms. The session type indicates whether the conversation is a personal or a group session: a personal session is a one-to-one private chat between the user and a contact, and a group session is a conversation in a group containing the user and several contacts, such as a group chat. A difference in group means that, for messages in group sessions, the two messages come from different groups, that is, different group chats. A difference in contact means that the two messages were sent through the application by different contacts.

Determining what the announced information source contains based on the differences includes: if the reception interval between the first information and the immediately preceding message exceeds a preset interval, the announced source includes the complete information source;

if the reception interval is less than or equal to the preset interval, checking in turn whether the platform, session type, group, and contact differ between the first information and the preceding message, and determining what the announced source contains based on these changes.

In a possible implementation of the first aspect, determining the announced source based on these changes includes: if any of the platform, session type, group, or contact differs between the first information and the preceding message, the announced source includes the changed item(s).
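The source-announcement rule above can be sketched as follows: if the interval t2 - t1 exceeds the preset interval, announce the full source; otherwise compare platform, session type, group, and contact in that order and keep only the items that changed. The `IncomingInfo` fields, the 60-second default, and the convention that a personal session has no group (`None`) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

SOURCE_FIELDS = ("platform", "session_type", "group", "contact")


@dataclass(frozen=True)
class IncomingInfo:
    received_at: float     # reception time in seconds
    platform: str          # e.g. "SMS" or a social app
    session_type: str      # "personal" or "group"
    group: Optional[str]   # group name for group sessions, None otherwise
    contact: str           # sender


def source_to_announce(current: IncomingInfo, previous: Optional[IncomingInfo],
                       preset_interval_s: float = 60.0) -> dict[str, str]:
    # Complete source: every applicable field of the current message.
    full = {f: getattr(current, f) for f in SOURCE_FIELDS if getattr(current, f) is not None}
    if previous is None or current.received_at - previous.received_at > preset_interval_s:
        return full
    # Within the preset interval: check each field in turn and keep only the changed ones.
    return {f: v for f, v in full.items() if getattr(previous, f) != v}
```

An empty result means the source is identical to the previous message's, so only the content is read aloud.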

By comparing adjacent messages in turn by reception interval, platform, session type, group, and contact, the embodiments determine whether their sources differ. When the sources are the same, the source announcement can be omitted, avoiding the extra time spent on redundant announcements, improving broadcasting efficiency, and helping users grasp the content quickly.

In a possible implementation of the first aspect, determining the announced source based on the differences further includes: determining how familiar the current user is with the target contact; and determining, based on that familiarity, how the target contact's name appears in the announced source.

It should be understood that users can recognize a familiar contact by voice alone. When information is read in a voice identical or similar to that contact's, the familiar contact's name can therefore be omitted, improving broadcasting efficiency.

In a possible implementation of the first aspect, whether a contact is familiar to the user can be determined from the user's interaction behavior with that contact. Determining the familiarity therefore includes: obtaining interaction behavior features between the current user and the target contact; and determining the familiarity based on those features. The interaction behavior features may include chat behavior features.

As an example of the first aspect, if the target contact is determined from the familiarity to be a familiar contact of the current user, the target contact's name may be omitted from the announced source.

If the target contact is determined to be an unfamiliar contact of the current user, the target contact's name may be simplified, and the simplified name is used in the announced source.
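A sketch of the familiarity rule, under loudly stated assumptions: the chat-behavior features (`messages_last_30d`, `voice_calls_last_30d`, `days_since_last_chat`), the score weights, the 0.6 threshold, and the name-simplification rule (drop bracketed remarks, keep at most two words) are all hypothetical. The application does not say how familiarity is scored or how names are simplified.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChatStats:
    messages_last_30d: int
    days_since_last_chat: int
    voice_calls_last_30d: int = 0


def familiarity(stats: ChatStats) -> float:
    """Toy score in [0, 1]; weights and caps are illustrative assumptions."""
    volume = min(stats.messages_last_30d / 100, 1.0)
    calls = min(stats.voice_calls_last_30d / 5, 1.0)
    recency = 1.0 if stats.days_since_last_chat <= 7 else 0.0
    return 0.5 * volume + 0.2 * calls + 0.3 * recency


def simplify_name(name: str, max_words: int = 2) -> str:
    # Drop remarks in ASCII or full-width brackets, e.g. "Zhang Wei (Property Mgmt)".
    core = re.sub(r"\s*[\(\[（【].*?[\)\]）】]", "", name).strip()
    return " ".join(core.split()[:max_words])


def name_to_announce(name: str, stats: ChatStats, threshold: float = 0.6) -> Optional[str]:
    if familiarity(stats) >= threshold:
        return None  # familiar contact: the cloned voice already identifies the sender
    return simplify_name(name)
```

Returning `None` for familiar contacts lets the caller drop the name entirely from the announced source.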

In a possible implementation of the first aspect, generating the target information based on the information source and the content of the first information includes: estimating how long it would take to read the first information aloud; condensing the first information if this duration exceeds a preset value; and generating the target information from the information source and the condensed first information.

It should be understood that lengthy or redundant information takes up a lot of broadcast time and makes it hard for users to grasp the content quickly. When generating the target information, the embodiments may therefore estimate the broadcast duration of the first information; if the estimate exceeds a preset value, the electronic device condenses the first information, reducing the content to be read and the broadcast time so that the user can grasp the content quickly.

Condensing the first information may mean reducing its word count: the condensed information has fewer words than the original but still contains its key, core content, so the user does not miss any key information.

As an example, the electronic device may condense the first information by summarizing the original content, keeping its key or important information, and deleting connectives as well as repeated or meaningless words.
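A rough sketch of estimate-then-condense for English text. The speaking rate (2.5 words per second), the 8-second budget, and the filler-word list are assumptions. The fallback of keeping leading sentences is a crude extractive stand-in for the summarization the text describes; a production system would more likely use a summarization model.

```python
import re

WORDS_PER_SECOND = 2.5  # assumed speaking rate
FILLERS = {"um", "uh", "erm", "basically", "actually", "anyway"}


def estimate_seconds(text: str) -> float:
    return len(text.split()) / WORDS_PER_SECOND


def condense(text: str, max_seconds: float = 8.0) -> str:
    if estimate_seconds(text) <= max_seconds:
        return text  # short enough: read as-is
    # Step 1: drop filler words and immediately repeated words.
    kept: list[str] = []
    for word in text.split():
        bare = word.lower().strip(",.!?")
        if bare in FILLERS:
            continue
        if kept and kept[-1].lower().strip(",.!?") == bare:
            continue
        kept.append(word)
    text = " ".join(kept)
    if estimate_seconds(text) <= max_seconds:
        return text
    # Step 2: still too long, so keep whole leading sentences that fit the word budget.
    budget = int(max_seconds * WORDS_PER_SECOND)
    out: list[str] = []
    used = 0
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        n = len(sentence.split())
        if used + n > budget:
            break
        out.append(sentence)
        used += n
    return " ".join(out) if out else " ".join(text.split()[:budget])
```

For Chinese text, a character-based rate would replace the word count.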

In one possible implementation of the first aspect of the embodiments of this application, the first information may further include non-text information, and generating the target information to be broadcast based on the information source and the content of the received first information further includes: determining the information type of the non-text information; converting the non-text information into text according to the information type; and generating the target information to be broadcast based on the information source and the text-converted first information.

As an example of the first aspect of the embodiments of this application, the non-text information may include any information that is not plain text, such as links, images, files, stickers (emoticons), article pushes, mini-programs, and cards. The electronic device may convert the non-text information into text according to its specific type. Taking links as an example, link information may include a URL as well as other forms of information generated from the content at that URL, such as a card containing text. Accordingly, converting the non-text information into text according to the information type includes: determining the link content corresponding to the link information, and summarizing the link content to obtain a summary in text form. That is, the electronic device may simulate opening the link, obtain the content it points to, and derive text through summarization or similar processing. For example, if the link information is the URL of a news article, the electronic device may simulate opening the URL, read the news content, and use a summary of its main points as the converted text.
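Type-based conversion amounts to dispatching on the information type. The sketch below assumes hypothetical type labels (`"text"`, `"link"`, `"image"`, and so on) and a hypothetical `Item` structure; opening the link and summarizing the page are delegated to a caller-supplied function because the application does not specify how that is done.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Item:
    kind: str     # hypothetical type labels: "text", "link", "image", "file", "sticker", ...
    payload: str  # text body, URL, or file name


def convert_to_text(item: Item, summarize_link: Callable[[str], str]) -> str:
    """Dispatch on the information type and return text suitable for TTS."""
    handlers: Dict[str, Callable[[str], str]] = {
        "text": lambda p: p,
        # The caller-supplied function is expected to open the URL
        # (for example in a simulated browser) and summarize the page.
        "link": summarize_link,
        "image": lambda p: "an image",
        "file": lambda p: f"a file named {p}",
        "sticker": lambda p: "a sticker",
    }
    handler = handlers.get(item.kind)
    if handler is None:
        # Unknown types (mini-programs, cards, ...) get a generic description.
        return f"a {item.kind} message"
    return handler(item.payload)
```

Keeping the handlers in a table makes it straightforward to add per-type rules for article pushes, mini-programs, or cards without changing the dispatch logic.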

As another example of the first aspect of the embodiments of this application, after the non-text information is converted into text according to the information type, the method further includes: determining the type of session in which the non-text information was received; and adding a transition phrase to the converted text according to the session type. The transition phrase may use a subject-verb-object structure, a verb-object structure, or another structure such as subject-verb or subject-verb-object-complement.
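One possible mapping from session type to sentence structure is sketched below. The rule itself is an assumption made for illustration: in a one-to-one chat the sender is already implied by the broadcast voice, so a verb-object phrase is used, while in a group chat the sender is named, giving a subject-verb-object phrase. The function name and the `"group"` label are hypothetical.

```python
def add_transition(converted: str, session_type: str, sender: str, item_phrase: str) -> str:
    """Prefix converted non-text content with a transition phrase.

    Hypothetical rule: a one-to-one chat uses a verb-object phrase
    ("Shared a link: ..."); a group chat names the sender, giving a
    subject-verb-object phrase ("Tom shared a link: ...").
    """
    if session_type == "group":
        return f"{sender} shared {item_phrase}: {converted}"
    return f"Shared {item_phrase}: {converted}"
```

With these assumptions, a link summary received from Tom in a private chat is read as "Shared a link: ...", and the same link in a group chat as "Tom shared a link: ...".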

By processing received non-text information according to its type and converting it into text suitable for voice broadcasting, the embodiments of this application improve the efficiency and accuracy of information delivery.

In one possible implementation of the first aspect of the embodiments of this application, the first information may include multiple messages sent within a preset time period, and generating the target information to be broadcast corresponding to the first information further includes: merging the multiple messages sent within the preset time period; and generating target information to be broadcast corresponding to the merged messages. The electronic device may merge messages according to different principles depending on the situation. For example, multiple messages sent by the same contact may be concatenated; or, for messages from multiple contacts that contain similar content, the similar content may be identified and merged into a single message without changing its meaning.

Merging the multiple messages sent within the preset time period includes: determining the session type of each message sent by the target contact within the preset time period; and merging, separately for each session type, the messages that the target contact sent within the preset time period and that belong to that session type.
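The per-session merging step can be sketched as below. This is a minimal illustration, assuming that each message carries a sender, a session label, and a timestamp, and that the preset time period is a window measured from the first message of each batch; the `Msg` structure, the 60-second default, and the function name are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass
class Msg:
    sender: str
    session: str      # e.g. "private" or a group ID (hypothetical labels)
    timestamp: float  # seconds
    text: str


def merge_by_session(msgs: List[Msg], window_s: float = 60.0) -> List[Msg]:
    """Concatenate messages from the same sender in the same session that arrive
    within `window_s` seconds of the first message of the current batch."""
    open_batches: Dict[Tuple[str, str], Msg] = {}
    merged: List[Msg] = []
    for m in sorted(msgs, key=lambda x: x.timestamp):
        key = (m.sender, m.session)
        batch = open_batches.get(key)
        if batch is not None and m.timestamp - batch.timestamp <= window_s:
            batch.text += " " + m.text
        else:
            batch = replace(m)  # copy so the caller's messages are not mutated
            open_batches[key] = batch
            merged.append(batch)
    return merged
```

Messages Tom sends in a private chat at 0 s and 20 s are spliced into one item, while a group message he sends at 10 s stays separate because it belongs to a different session.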

In another possible implementation of the first aspect of the embodiments of this application, the first information may further include multiple group conversation messages sent by multiple associated contacts within a preset time period, and merging the multiple messages sent within the preset time period includes: determining the content of each message sent by the associated contacts within the preset time period; and merging the messages with similar content sent by the associated contacts within the preset time period.
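Merging similar group messages requires some similarity measure. The sketch below substitutes a plain character-level measure, Python's `difflib.SequenceMatcher.ratio()`, with a hypothetical threshold of 0.8; the application does not state which similarity measure is used, and a semantic (embedding-based) comparison would likely suit paraphrased messages better.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def merge_similar(messages: List[Tuple[str, str]],
                  threshold: float = 0.8) -> List[Tuple[List[str], str]]:
    """Group (sender, text) pairs whose texts are similar.

    Returns (senders, representative_text) pairs; the first message of each
    group is kept as its representative.
    """
    groups: List[Tuple[List[str], str]] = []
    for sender, text in messages:
        for senders, rep in groups:
            if SequenceMatcher(None, rep.lower(), text.lower()).ratio() >= threshold:
                senders.append(sender)
                break
        else:
            # No sufficiently similar group was found: start a new one.
            groups.append(([sender], text))
    return groups
```

For example, "Happy birthday!" from Amy and "happy birthday!!" from Bob fall into one group and could be broadcast once as "Amy and Bob: Happy birthday!", while Cat's "See you at 7" remains a separate item.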

The embodiments of this application can merge multiple messages sent in succession by the target contact, and can also merge messages with similar content sent by multiple associated contacts. This further condenses the content to be broadcast, avoids broadcasting each received message individually, shortens the broadcast, and improves the efficiency of information delivery.

In one possible implementation of the first aspect of the embodiments of this application, the first information may further include non-instant information, and the method further includes: in response to a user instruction, retrieving one or more pieces of non-instant information corresponding to the user instruction; and generating target information to be broadcast corresponding to the one or more pieces of non-instant information, and broadcasting the target information by voice.

It should be understood that the user instruction may be an instruction issued by the user to request relevant information. By retrieving information that meets the user's needs in response to user instructions, the embodiments of this application enable human-computer interaction during information retrieval and broadcasting, help filter information according to the user's actual needs, avoid the tedious operations of retrieving information manually, and improve the efficiency with which the user obtains information.
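The retrieval step can be sketched as filtering stored messages by criteria taken from the user instruction. This sketch assumes that a separate speech recognition and language understanding component (not shown) has already turned an instruction such as "What did Tom say about dinner?" into a sender and a keyword; the `StoredMsg` structure and the filter parameters are hypothetical.

```python
from typing import List, NamedTuple, Optional


class StoredMsg(NamedTuple):
    sender: str
    timestamp: float
    text: str


def retrieve(store: List[StoredMsg], sender: Optional[str] = None,
             keyword: Optional[str] = None,
             since: Optional[float] = None) -> List[StoredMsg]:
    """Return stored messages matching every filter that was supplied.

    The filters are assumed to be extracted from the user's spoken
    instruction by a separate speech/NLU component (not shown).
    """
    return [
        m for m in store
        if (sender is None or m.sender == sender)
        and (keyword is None or keyword.lower() in m.text.lower())
        and (since is None or m.timestamp >= since)
    ]
```

The matching messages could then pass through the same condensing and merging steps before being broadcast in the target contact's voice.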

A second aspect of the embodiments of this application provides an information processing apparatus, including:

a contact determination module, configured to determine, in response to received first information, the target contact who sent the first information;

a voice message retrieval module, configured to retrieve voice messages of the target contact;

a target voice generation module, configured to generate, based on the voice messages, a target voice similar to the voice of the target contact, where the voice messages include voice messages in a first page;

a target information generation module, configured to generate target information corresponding to the first information; and

a voice broadcasting module, configured to broadcast the target information by voice using the target voice.
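The five modules can be pictured as a pipeline. The sketch below is a hypothetical wiring in which each module is represented by a caller-supplied callable; it shows only the order in which the modules are invoked, not how any of them is implemented, and the class and method names are illustrative.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class InformationProcessingApparatus:
    # One callable per module; concrete implementations are supplied by the
    # caller (hypothetical wiring, not the application's own code).
    determine_contact: Callable[[Any], str]                # contact determination module
    retrieve_voice_messages: Callable[[str], List[bytes]]  # voice message retrieval module
    generate_target_voice: Callable[[List[bytes]], Any]    # target voice generation module
    generate_target_info: Callable[[Any], str]             # target information generation module
    broadcast: Callable[[str, Any], None]                  # voice broadcasting module

    def handle(self, first_info: Any) -> None:
        """Run the modules in order for one piece of received information."""
        contact = self.determine_contact(first_info)
        voice = self.generate_target_voice(self.retrieve_voice_messages(contact))
        self.broadcast(self.generate_target_info(first_info), voice)
```

Injecting the modules as callables keeps each one independently replaceable, which loosely mirrors the statement that the modules may be implemented in hardware, software, or a combination of both.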

A third aspect of the embodiments of this application provides an electronic device, which may include a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor may implement the information processing method according to the first aspect.

A fourth aspect of the embodiments of this application provides a computer-readable storage medium storing computer instructions that, when run on an electronic device, cause the electronic device to perform the related method steps described above, so as to implement the information processing method according to the first aspect.

A fifth aspect of the embodiments of this application provides a computer program product that, when run on a computer, causes the computer to perform the related steps described above, so as to implement the information processing method according to the first aspect.

It can be understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the related description of the first aspect; details are not repeated here.

Brief Description of the Drawings

Figure 1 is a schematic flowchart of an information processing method according to an embodiment of this application;

Figure 2 is a schematic structural diagram of an electronic device to which the information processing method according to an embodiment of this application is applicable;

Figure 3 is a schematic diagram of target contact identification and voice feature detection according to an embodiment of this application;

Figure 4 is a schematic diagram of a voice data authorization management process according to an embodiment of this application;

Figure 5 is a schematic diagram of an operation page of the electronic device during voice data authorization according to an embodiment of this application;

Figure 6 is a schematic diagram of another voice data authorization management process according to an embodiment of this application;

Figure 7 is a schematic diagram of another operation page of the electronic device during voice data authorization according to an embodiment of this application;

Figure 8 is a schematic diagram of yet another voice data authorization management process according to an embodiment of this application;

Figure 9 is a schematic diagram of yet another operation page of the electronic device during voice data authorization according to an embodiment of this application;

Figure 10 is a schematic diagram of a voice data processing flow according to an embodiment of this application;

Figure 11 is a schematic diagram of tag-based retrieval according to an embodiment of this application;

Figure 12 is a schematic diagram of page-turning retrieval according to an embodiment of this application;

Figure 13 is a schematic diagram of search-based retrieval according to an embodiment of this application;

Figure 14 is a schematic diagram of capturing a voice audio data stream according to an embodiment of this application;

Figure 15 is a schematic diagram of another way of capturing a voice audio data stream according to an embodiment of this application;

Figure 16 is a schematic diagram of target human voice recognition and screening according to an embodiment of this application;

Figure 17 is a schematic diagram of voice feature extraction and storage according to an embodiment of this application;

Figure 18 is a schematic diagram of an information processing method according to an embodiment of this application;

Figure 19 is a schematic diagram of another information processing method according to an embodiment of this application;

Figure 20 is a schematic diagram of generating a broadcast control strategy according to an embodiment of this application;

Figure 21 is a schematic diagram of an information source determination process according to an embodiment of this application;

Figure 22 is a schematic diagram of a contact source determination process according to an embodiment of this application;

Figure 23 is a schematic diagram of an information condensing determination process according to an embodiment of this application;

Figure 24 is a schematic diagram of a non-text information processing flow according to an embodiment of this application;

Figure 25 is a schematic diagram of non-text information processing according to an embodiment of this application;

Figure 26 is a schematic diagram of an instant message merging process according to an embodiment of this application;

Figure 27 is a schematic diagram of instant message merging according to an embodiment of this application;

Figure 28 is a schematic diagram of another instant message merging according to an embodiment of this application;

Figure 29 is a schematic diagram of an information retrieval and summarization process according to an embodiment of this application;

Figure 30 is a schematic diagram of information retrieval and summarization according to an embodiment of this application;

Figure 31 is a schematic diagram of an information processing apparatus according to an embodiment of this application.

Detailed Description of Embodiments

It should be noted that, in the embodiments of this application, words such as "exemplarily" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplarily" or "for example" in the embodiments of this application should not be construed as more preferred or advantageous than other embodiments or designs. Rather, such words are intended to present related concepts in a concrete manner.

The service scenarios described in the embodiments of this application are intended to illustrate the technical solutions of the embodiments more clearly and do not limit them. A person of ordinary skill in the art will appreciate that, as new service scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.

In the embodiments of this application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that only A exists, both A and B exist, or only B exists, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following" or a similar expression means any combination of the listed items, including a single item or multiple items. For example, at least one of a, b, or c may indicate: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple.
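The "at least one of" convention corresponds to taking every non-empty combination of the listed items. The short sketch below, with an illustrative function name, enumerates those combinations and reproduces the seven cases listed for a, b, and c.

```python
from itertools import combinations
from typing import List, Sequence


def at_least_one_of(items: Sequence[str]) -> List[str]:
    """Enumerate every non-empty combination of the items, joined with '-'."""
    return ["-".join(combo)
            for size in range(1, len(items) + 1)
            for combo in combinations(items, size)]


print(at_least_one_of(["a", "b", "c"]))
# → ['a', 'b', 'c', 'a-b', 'a-c', 'b-c', 'a-b-c']
```

In general, n listed items yield 2^n - 1 such combinations.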

The steps involved in the information processing method, apparatus, and electronic device provided in the embodiments of this application are merely examples. Not all steps must be performed, and not all content within each step is mandatory; steps may be added or omitted as needed in use. The same step, or steps or content with the same function, in different embodiments of this application may be cross-referenced.

Typically, the received information that an electronic device broadcasts by voice includes incoming call information and other notification messages. Broadcasting of incoming calls is usually enabled by default. When the electronic device is connected to headphones or to a car (for example, while the user is driving), it announces an incoming call directly by voice. For example, upon receiving an incoming call, the electronic device may use the system voice to announce "A call from Tom, do you want to answer?", and the user can interact with the electronic device to answer or reject the call. For other notification messages, the user can selectively enable broadcasting of specific message types or of notifications from specific apps in the device settings. For example, the user can enable broadcasting of messages received by various social applications while disabling broadcasting of system messages. In that case, the electronic device does not broadcast system messages when they are generated, but broadcasts messages received by social applications directly in the system voice upon receipt. When broadcasting, the electronic device may also indicate where the information came from. For example, when it receives a message from the contact Tom in a social application APP1, it may announce "Message from Tom in APP1: How about we have dinner together this weekend?" If a message contains a link or similar content, the announcement may be "From APP1, Tom - First Company - Application Engineer: I found some relevant information for you, here's the URL: https://happy.valley..." Furthermore, for long messages, the electronic device usually announces only the beginning of the text and cannot summarize the content. When several received messages are all long texts, the electronic device likewise announces only the beginning of each one, so the user obtains the message content inefficiently.

To address the foregoing problems, embodiments of this application provide an information processing method, an apparatus, and an electronic device. On the one hand, upon receiving information, the electronic device may first determine the target contact who sent it. The electronic device can then broadcast the received information in a voice identical or similar to the real voice of that target contact, so that the user can quickly tell who sent the information from the voice used, establishing a sense of familiarity with the target contact at the auditory level. On the other hand, the electronic device may process the received information according to a broadcast control strategy. For example, it may convert received non-text information into text, or summarize text information with substantial content to condense what needs to be broadcast. In this way, the electronic device broadcasts only the information processed according to the corresponding strategy rather than reading out redundant text or non-text information directly, improving the efficiency and timeliness with which the user obtains information.

Figure 1 is a schematic diagram of the overall flow of an information processing method according to an embodiment of this application. Following the flow shown in Figure 1, the electronic device may perform the information processing method based on a user instruction or when an automatic broadcast condition is met: it determines the target contact and then, according to a broadcast control strategy, broadcasts the information in a voice identical or similar to the target contact's real voice. As shown in Figure 1, upon receiving a user instruction or when the automatic broadcast condition is met, the electronic device may first determine the target contact, that is, the contact who sent the information. Specifically, the electronic device may obtain contact information, such as the contact ID, remark name, nickname, and avatar, based on the user instruction or by analyzing the received information, and may process these types of contact information to determine the target contact who sent the information. Then, as shown in Figure 1, the electronic device may detect whether the voice features of the target contact exist in the system and whether authorization to use those voice features has already been obtained from the target contact. If the voice features of the target contact do not exist in the system, the electronic device may, provided that the target contact has granted authorization, retrieve the target contact's voice messages and extract the target contact's voice features from them for voice synthesis. Alternatively, if the system contains voice data of the target contact but authorization to use the target contact's voice has not been obtained, the electronic device may perform a permission control step to request that authorization from the target contact. After authorization is obtained, the electronic device may process the stored voice data of the target contact, for example by retrieving the target contact's voice messages and extracting voice features from them, thereby obtaining the voice features used to synthesize a target voice identical or similar to the target contact's real voice. On this basis, as shown in Figure 1, the electronic device may determine a broadcast control strategy by considering factors such as the sources of the previous piece of information and the current piece of information, the user's familiarity with the target contact, the type and complexity of the information, and the broadcast requirements expressed in the user instruction, and then broadcast by voice according to that strategy. During the broadcast, the electronic device may use the target voice, identical or similar to the target contact's real voice, to read out the information processed according to the strategy.
It should be noted that voice features span multiple dimensions, such as timbre, pitch, tone, and rhythm. Therefore, a target voice identical or similar to the target contact's real voice may be a voice that is identical or similar to it in one or more of these features.
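The two checks in the Figure 1 flow can be reduced to a small decision function. This is a sketch of one reading of the flow, in which authorization is checked before any voice features are extracted or used; the enum values and the function name are hypothetical, and the behavior when the target contact declines authorization is not described here and is therefore not modeled.

```python
from enum import Enum


class NextStep(Enum):
    REQUEST_AUTHORIZATION = "request authorization from the target contact"
    EXTRACT_FEATURES = "retrieve voice messages and extract voice features"
    SYNTHESIZE = "synthesize the target voice from stored features"


def next_step(has_voice_features: bool, authorized: bool) -> NextStep:
    """Choose the next step of the flow from the two checks."""
    if not authorized:
        # Permission control: the contact's voice is not used without consent.
        return NextStep.REQUEST_AUTHORIZATION
    if not has_voice_features:
        return NextStep.EXTRACT_FEATURES
    return NextStep.SYNTHESIZE
```

Placing the authorization check first means that neither feature extraction nor synthesis is reached for a contact who has not granted permission, regardless of what voice data happens to be stored.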

As an example of applying the information processing method of the embodiments of this application, suppose the user receives a message from a contact in a social application while driving, doing housework, or running. The user can ask the electronic device what message was received and request that it be announced. Alternatively, if the user has set received information to be announced automatically, the electronic device, upon receiving a message from a contact, processes the message using the method provided in the embodiments of this application and announces it automatically. The announced information may be the original information as processed by the broadcast control strategy of the method. When announcing, the electronic device may use a target voice identical or similar to the real voice of the target contact who sent the message. The target voice may be a voice synthesized from the voice features of the target contact, and it bears a certain degree of similarity to the target contact's real voice. When synthesizing the target voice, the electronic device may determine the required similarity between the target voice and the target contact's real voice according to actual needs; for example, the similarity may be 70% or 90%, which is not limited in the embodiments of this application.
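The application does not define how the similarity percentage is measured. One common choice, assumed here purely for illustration, is the cosine similarity between speaker-embedding vectors of the synthesized voice and of the contact's real voice, compared against the required level (for example 0.7 or 0.9). How those embeddings are computed is outside this sketch, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def meets_similarity(target: Sequence[float], real: Sequence[float],
                     required: float = 0.9) -> bool:
    """True if the synthesized voice's embedding is at least `required` similar."""
    return cosine_similarity(target, real) >= required
```

For example, vectors [1, 1] and [1, 0] have a cosine similarity of about 0.707, which satisfies a 70% requirement but not a 90% one.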

The information processing method provided in the embodiments of this application may be applied to an electronic device such as a mobile phone, a tablet computer, a smart wearable device, or an in-vehicle mobile device. The embodiments of this application do not limit the type of electronic device.

Exemplarily, Figure 2 is a schematic structural diagram of an electronic device 200. For the structure of the electronic device described above, reference may be made to the electronic device 200 in Figure 2.

As shown in Figure 2, the electronic device 200 may include a processor 210, an external memory interface 220, an internal memory 221, a universal serial bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 270A, a receiver 270B, a microphone 270C, a headphone jack 270D, a sensor module 280, buttons 290, a motor 291, an indicator 292, a camera 293, a display screen 294, a subscriber identification module (SIM) card interface 295, and the like. The sensor module 280 may include a pressure sensor 280A, a gyroscope sensor 280B, a barometric pressure sensor 280C, a magnetic sensor 280D, an acceleration sensor 280E, a distance sensor 280F, a proximity light sensor 280G, a fingerprint sensor 280H, a temperature sensor 280J, a touch sensor 280K, an ambient light sensor 280L, a bone conduction sensor 280M, and the like.

It can be understood that the structure illustrated in the embodiments of this application does not constitute a specific limitation on the electronic device 200. In some embodiments of this application, the electronic device 200 may include more or fewer components than illustrated, combine or split some components, or use a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 210 may include one or more processing units. For example, the processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated into one or more processors.

The controller may generate operation control signals based on instruction opcodes and timing signals to control instruction fetching and execution. Exemplarily, when the electronic device 200 receives information sent by a contact through an application, such as a social application, the controller may process the information to determine the target contact who sent it.
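Determining the target contact can be sketched as matching the sender details carried by a message against the stored contact information described for Figure 1 (contact ID, remark name, nickname). The matching priority used here (exact ID first, then remark name, then nickname) is an assumption, the `Contact` structure and function name are hypothetical, and avatar matching is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contact:
    contact_id: str
    remark_name: str  # name the user saved for the contact
    nickname: str     # name the contact chose for themselves


def resolve_contact(contacts: List[Contact], sender_id: Optional[str] = None,
                    display_name: Optional[str] = None) -> Optional[Contact]:
    """Match the sender of a message to a stored contact.

    Assumed priority: exact ID, then remark name, then nickname.
    """
    if sender_id is not None:
        for c in contacts:
            if c.contact_id == sender_id:
                return c
    if display_name:
        for attr in ("remark_name", "nickname"):
            for c in contacts:
                if getattr(c, attr) == display_name:
                    return c
    return None  # no match: the sender cannot be identified as a known contact
```

Preferring the ID avoids ambiguity when two contacts share a nickname; the name-based fallbacks cover cases where only a display name is available, for example when the message is read from a notification.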

A memory may also be provided in the processor 210 to store instructions and data. For example, the memory may store contacts' voice data locally on the electronic device 200. Then, when information needs to be broadcast in a voice identical or similar to that of a particular contact, the electronic device 200 can search its local memory directly to obtain the target contact's voice features.

The electronic device 200 may implement audio functions, such as music playback, recording, and voice broadcasting of information, through the audio module 270, the speaker 270A, the receiver 270B, the microphone 270C, the headphone jack 270D, the application processor, and so on.

The speaker 270A, also called a "loudspeaker", converts audio electrical signals into sound signals. The electronic device 200 may play music or hands-free calls through the speaker 270A. In the embodiments of this application, the electronic device 200 may broadcast processed information to the user as speech through the speaker 270A.

The receiver 270B and/or the microphone 270C may be used to receive the user's voice. For example, the user may ask the electronic device 200 what information has been received and ask it to broadcast that information. During this interaction, the user's instructions can be passed to the electronic device 200 as speech. The electronic device 200 receives the user's voice through the receiver 270B and/or the microphone 270C and converts it into a signal that the processor 210 can process.

The headphone jack 270D is used to connect headphones, which may be wired or wireless. Once headphones are connected to the headphone jack 270D, the functions of the speaker 270A, the receiver 270B, and the microphone 270C can all be provided through the headphones.

The information processing method provided in the embodiments of this application may be implemented on an electronic device having the hardware structure described above.

In the embodiments of this application, when the electronic device receives information, it may determine the target contact who sent the information based on the contact information carried in it. For example, the electronic device may determine the target contact from the contact ID, remark name, nickname, and/or avatar information carried in the received information. The information received by the electronic device may be first information. In response to receiving the first information, the electronic device may determine the target contact who sent the first information.

To broadcast received information in a voice identical or similar to the target contact's real voice, the electronic device first determines the target contact who sent the information. It then obtains that contact's voice features and uses them in voice synthesis to produce the target voice, that is, a voice identical or similar to the target contact's real voice. In this process, the electronic device checks the voice features already stored in the system against several types of contact information to confirm whether the relevant voice features belong to the target contact.

Figure 3 is a schematic diagram of target contact identification and voice feature detection according to an embodiment of this application. It shows how the voice features stored in the system are checked against several types of contact information.

In the flow shown in Figure 3, the contact information may include the contact ID, remark name, nickname, avatar information, and so on. Each of these items can mark and identify the target contact. The electronic device checks them one after another to determine whether the voice features stored in the system are consistent with the target contact, that is, whether they belong to the target contact.

Specifically, after receiving information, the electronic device may obtain the contact information it carries, such as the contact ID, remark name, nickname, and avatar information shown in Figure 3. The electronic device first checks whether the system contains voice features corresponding to the target contact's ID. If it does, the electronic device determines whether it currently has the target contact's authorization to use those voice features. The electronic device may use the target contact's voice features to broadcast received information only if it has obtained that authorization. Otherwise, it should attempt to request authorization from the target contact.

If the system contains no voice features corresponding to the target contact's ID, the electronic device may go on to check three things. It checks whether the system contains the target contact's remark name, nickname, or avatar information. It also checks whether the system contains voice features corresponding to that remark name, that nickname, or that avatar information.

As shown in Figure 3, if the system contains the target contact's remark name, the electronic device may check whether the system also contains voice features corresponding to that remark name. If it does, the electronic device also needs to confirm whether the remark name has been modified and whether the current remark name uniquely refers to the target contact. If the remark name can still be relied on to identify the target contact, the electronic device determines whether it currently has the target contact's authorization to use the voice features. Based on the result, it takes one of two paths. It can use the voice features to synthesize a target voice identical or similar to the target contact's real voice and broadcast the received information in that voice. Alternatively, it can first obtain the target contact's authorization and then synthesize the target voice and broadcast the information in it.

If the system does not contain the target contact's remark name, or contains it but has no voice features corresponding to it, the electronic device may check whether the system contains the target contact's nickname and voice features corresponding to that nickname. If the nickname is also absent, or is present but has no corresponding voice features, the electronic device may go on to check whether the system contains the target contact's avatar information and voice features corresponding to that avatar information.
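The sequential check of Figure 3 can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the method's actual implementation. The class and field names (`ContactInfo`, `VoiceStore`, `avatar_hash`, `remark_owners`) are hypothetical. Representing authorization as a set of contact IDs is also an assumption. So is the choice to fall through to the nickname when the remark-name validity check fails, which the text does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class ContactInfo:
    """Identifiers carried in a received message (field names are hypothetical)."""
    contact_id: str
    remark_name: Optional[str] = None
    nickname: Optional[str] = None
    avatar_hash: Optional[str] = None


@dataclass
class VoiceStore:
    """Hypothetical local index from each identifier type to a stored voice-feature key."""
    by_id: Dict[str, str] = field(default_factory=dict)
    by_remark: Dict[str, str] = field(default_factory=dict)
    by_nickname: Dict[str, str] = field(default_factory=dict)
    by_avatar: Dict[str, str] = field(default_factory=dict)
    # Contact IDs currently using each remark name; used for the uniqueness check.
    remark_owners: Dict[str, Set[str]] = field(default_factory=dict)


def find_voice_feature(info: ContactInfo, store: VoiceStore) -> Optional[str]:
    """Check identifiers in the order: contact ID, remark name, nickname, avatar."""
    if info.contact_id in store.by_id:
        return store.by_id[info.contact_id]
    if info.remark_name and info.remark_name in store.by_remark:
        # Trust the remark name only if it still refers to this contact and no other.
        if store.remark_owners.get(info.remark_name) == {info.contact_id}:
            return store.by_remark[info.remark_name]
    if info.nickname and info.nickname in store.by_nickname:
        return store.by_nickname[info.nickname]
    if info.avatar_hash and info.avatar_hash in store.by_avatar:
        return store.by_avatar[info.avatar_hash]
    return None


def decide_playback(info: ContactInfo, store: VoiceStore,
                    authorized: Set[str]) -> Tuple[str, Optional[str]]:
    """Return the next action and the matched feature key, if any."""
    key = find_voice_feature(info, store)
    if key is None:
        return ("no_feature", None)
    if info.contact_id in authorized:
        return ("synthesize", key)
    return ("request_authorization", key)
```

Keeping the authorization decision separate from the feature lookup mirrors the text. Features may exist locally while use still depends on the contact's permission.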

As shown in Figure 3, after confirming that the target contact's voice features exist in the system, the electronic device may, with the target contact's authorization, use those voice features to synthesize a target voice identical or similar to the target contact's real voice.

Figures 4 and 5 are schematic diagrams of voice data authorization management according to an embodiment of this application. Figure 4 shows an example of the voice data authorization management flow. Figure 5 shows examples of the electronic device's operation pages during voice data authorization. Following the flow in Figures 4 and 5, a user can use the relevant settings to authorize this solution for a specific application. Once authorization is complete, information received by that application can be processed according to the steps of this solution and broadcast in the target voice. For example, the user may authorize an application to use their voice data, such as social application A shown in Figure 5 or the SMS application. Then, when that application receives information sent by this user, it can use the user's voice features to synthesize a target voice identical or similar to the user's voice and broadcast the received information in that voice. Social application A may be an instant messaging application, a chat application, or another application with communication functions.

Specifically, referring to Figure 5, in one example the function of broadcasting information in a voice identical or similar to the real voice of the contact who sent it may be called the voice messenger function. To use this function, the user can navigate on the electronic device to the corresponding settings page, such as the voice messenger settings page shown in Figure 5(a), and turn the function on with the voice messenger function switch 511. Once the function is on, as shown in Figure 5(a), the user can turn on switch 512 so that received information is broadcast automatically with this function whenever the electronic device is connected to headphones. Tapping the voice data authorization management switch 513 takes the electronic device to the authorization management page shown in Figure 5(b). That page lists each application that has authorization records, such as social application A and the Phone application. When the user taps the control 521 corresponding to social application A, the electronic device jumps to the page shown in Figure 5(c). That page lists the records of the user authorizing social application A to use voice data, such as authorization record 1, authorization record 2, and authorization record 3. On the page in Figure 5(c), the user can, for example, tap the control 531 corresponding to authorization record 2 to view the details of that record. Figure 5(d) may show the details of authorization record 2. In this authorization, the user turned on switch 541, "Allow my voice to be used for one year". This means that once the authorization is complete, the user allows social application A to use their voice data for the voice messenger function for one year. The voice data authorized for use may be the user's voice features, also called voice feature data.
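Authorization records such as those in Figure 5(c) and 5(d) could be modelled as time-limited grants per application. The sketch below is hypothetical. The record fields, the `revoked` flag, and approximating "one year" as 365 days are assumptions, not details from the text. The block only shows how an application's permission to use voice data might be checked at broadcast time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

ONE_YEAR = timedelta(days=365)  # assumption: "one year" approximated as 365 days


@dataclass
class AuthorizationRecord:
    """One grant letting an application use the user's voice feature data (hypothetical)."""
    app: str
    granted_at: datetime
    valid_for: Optional[timedelta]  # None models a grant without an expiry
    revoked: bool = False

    def is_active(self, now: datetime) -> bool:
        if self.revoked or now < self.granted_at:
            return False
        return self.valid_for is None or now < self.granted_at + self.valid_for


def may_use_voice(records: List[AuthorizationRecord], app: str, now: datetime) -> bool:
    """An application may use the voice data if any of its records is still active."""
    return any(r.app == app and r.is_active(now) for r in records)
```

Keeping several records per application, as the page in Figure 5(c) suggests, lets a single expired or revoked grant lapse without affecting the others.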

Figures 6 and 7 are schematic diagrams of another form of voice data authorization management according to an embodiment of this application. They show the flow in which the user actively requests a target contact's authorization to use that contact's voice data. Specifically, referring to Figures 6 and 7, when the user wants authorization to use a contact's voice data, the user can open the relevant settings page on the electronic device and confirm the application that contains the contact. The user can then send the copied authorization request to the contact through the conversation page with that contact, asking for authorization to use the contact's voice.

Specifically, referring to Figure 7, in one example the user requests voice data authorization from a contact on the settings page shown in Figure 7(a). For example, the user taps the authorization management switch 711 corresponding to social application A. The electronic device then jumps to the authorization management page shown in Figure 7(b), which explains how to make a voice data authorization request. For example, this page displays information 721 indicating that tapping it copies the authorization link and jumps to the corresponding app. Following these instructions, the user taps information 721 and the corresponding link is copied. The electronic device can then automatically open the corresponding app, for example social application A as shown in Figure 7(d), and automatically send the authorization request information 741 to the other user.

Figures 8 and 9 are schematic diagrams of yet another form of voice data authorization management according to an embodiment of this application. The flow in Figures 8 and 9 is the counterpart of the flow in Figures 6 and 7. Figures 6 and 7 show the user actively requesting a target contact's authorization to use that contact's voice data, that is, the operations on the user side. Figures 8 and 9 show how the contact side processes the authorization request sent by the user, that is, the operations on the contact side after it receives the request. Following the flow in Figures 8 and 9, the contact can confirm the request and select the type of authorization to grant, thereby authorizing the user side to use their voice data.

Specifically, Figure 9 shows the authorization settings page that the electronic device automatically displays after its user taps a received authorization request. For example, after the other user Tom in Figure 7(d) taps the authorization request information 741 sent by the user, Tom's electronic device can automatically open the authorization settings page shown in Figure 9. Tom selects one of the authorization types, such as "Always allow my voice to be used" shown in Figure 9, and turns on the corresponding switch 911. This authorizes his voice data to the peer user and lets the peer user use the voice messenger function. Later, Tom sends information to the peer user through the authorized application. When the application on the peer user's device receives it, that application can automatically obtain Tom's voice features and synthesize a target voice identical or similar to Tom's real voice. It then uses the target voice to broadcast Tom's information to the peer user.
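On the contact side, the choice made on the page in Figure 9 could be turned into a grant object roughly as follows. This is a hypothetical sketch. The message fields, the `GrantType` values, and the 365-day expiry for the time-limited option are assumptions for illustration. How the request link is actually encoded and delivered is not specified in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class GrantType(Enum):
    ALWAYS = "always"        # e.g. the "Always allow" option in Figure 9
    ONE_YEAR = "one_year"    # e.g. a one-year option like the one in Figure 5(d)
    DENY = "deny"


@dataclass
class AuthorizationRequest:
    requester_id: str  # the user asking to hear the contact's voice
    owner_id: str      # the contact whose voice data is requested (e.g. Tom)
    app: str           # application the grant applies to


@dataclass
class Grant:
    requester_id: str
    owner_id: str
    app: str
    expires_at: Optional[datetime]  # None means the grant does not expire


def answer_request(req: AuthorizationRequest, choice: GrantType,
                   now: datetime) -> Optional[Grant]:
    """Turn the contact's choice on the settings page into a grant, or None if denied."""
    if choice is GrantType.DENY:
        return None
    expires = None if choice is GrantType.ALWAYS else now + timedelta(days=365)
    return Grant(req.requester_id, req.owner_id, req.app, expires)
```

The resulting grant would then be returned to the requesting side so that its application can look up Tom's voice features when his messages arrive.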

After identifying the target contact and obtaining permission to use their voice features, the electronic device can use those voice features to synthesize a target voice identical or similar to that contact's real voice. It then broadcasts the information sent by the target contact in that voice. To achieve this, as shown in Figure 1, the electronic device also needs to obtain the target contact's voice features.

In one possible implementation of the embodiments of this application, the electronic device can retrieve the target contact's voice messages and extract the target contact's voice features from them. It then obtains a target voice identical or similar to the target contact's real voice through voice synthesis or voice cloning, and broadcasts received information in that target voice.

Figure 10 is a schematic diagram of a voice data processing flow according to an embodiment of this application. It shows how the target contact's voice messages are retrieved and the target contact's voice features are extracted. The flow includes retrieving the target contact's voice messages, capturing the voice audio data stream, and extracting and storing the target contact's voice features. While retrieving the target contact's voice messages, the electronic device can also check the duration of each retrieved message. This ensures that the audio data stream is captured only from voice messages that meet a duration requirement. Before extracting the voice features, the electronic device can also run a quality check on the captured audio data stream, so that only streams meeting the quality requirements are used for feature extraction. This keeps the synthesized target voice as close as possible to the target contact's real voice.
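The steps of Figure 10 (retrieve, check duration, capture, check quality, extract, store) could be chained as in the sketch below. The quality check is a stand-in chosen for illustration: a simple RMS-level test plus a clipping-ratio test on normalized samples, with assumed thresholds. The text does not say which quality criteria are used. The feature extractor is passed in as a placeholder callable, because the actual voice-feature model is not described. `VoiceMessage` and its fields are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Sequence

MIN_DURATION_S = 10.0  # example duration threshold used in the text


@dataclass
class VoiceMessage:
    sender: str
    duration_s: float
    samples: List[float]  # captured audio stream, normalized to [-1.0, 1.0]


def passes_quality(samples: Sequence[float], min_rms: float = 0.01,
                   max_clip_ratio: float = 0.01) -> bool:
    """Reject near-silent or heavily clipped audio (thresholds are assumptions)."""
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    clipped = sum(1 for s in samples if abs(s) >= 0.999) / len(samples)
    return rms >= min_rms and clipped <= max_clip_ratio


def build_voice_profile(contact_id: str, messages: Iterable[VoiceMessage],
                        extract: Callable[[List[float]], object],
                        store: Dict[str, object]) -> bool:
    """Duration check, then quality check, then extract and store; True on success."""
    for msg in messages:
        if msg.sender != contact_id or msg.duration_s <= MIN_DURATION_S:
            continue  # not from the target contact, or too short
        if not passes_quality(msg.samples):
            continue  # captured stream fails the quality gate
        store[contact_id] = extract(msg.samples)
        return True
    return False
```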

In one possible implementation of the embodiments of this application, the electronic device can obtain the target contact's voice messages by automated retrieval. The automated retrieval may include one or more of tag-based retrieval, page-turning retrieval, search retrieval, and direct retrieval.
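Because the electronic device may use one or more of these retrieval methods, one simple way to combine them is a fallback chain that tries each method until one returns a usable voice message. This is only a sketch. The order (tag-based, then page-turning, then search, then direct) is an assumption. Each method is represented by a hypothetical callable that returns a message handle or `None`.

```python
from typing import Callable, List, Optional, Tuple

# A retriever takes a contact ID and returns a handle to a suitable voice message, or None.
Retriever = Callable[[str], Optional[str]]


def retrieve_voice_message(contact_id: str,
                           strategies: List[Tuple[str, Retriever]]
                           ) -> Optional[Tuple[str, str]]:
    """Try each retrieval method in turn; return (method name, message handle) on success."""
    for name, retrieve in strategies:
        handle = retrieve(contact_id)
        if handle is not None:
            return name, handle
    return None  # every configured method failed
```

A caller might pass, for example, `[("tag", tag_fn), ("page_turning", page_fn), ("search", search_fn), ("direct", direct_fn)]`, where each function is supplied by the hypothetical integration layer.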

Tag-based retrieval is a method in which the electronic device retrieves tag information from a tag information database based on retrieval elements, and then uses that tag information to locate the target contact's voice messages quickly. Each time the electronic device receives a voice message that a contact sent through an application, it tags the message in the background. The tag records information such as the contact information, message duration, date, and application name, and the resulting tag information is stored in the database. During tag-based retrieval, the electronic device uses the retrieval elements to find the matching tag information in the tag information database and can thus quickly retrieve the required voice message. In one example, the retrieval elements used in tag-based retrieval correspond to the tag information described above, and should include at least the target contact's contact ID and the application name.

Figure 11 is a schematic diagram of tag-based retrieval according to an embodiment of this application. It shows how the electronic device builds the tag information database and how it retrieves the target contact's voice messages using tag-based retrieval.

As shown in Figure 11, while building the tag information database, the electronic device can identify the sender of each voice message it receives through an application, for example by contact ID. Very short voice messages are usually of limited use for the database. The electronic device can therefore check the duration of each received voice message and process only messages longer than a certain duration, for example 10 seconds. The duration check may be performed either before or after identifying the sender. In one order, the electronic device first identifies the sender and then checks the duration. In the other, it first checks the duration and identifies the sender only if the duration requirement is met.

As shown in Figure 11, the electronic device can record the time at which a voice message is received and use this reception time as one item of the message's tag information. The tag information may also include the contact information, message duration, application name, and so on. After determining this information, the electronic device tags the voice message and stores the resulting tag information in the tag information database.
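Building the tag information database could look roughly like the following, using Python's built-in `sqlite3` module as a stand-in store. The table name, column names, and the 10-second threshold are illustrative assumptions. Here the duration check runs first, which is one of the two orders the text permits.

```python
import sqlite3
from datetime import datetime

MIN_DURATION_S = 10.0  # messages this short or shorter are not tagged (example value)


def open_tag_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the hypothetical voice_tags table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS voice_tags ("
        " message_id TEXT PRIMARY KEY,"
        " contact_id TEXT NOT NULL,"
        " duration_s REAL NOT NULL,"
        " received_at TEXT NOT NULL,"  # ISO-8601 with second precision
        " app_name TEXT NOT NULL)"
    )
    return conn


def tag_voice_message(conn: sqlite3.Connection, message_id: str, contact_id: str,
                      duration_s: float, received_at: datetime, app_name: str) -> bool:
    """Store tag information for one received voice message; False if it was skipped."""
    if duration_s <= MIN_DURATION_S:
        return False
    conn.execute(
        "INSERT OR REPLACE INTO voice_tags VALUES (?, ?, ?, ?, ?)",
        (message_id, contact_id, duration_s,
         received_at.isoformat(timespec="seconds"), app_name),
    )
    conn.commit()
    return True
```

Storing timestamps as fixed-format ISO-8601 strings keeps them sortable and comparable as plain text inside SQLite.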

In one possible implementation of the embodiments of this application, as shown in Figure 11, tag-based retrieval may be performed when the electronic device does not yet have the target contact's voice features. In that case, the device retrieves tag information from the tag information database and uses it to obtain the target contact's voice message. If the electronic device already has the target contact's voice features, it can use them directly to synthesize the target voice and broadcast the received information. It does not need to repeat voice feature extraction or any voice message retrieval such as tag-based retrieval. If the electronic device does not have the target contact's voice features, it determines the retrieval elements, which may include contact information, message duration, reception time, application name, and so on. The retrieval elements shown in Figure 11 are as follows: the contact is Tom, the message duration is greater than 10 seconds, the date is within the last month, and the application is social application A. With these elements, tag-based retrieval first finds the matching tag information in the database. It then uses that tag information to navigate quickly to a matching voice message, namely a message longer than 10 seconds that contact Tom sent through social application A and that was received within the last month. If a voice message meeting these requirements is found, the electronic device can use it for subsequent feature extraction. Otherwise, this tag-based retrieval fails.
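The retrieval elements from Figure 11 map naturally onto a single parameterized query. The sketch below assumes a hypothetical SQLite table `voice_tags` with ISO-8601 `received_at` strings. It also approximates "the last month" as a 30-day window. Neither the storage engine nor the exact window is specified in the text.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical tag table; received_at holds ISO-8601 timestamps with second precision.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS voice_tags ("
    " message_id TEXT PRIMARY KEY, contact_id TEXT, duration_s REAL,"
    " received_at TEXT, app_name TEXT)"
)


def find_tagged_voice(conn: sqlite3.Connection, contact_id: str, app_name: str,
                      now: datetime, min_duration_s: float = 10.0,
                      window: timedelta = timedelta(days=30)) -> Optional[str]:
    """Return the newest matching message ID, or None if tag-based retrieval fails."""
    since = (now - window).isoformat(timespec="seconds")
    row = conn.execute(
        "SELECT message_id FROM voice_tags"
        " WHERE contact_id = ? AND app_name = ? AND duration_s > ? AND received_at >= ?"
        " ORDER BY received_at DESC LIMIT 1",
        (contact_id, app_name, min_duration_s, since),
    ).fetchone()
    return row[0] if row else None
```

Preferring the most recent match is a design choice made here for illustration. A recent recording is more likely to reflect the contact's current voice.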

In the embodiments of this application, page-turning retrieval may be a method in which the electronic device opens a first page associated with the user and the target contact and retrieves the target contact's voice messages from that page on its own. The first page may be a conversation page or message page of the application on the electronic device that received the information. For example, it may be the chat page the user sees when chatting with the target contact in a social application, also called a chat box. In one example, the first page may also be a history conversation page. Accordingly, during page-turning retrieval the electronic device can scroll the conversation page and search the history conversation page displayed after scrolling for the target contact's voice messages.

Figure 12 is a schematic diagram of page-turning retrieval according to an embodiment of this application. It shows how the electronic device retrieves the target contact's voice messages from a message page using page-turning retrieval.

Like tag-based retrieval, page-turning retrieval may be performed when the electronic device does not have the target contact's voice features. In that case, the electronic device can open the message page associated with the user and the target contact, for example the chat box between them. The electronic device may open the message page by simulating user operations.

After opening the message page between the user and the target contact, the electronic device searches the current message page for a voice message sent by the target contact. If one exists, the electronic device can also check its duration. For example, as shown in Figure 12, it checks whether the voice message is longer than 10 seconds. If it is, the electronic device takes that voice message as the retrieved voice message of the target contact for subsequent voice feature extraction. The current page may contain no voice message from the target contact, or only one whose duration does not meet the requirement, for example one shorter than 10 seconds. In that case, the electronic device can turn the page, for example by swiping up, and search the message page again. The page shown after turning is then treated as the current message page, and the electronic device searches it for the target contact's voice message in the same way as described above.
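The page-turning loop can be simulated with an in-memory chat view, which makes the control flow explicit. In this sketch, `ChatView` is a hypothetical stand-in for a chat window driven by simulated input. The real device would read messages from the rendered page. The message fields and the 10-second threshold are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    msg_id: str
    sender: str
    kind: str               # "voice" or "text"
    duration_s: float = 0.0


class ChatView:
    """Stand-in for a chat window driven by simulated input.

    pages[-1] is the screen shown when the chat is opened; each scroll_up()
    reveals the next older screen. Within a screen, messages run oldest to newest.
    """

    def __init__(self, pages: List[List[Message]]):
        if not pages:
            raise ValueError("a chat view needs at least one page")
        self.pages = pages
        self.index = len(pages) - 1

    def visible(self) -> List[Message]:
        return self.pages[self.index]

    def scroll_up(self) -> bool:
        """Return False when the top of the history has been reached."""
        if self.index == 0:
            return False
        self.index -= 1
        return True


def page_turning_search(view: ChatView, contact_id: str,
                        min_duration_s: float = 10.0) -> Optional[Message]:
    """Search the current screen, then keep swiping up until a match or the top."""
    while True:
        for msg in reversed(view.visible()):  # newest message on screen first
            if (msg.sender == contact_id and msg.kind == "voice"
                    and msg.duration_s > min_duration_s):
                return msg
        if not view.scroll_up():
            return None  # page cannot be turned any further
```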

In one possible implementation of the embodiments of this application, as shown in Figure 12, the current message page may no longer be able to turn, that is, it cannot be scrolled up any further. This means all messages on the page have been searched. The electronic device can then retrieve the target contact's voice messages from a conversation group that contains the target contact, such as a group chat.

As shown in Figure 12, the electronic device can open a conversation group containing the target contact, that is, the group chat box, and search the group's current message page for a voice message sent by the target contact. As before, if such a voice message exists, the electronic device checks its duration and keeps a message that meets the duration requirement for subsequent voice feature extraction. The group's current message page may instead contain no voice message from the target contact, or only one whose duration does not meet the requirement. In that case, the electronic device can turn the page and continue searching the newly displayed message page. It turns pages in the conversation group in the same way as in the conversation page between the user and the target contact, described above. If no voice message sent by the target contact is found, either in the conversation page between the user and the target contact or in any conversation group containing the target contact, this page-turning retrieval fails.
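The fallback from the one-to-one chat to group chats can be sketched as follows. Each conversation is modelled as a list of screens in the order they appear while swiping up. Messages are modelled as tuples. Both representations are hypothetical simplifications, and so is searching groups in dictionary order. The text does not specify which group is searched first.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical message tuple: (msg_id, sender, kind, duration_s).
Msg = Tuple[str, str, str, float]
# A conversation is a list of screens: index 0 is shown on opening, and each later
# entry is what appears after one more upward swipe.
Conversation = List[Sequence[Msg]]


def scan_conversation(conv: Conversation, contact_id: str,
                      min_duration_s: float) -> Optional[str]:
    """Turn pages until a long-enough voice message from the contact is found."""
    for screen in conv:
        for msg_id, sender, kind, duration in reversed(screen):  # newest on screen first
            if sender == contact_id and kind == "voice" and duration > min_duration_s:
                return msg_id
    return None


def search_with_group_fallback(private_chat: Conversation,
                               group_chats: Dict[str, Conversation],
                               contact_id: str,
                               min_duration_s: float = 10.0) -> Optional[Tuple[str, str]]:
    """Search the one-to-one chat first, then each group that contains the contact."""
    found = scan_conversation(private_chat, contact_id, min_duration_s)
    if found is not None:
        return ("private", found)
    for group_id, conv in group_chats.items():
        found = scan_conversation(conv, contact_id, min_duration_s)
        if found is not None:
            return (group_id, found)
    return None  # page-turning retrieval fails
```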

In this embodiment, tag-based retrieval and page-turning retrieval retrieve voice messages directly from the message pages of the electronic device. They operate on the historical conversation records between the user and a contact, or on the contact's historical records within a conversation group. In one example, these historical conversation records are chat histories: either the chat history between the user and the target contact, or the chat history of a conversation group that includes the target contact (a group chat history). In another example, the electronic device can instead enter a historical conversation record retrieval page through an entry point provided by the application and use a corresponding retrieval method there. Search retrieval is such a method: the device enters the historical conversation record retrieval page through the application's entry point and searches that page directly for the target contact's voice messages.

Figure 13 is a schematic diagram of search retrieval according to an embodiment of this application. Search retrieval of voice messages can be performed on a second page, which can be the historical conversation record retrieval page. Figure 13 shows the process in which the electronic device uses search retrieval to find the target contact's voice messages directly in the historical conversation records.

As with page-turning retrieval, when the electronic device does not yet store the target contact's voice features, it can open the message page associated with the user and the target contact, enter the historical conversation record retrieval page through the entry point provided on that message page, and search there for the target contact's voice messages. As shown in Figure 13, after opening the message page between the user and the target contact, the device opens the historical conversation records, for example the records stored in the chat history view, and searches them to determine whether any voice chat records exist. If the search finds voice chat records from the target contact, the device filters them until it finds a voice message that meets the duration requirement, for example one longer than 10 seconds, for subsequent voice feature extraction.

Similar to searching conversation groups during page-turning retrieval, when the user's historical conversation records with the target contact contain no voice message that meets the requirements, the electronic device can search the historical conversation records, such as the group chat history, of a conversation group that contains the target contact. As shown in Figure 13, the device opens the message page of that group, then opens the group's historical conversation record retrieval page (for example, its chat history search page) and searches the group's chat history. If voice messages are found, the device uses contact information to filter out those belonging to the target contact and takes a message whose duration meets the requirement for subsequent feature extraction. If the group's chat history contains no voice message from the target contact either, this search retrieval fails.

In one possible implementation of this embodiment, the target contact's voice messages can also be obtained by direct retrieval. Direct retrieval uses the target contact's contact information, such as their ID, account information, or phone number, to search the electronic device's local storage or the cloud directly for the target contact's voice information.
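A minimal sketch of direct retrieval, assuming a hypothetical index that maps any known identifier of the contact (ID, account, phone number) to stored voice clips. Checking local storage before the cloud is an assumed ordering; the text allows either source.

```python
from typing import Dict, List, Optional

# Hypothetical index: any identifier (ID, account, phone number) -> voice clip paths.
AudioIndex = Dict[str, List[str]]


def direct_retrieval(identifiers: List[str], local: AudioIndex,
                     cloud: AudioIndex) -> Optional[str]:
    """Look up the contact's voice audio by each known identifier,
    checking local storage before the cloud database."""
    for store in (local, cloud):
        for key in identifiers:
            clips = store.get(key)
            if clips:              # skip missing keys and empty lists
                return clips[0]
    return None
```

Because every identifier is tried against each store, a clip uploaded under the contact's account is still found even if the device only knows the contact by phone number locally, provided that number is in `identifiers`.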

In one possible implementation of this embodiment, tag-based retrieval, page-turning retrieval, search retrieval, and direct retrieval can be used individually or concurrently. For example, the electronic device can use one retrieval method and stop as soon as it obtains a voice message from the target contact; if that method yields nothing, the device moves on to another method, and so on until a voice message from the target contact is obtained. In another example, the device can run several retrieval methods at the same time. If at least two methods return voice messages from the target contact, the device can use any one of them for subsequent voice feature extraction.
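The two ways of combining retrieval methods can be sketched as follows. Each retriever is a hypothetical callable standing in for tag-based, page-turning, search, or direct retrieval. In the concurrent variant, taking the first non-empty result in list order is one assumed way of using "any one" of the results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

Retriever = Callable[[str], Optional[str]]  # contact id -> voice clip, or None


def retrieve_sequential(retrievers: List[Retriever], contact_id: str) -> Optional[str]:
    """Try one method at a time and stop at the first that yields a voice message."""
    for retrieve in retrievers:
        clip = retrieve(contact_id)
        if clip is not None:
            return clip
    return None


def retrieve_parallel(retrievers: List[Retriever], contact_id: str) -> Optional[str]:
    """Run all methods at once; use the first successful result in list order."""
    with ThreadPoolExecutor(max_workers=max(1, len(retrievers))) as pool:
        # pool.map preserves the input order of the retrievers
        results = list(pool.map(lambda r: r(contact_id), retrievers))
    return next((clip for clip in results if clip is not None), None)
```

The sequential variant does the least work when an early method succeeds; the concurrent variant trades extra work for lower latency when the cheap methods tend to fail.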

As shown in Figure 10, once the electronic device has retrieved a voice message from the target contact that meets the duration requirement, it can capture the voice audio data stream from it. This capture can be automated: the device simulates playback of the retrieved voice message and captures the corresponding voice data stream in the background.

Figure 14 is a schematic diagram of voice audio data stream capture according to an embodiment of this application. It shows the process of capturing the voice audio data stream in the background of the electronic device by simulating playback of a voice message.

After retrieving the target contact's voice message, the electronic device can simulate playing it and capture the relevant voice audio data stream during the simulated playback. As shown in Figure 14, the device first mutes the speaker or virtual speaker so that the capture does not disturb the user. With the speaker muted, the device simulates a tap to play the retrieved voice message and captures the voice audio data stream while it plays. When playback ends, the capture ends as well, and the device restores the speaker. The captured voice audio data stream can be used for voice feature extraction once it passes quality checks.
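The mute, play, capture, restore sequence can be sketched as below. `Speaker` and the `play` callback are hypothetical stand-ins for the platform's audio APIs, which this description does not name. Restoring in a `finally` clause, and restoring the previous mute state rather than unconditionally unmuting, are design choices of this sketch so that a failed playback never leaves the speaker muted.

```python
from typing import Callable, List


class Speaker:
    """Hypothetical wrapper around the device's real or virtual speaker."""

    def __init__(self) -> None:
        self.muted = False


def capture_during_silent_playback(
        speaker: Speaker,
        play: Callable[[Callable[[bytes], None]], None]) -> bytes:
    """Mute, simulate playback while collecting the audio, then restore the speaker.

    `play` stands in for the simulated tap on the voice message: it pushes
    decoded audio frames to the callback it is given and returns when
    playback ends.
    """
    was_muted = speaker.muted
    speaker.muted = True              # avoid disturbing the user
    frames: List[bytes] = []
    try:
        play(frames.append)           # capture each frame as it is "played"
    finally:
        speaker.muted = was_muted     # restore even if playback raises
    return b"".join(frames)
```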

In one possible implementation of this embodiment, the target contact's voice message used for capturing the voice audio data stream can also be retrieved from the electronic device's local storage or from a cloud database. The target contact's voice messages in the cloud database can either be recorded in advance by the target contact and uploaded, or be obtained, with the target contact's authorization, by recording speech during a call between the user and the target contact. Speech recorded during such a call can also be stored locally on the electronic device.

Figure 15 is a schematic diagram of another form of voice audio data stream capture according to an embodiment of this application. It shows the process of obtaining the target contact's voice audio data stream from the cloud.

In one example, as shown in Figure 15, the target contact can use their own electronic device, such as a mobile phone, to record an audio clip and bind it to their account information, that is, information that uniquely identifies the target contact, such as contact information. The target contact can upload this bound audio data to the cloud database, making it available to other users for voice synthesis or cloning during communication.

In another example, as shown in Figure 15, during a call between the user and the target contact, the electronic device can determine whether it already stores the target contact's voice features. If it does not, it can, with the target contact's authorization and consent, capture a call recording of a certain length, for example 10 seconds. The device binds the captured audio data to the target contact's account information and either uploads it to the cloud database or stores it locally, for voice synthesis or cloning in later communication.
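A minimal sketch of taking a reference clip from call audio, under stated assumptions: the call audio is available as a list of PCM samples, the clip is taken from the start of the call (the text does not say which segment), and a dictionary stands in for either the local store or the cloud database. The 10-second length follows the example above; `consent_given` models the target contact's authorization.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class ReferenceAudio:
    account_id: str       # account information bound to the clip
    sample_rate: int
    samples: List[float]


def clip_call_audio(account_id: str, call_samples: Sequence[float],
                    sample_rate: int, consent_given: bool,
                    has_features: bool,
                    clip_seconds: float = 10.0) -> Optional[ReferenceAudio]:
    """Cut a reference clip from call audio, only when the contact has consented
    and no voice features are stored for this contact yet."""
    if has_features or not consent_given:
        return None
    n = int(clip_seconds * sample_rate)
    if len(call_samples) < n:
        return None                       # call too short for a full clip
    return ReferenceAudio(account_id, sample_rate, list(call_samples[:n]))


def bind_to_account(store: Dict[str, ReferenceAudio], clip: ReferenceAudio) -> None:
    """Bind the clip to the contact's account (local store or cloud database)."""
    store[clip.account_id] = clip
```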

When the electronic device needs the target contact's voice audio data stream, it can search its local storage or the audio data uploaded to the cloud database; the audio data found there serves as the captured voice audio data stream of the target contact.

In another possible implementation of this embodiment, the audio data stored locally on the electronic device or in the cloud database can also be obtained by screen recording. The user can capture the target contact's audio data through screen recording, from which the target contact's voice features are extracted. With the target contact's authorization to use their voice, the electronic device can then synthesize, from those voice features, a target voice that is the same as or similar to the target contact's real voice.

In one example, when the user makes a video call with a contact on the electronic device, the user can, with the contact's authorization, record the screen to obtain a screen recording file. This file can be a video file that contains the contact's audio data. The device can extract the contact's voice features from the audio data, bind them to the corresponding contact, and store them locally, or bind the audio data itself to the contact and store it in the cloud database. The authorization here can cover both permission for the user to record the video call and permission to extract the contact's voice features from the recording for use in the solutions provided by the embodiments of this application.

In another example, the user can operate the electronic device to find a voice message sent by a contact and record the screen while playing it, obtaining a screen recording file that contains the contact's audio data. With the contact's authorization, the electronic device can then perform the steps of the solutions provided by the embodiments of this application: extract the contact's voice features from the audio data, bind them to the corresponding contact, and store them locally, or bind the audio data to the contact and store it in the cloud database. Audio files stored in the cloud database in either example can later be used by the electronic device to extract the contact's voice features and synthesize a target voice that is the same as or similar to the contact's real voice.

In one possible implementation of this embodiment, as shown in Figure 10, to make the target voice that is later synthesized or cloned as close as possible to the target contact's real voice, the electronic device can run an audio quality check on the captured voice audio data stream to determine whether it meets the quality requirements. If it does, the device proceeds to voice feature extraction; otherwise, it retrieves another voice message from the target contact.

In this embodiment, the quality check can include checking the audio quality of the captured voice audio data stream itself and, when the stream contains the voices of several contacts, identifying and isolating the target voice, that is, the voice of the target contact.

Figure 16 is a schematic diagram of target voice identification and isolation according to an embodiment of this application. It shows the complete process of quality checking the captured voice audio data stream and identifying and isolating the target voice.

As shown in Figure 16, the electronic device first determines whether the signal-to-noise ratio (SNR) of the voice audio data stream under inspection meets the requirement for subsequent voice feature extraction. If it does not, the device applies noise reduction to the stream until the SNR meets the requirement. Once it does, the device determines whether the stream contains human voice. If the SNR still falls short after several rounds of noise reduction, or if the stream contains no human voice after meeting the SNR requirement, the device retrieves another voice message from the target contact based on the contact information.
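The SNR check and noise-reduction loop can be sketched as follows, using the standard definition SNR(dB) = 10·log10(P_speech / P_noise). The 20 dB threshold, the limit of three noise-reduction rounds, the availability of a separate noise estimate, and the `denoise` and `has_voice` callables (stand-ins for a real noise suppressor and voice-activity detector) are all assumptions for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Audio = List[float]


def mean_power(x: Sequence[float]) -> float:
    """Average power of a block of samples."""
    return sum(v * v for v in x) / len(x)


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in decibels: 10 * log10(P_speech / P_noise)."""
    p_speech, p_noise = mean_power(speech), mean_power(noise)
    if p_noise == 0.0:
        return math.inf               # no measurable noise
    if p_speech == 0.0:
        return -math.inf              # silence
    return 10.0 * math.log10(p_speech / p_noise)


def quality_gate(speech: Audio, noise: Audio,
                 denoise: Callable[[Audio, Audio], Tuple[Audio, Audio]],
                 has_voice: Callable[[Audio], bool],
                 min_snr_db: float = 20.0,
                 max_rounds: int = 3) -> Tuple[bool, Audio]:
    """Return (passed, audio); passed=False means: retrieve another voice message."""
    rounds = 0
    while snr_db(speech, noise) < min_snr_db:
        if rounds == max_rounds:
            return False, speech      # SNR still too low after several rounds
        speech, noise = denoise(speech, noise)
        rounds += 1
    if not has_voice(speech):
        return False, speech          # clean enough, but no human voice
    return True, speech
```

Bounding the loop with `max_rounds` matters: without it, a recording whose noise cannot be reduced further would loop forever instead of triggering a new retrieval.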

If the stream contains human voice, the electronic device next determines whether it contains different voices, that is, whether the voice belongs only to the target contact or to several contacts including the target contact. If several contacts are speaking, as shown in Figure 16, the device separates and splices the different voices and, based on information such as spectrum, sound intensity, and duration, identifies the voice that belongs to the target contact (the target voice). After separation and splicing, the device checks whether the target voice lasts longer than a threshold, for example 5 seconds. If it does, the device uses it for voice feature extraction. If it is shorter, feature extraction may not yield usable voice features, so the device retrieves another voice message based on the contact information.
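Once the voices have been separated, the splicing and duration check can be sketched as below. The segment format (speaker label plus start and end sample indices) is an assumed output of a separation step, and `target_label` is assumed to have already been chosen by the spectrum, intensity, and duration evaluation described above, which this sketch does not implement. As in the text, the target voice must last longer than 5 seconds to be used.

```python
from typing import List, Optional, Sequence, Tuple

# (speaker label, start sample, end sample); end is exclusive
Segment = Tuple[str, int, int]


def splice_target_voice(samples: Sequence[float], segments: List[Segment],
                        target_label: str, sample_rate: int,
                        min_seconds: float = 5.0) -> Optional[List[float]]:
    """Concatenate the target speaker's segments in time order.

    Returns None when the spliced target voice is not longer than
    `min_seconds`, signalling that another voice message should be retrieved.
    """
    spliced: List[float] = []
    for label, start, end in sorted(segments, key=lambda seg: seg[1]):
        if label == target_label:
            spliced.extend(samples[start:end])
    if len(spliced) / sample_rate <= min_seconds:
        return None
    return spliced
```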

In this embodiment, as shown in Figure 10, the electronic device can extract voice features from the quality-checked voice audio data stream and store them. The stored voice features can later be used in synthesis or cloning to produce a target voice that is the same as or similar to the target contact's real voice, so that the device can broadcast the information sent by the target contact in that voice.

Figure 17 is a schematic diagram of voice feature extraction and storage according to an embodiment of this application. From the quality-checked voice audio data stream, the electronic device can extract the target contact's voice features, which can include audio features and text features. Audio features can include HuBERT (hidden-unit BERT) features, spectral features, and the like; text features can include BERT (bidirectional encoder representations from transformers) features and the like. The text features can be obtained by transcribing the qualified voice audio data stream with automatic speech recognition (ASR) and extracting features from the resulting text. The device stores the audio and text features keyed by the target contact's contact information for later cloning or synthesis of the target voice. Keying the stored features by contact information also lets the device quickly find the required contact's features in later lookups, which speeds up cloning or synthesis of the target voice.
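Storing features keyed by contact information can be sketched as a simple in-memory map. The feature vectors below are placeholders: a real system would fill them from HuBERT and BERT models and a spectral front end, which are out of scope here, and `VoiceFeatures` and `FeatureStore` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VoiceFeatures:
    hubert: List[float]      # audio feature, e.g. a pooled HuBERT embedding
    spectral: List[float]    # audio feature, e.g. a mean log-mel spectrum
    text_bert: List[float]   # text feature from the ASR transcript, e.g. a BERT embedding
    transcript: str = ""     # ASR output the text feature was computed from


class FeatureStore:
    """In-memory stand-in for the on-device store, keyed by contact information."""

    def __init__(self) -> None:
        self._by_contact: Dict[str, VoiceFeatures] = {}

    def put(self, contact_id: str, features: VoiceFeatures) -> None:
        self._by_contact[contact_id] = features

    def get(self, contact_id: str) -> Optional[VoiceFeatures]:
        # A hash lookup by contact ID is what lets later synthesis start quickly.
        return self._by_contact.get(contact_id)
```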

For ease of understanding, the information processing method provided by the embodiments of this application is described below through detailed flows and specific examples. Figures 18 and 19 are schematic diagrams of two different processing approaches provided by the embodiments of this application. Figure 18 takes a social application as an example and shows how information received by a social application on the electronic device is processed and broadcast by voice. Figure 19 takes Changlian messages or SMS as an example and shows how information received by the electronic device is processed and broadcast by voice.

As shown in Figure 18, the electronic device can be the user's mobile phone with a social application installed. When the application receives a message from a contact, the device processes the information by performing the steps of the method provided by the embodiments of this application, generates corresponding voice information, and broadcasts it in a voice that is the same as or similar to the target contact's voice, obtained through voice synthesis or cloning. Because the information is broadcast in a voice that matches the sender's real voice, the user can tell who sent it without looking at its source, which builds an auditory association between the broadcast voice information and the contact. The process is described in detail below.

As shown in Figure 18, when the application receives a message from the target contact, it can first determine whether the system stores the target contact's voice features. This can be done with the steps described earlier in this application: elements that mark and identify contacts in the system, such as the contact ID, remark name, nickname, and avatar, are judged, compared, and/or tagged to determine whether voice features pre-stored in the system belong to the target contact who sent the current information. If they do, the phone can directly use them to synthesize a target voice that is the same as or similar to the target contact's real voice and broadcast the received information in that voice. If the phone has no voice features for the target contact, it must, once authorization has been obtained, retrieve the target contact's voice messages using the steps described in the preceding embodiments, obtain the target contact's voice features through feature extraction and related processing, generate from them a target voice that is the same as or similar to the target contact's real voice, and broadcast the voice information in that target voice. The authorization includes authorization from both the user and the target contact; for the authorization process, see the related descriptions of the preceding steps in this application.
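The decision described above can be sketched as a get-or-create lookup. Applying the authorization check to both the cached and the newly extracted path, returning `None` so that the caller can fall back to a default voice, and the `retrieve_and_extract` callable (standing in for retrieval, quality checking, and feature extraction) are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Optional

Features = List[float]


def features_for_broadcast(contact_id: str, store: Dict[str, Features],
                           user_authorized: bool, contact_authorized: bool,
                           retrieve_and_extract: Callable[[str], Optional[Features]]
                           ) -> Optional[Features]:
    """Return the contact's voice features, extracting and caching them if needed."""
    if not (user_authorized and contact_authorized):
        return None                        # no authorization: caller uses a default voice
    cached = store.get(contact_id)
    if cached is not None:
        return cached                      # features already stored for this contact
    features = retrieve_and_extract(contact_id)
    if features is not None:
        store[contact_id] = features       # bind to the contact for next time
    return features
```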

In one possible implementation of this embodiment, after extracting a contact's voice features, the electronic device can bind them to the same contact across the various platforms in the system. For example, after extracting voice features from voice messages retrieved in a social application, the device can bind those features to the entries for the same contact in other platforms or applications on the device. Suppose the device determines that the contact "Zhang San" in a social application, "Zhang JACK" in the contacts app, "Zhang JACK" in the messaging app, and "Zhang San from the R&D team" in a work application are the same person. After extracting this contact's voice features from voice messages retrieved in any one of these applications, the device can bind the features to the contact's entry on each of these platforms or applications. Then, whenever any of them later receives information from this contact, the device can use these voice features to synthesize a target voice that is the same as or similar to the contact's real voice and broadcast the voice information in it.
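One way to bind a single set of voice features to the same person across applications is a union-find structure over per-app aliases, sketched below. How the device decides that two entries are the same person is not specified in this description, so the `link` calls are assumed inputs; `ContactLinks` and the alias names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Alias = Tuple[str, str]  # (app, contact name as shown in that app)


class ContactLinks:
    """Union-find over per-app contact aliases; features hang off each group's root."""

    def __init__(self) -> None:
        self._parent: Dict[Alias, Alias] = {}
        self._features: Dict[Alias, str] = {}

    def _find(self, a: Alias) -> Alias:
        self._parent.setdefault(a, a)
        while self._parent[a] != a:
            self._parent[a] = self._parent[self._parent[a]]  # path halving
            a = self._parent[a]
        return a

    def link(self, a: Alias, b: Alias) -> None:
        """Record that aliases a and b belong to the same person."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self._parent[rb] = ra
            moved = self._features.pop(rb, None)
            if moved is not None:
                self._features.setdefault(ra, moved)  # keep features after merging

    def bind_features(self, a: Alias, feature_key: str) -> None:
        self._features[self._find(a)] = feature_key

    def features_for(self, a: Alias) -> Optional[str]:
        return self._features.get(self._find(a))
```

With this layout, features extracted in any one app are stored once and become visible from every linked alias, which matches the cross-platform reuse described above.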

When retrieving a contact's voice messages, any one or more of the retrieval methods described above (tag-based retrieval, page-turning retrieval, search retrieval, or direct retrieval) can be used. For example, the phone can retrieve the contact's voice messages from the user's chat history with that contact and record the correspondence between the contact and the retrieved voice messages.

As shown in Figure 18, when retrieving a contact's voice messages, the phone can simulate opening the chat window between the user and the contact in the application and search through their chat history one message at a time. To improve the accuracy of subsequent voice synthesis or cloning, the retrieved voice message should meet a duration requirement; for example, the phone can look for a voice message from that contact that is longer than 10 seconds.

In one possible implementation of this embodiment, the phone can search the chat history in reverse chronological order of receipt. When it encounters a historical voice message whose duration does not meet the requirement, it can skip that message. For example, it can skip a 5-second voice message and keep searching until it finds one longer than 10 seconds.

The phone can then extract features from the retrieved voice message; the extracted features can be used for subsequent voice synthesis or cloning.

In one possible implementation of this embodiment, the phone can extract features from a voice message by simulating its playback. Specifically, the phone simulates a tap on the voice message and, while the message plays in the background, captures the voice audio data stream for subsequent feature extraction and voice synthesis or cloning. The whole process is imperceptible to the user: it does not interfere with other operations on the phone, and the retrieved voice message is never actually played aloud.

The phone can bind the extracted voice features to contact information, such as the contact's ID, and store them. When it later receives information from that contact, it can use the features to synthesize a target voice that is the same as or similar to the contact's real voice and broadcast the voice information converted from that information.

Figure 19 is a schematic diagram of another broadcast approach according to an embodiment of this application. In the example of Figure 19, the target voice is synthesized from the target contact's voice features, which can be extracted from reference audio of the target contact stored in a cloud database. As shown in Figure 19, a contact, Tom, can record a clip of his own audio on his mobile phone as reference audio; the reference audio can be uploaded to the cloud database and bound to Tom's account information for subsequent feature extraction and voice synthesis. The reference audio can also come from a call between the user and the contact: for example, during a call with Tom, and with Tom's authorization, a segment of the call audio can be recorded as reference audio, uploaded to the cloud database, and bound to Tom's account information for subsequent feature extraction and voice synthesis. Reference audio recorded during a call can also be stored locally on the electronic device, which can then read it directly from local storage when needed.

As shown in Figure 19, when no voice features for contact Tom exist in the system, the electronic device can search the cloud database for the reference audio file bound to Tom's account information. The device can perform quality checks and processing on the retrieved file in the manner described in the preceding embodiments, and the processed audio data can then be used for voice feature extraction. The extracted voice features can be bound to Tom's account information and stored. When the device later needs to broadcast a message from Tom in a target voice that is the same as or similar to Tom's voice, it can synthesize or clone that target voice from the stored features.
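The lookup order described for Figure 19 (reuse stored features, otherwise fetch reference audio, check and process it, extract features, then bind them to the account) can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: audio is modelled as a list of float samples, the quality check and peak normalization are hypothetical stand-ins for the checks "described in the preceding embodiments", preferring locally stored call audio over the cloud copy is an assumed ordering, and `extract` stands in for whatever speaker-embedding model is used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

# Assumed representations: audio is a list of float samples in [-1, 1];
# "voice features" are a list of floats from some speaker-embedding model.
Audio = List[float]
Features = List[float]

MIN_SAMPLES = 16000  # assumption: at least 1 s of audio at 16 kHz


def passes_quality_check(audio: Audio) -> bool:
    """Hypothetical check: long enough, not silent, not clipped."""
    if len(audio) < MIN_SAMPLES:
        return False
    peak = max(abs(x) for x in audio)
    return 0.01 < peak < 1.0


def preprocess(audio: Audio) -> Audio:
    """Hypothetical processing step: peak-normalize to 0.9."""
    peak = max(abs(x) for x in audio)
    return [x * 0.9 / peak for x in audio]


@dataclass
class VoiceFeatureStore:
    cloud_audio: Dict[str, Audio]                                  # account ID -> reference audio in the cloud
    local_audio: Dict[str, Audio] = field(default_factory=dict)    # reference audio captured during calls
    features: Dict[str, Features] = field(default_factory=dict)    # account ID -> bound voice features
    authorized: Set[str] = field(default_factory=set)              # contacts who granted permission

    def get_features(self, account_id: str,
                     extract: Callable[[Audio], Features]) -> Optional[Features]:
        if account_id not in self.authorized:
            return None                        # no permission: do not clone this voice
        if account_id in self.features:
            return self.features[account_id]   # already extracted and bound
        # Assumed order: prefer locally stored call audio, then the cloud copy.
        audio = self.local_audio.get(account_id) or self.cloud_audio.get(account_id)
        if not audio or not passes_quality_check(audio):
            return None
        feats = extract(preprocess(audio))
        self.features[account_id] = feats      # bind features to the account and store them
        return feats
```

Caching the features per account means extraction runs once; later messages from the same contact only pay the synthesis cost.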

In this embodiment, after the electronic device identifies the target contact who sent the information and obtains, through voice cloning or synthesis, a target voice that is the same as or similar to the target contact's real voice, it can use the target voice to broadcast the received information. In this way, the user can intuitively develop a sense of familiarity with the target contact through hearing.

Referring back to Figure 1, after acquiring the voice features of the target contact and obtaining that contact's authorization to use them, the electronic device can process the received information according to a broadcast control strategy to obtain the target information to be broadcast. When generating the broadcast control strategy, the electronic device can jointly consider factors such as the sources of the previous and current messages, the user's familiarity with the target contact, and the type and complexity of the received information. That is, the electronic device can process the received information based on one or more of these factors to generate the target information, and then broadcast the target information in the target voice. The process by which the electronic device processes received information according to the broadcast control strategy is described in detail below with examples.

Figure 20 is a schematic diagram of generating a broadcast control strategy according to an embodiment of this application; the process shown in Figure 20 is the process by which the electronic device processes received information to generate target information. In this process, the electronic device can handle instant messages and non-instant messages separately. Processing of instant messages may include steps such as information source determination, contact source determination, instant message merging, non-text information processing, and information simplification determination.

In the information source determination step, the electronic device can decide how to announce the source of the next message based on the differences between adjacent messages. Adjacent messages are messages received consecutively in time. For example, if the electronic device receives one message from a contact and then receives another message, these two messages are adjacent. The differences between adjacent messages can include the time interval between their receipt and the platform, chat group, and contact they come from.

Figure 21 is a flowchart of information source determination according to an embodiment of this application. Following this flow, the electronic device can decide how to announce the source of the next message based on the differences between adjacent messages, for example whether to announce the source at all and whether to announce a simplified source. When group messages, a change of contact, or a change of source occur in the message stream, this determination lets the device announce the source less often without introducing ambiguity, which improves the efficiency of information delivery.

As an example, the information source determination process can decide whether a broadcast includes the message source by considering the time interval between adjacent messages, whether they come from the same platform, whether they come from a chat group, and whether they come from the same group or the same contact.

For example, when the electronic device receives a message and broadcasts it according to the method of this embodiment, it can include the source, that is, announce the source before the message content, such as "from Tom in the work communication group of social application APP1" or "from contact Tom". The former indicates a group chat message and the latter a private chat message.

As shown in Figure 21, when the electronic device receives the next message, it can compare it with the previous message. For example, it first determines whether the time interval between the two messages is less than a preset interval, such as 1 minute. If the interval exceeds the preset interval, the device treats the message as newly received and announces its source. If the interval is less than the preset interval, the device checks whether the two messages come from the same platform, the same group chat, and the same contact within that group chat. If all of these checks match, that is, the current and previous messages come from the same contact in the same group chat on the same platform (for example, both from Tom in the work communication group of social application APP1 in the example above), the device can omit the source and announce the content directly, since no ambiguity arises. If any check differs, for example both messages come from Tom but the previous one was posted in the APP1 work communication group while the new one comes from a private chat, the device should announce the source of the new message to avoid ambiguity.
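The Figure 21 decision can be sketched as a small predicate over the previous and current message. This is an illustrative sketch, not the module's actual code: the `Message` fields, the 60-second default (taken from the "1 minute" example), and the prefix wording are assumptions, and an interval exactly equal to the preset value, which the text leaves unspecified, is treated here as "announce".

```python
from dataclasses import dataclass
from typing import Optional

PRESET_INTERVAL_S = 60.0  # the "1 minute" example interval


@dataclass(frozen=True)
class Message:
    timestamp: float        # receive time in seconds
    platform: str           # e.g. "APP1"
    group: Optional[str]    # chat group name, or None for a private chat
    sender: str             # contact who sent the message


def should_announce_source(prev: Optional[Message], cur: Message,
                           interval_s: float = PRESET_INTERVAL_S) -> bool:
    if prev is None:
        return True   # first message in the stream: always announce the source
    if cur.timestamp - prev.timestamp >= interval_s:
        return True   # treated as a newly received message
    same_source = (prev.platform == cur.platform
                   and prev.group == cur.group
                   and prev.sender == cur.sender)
    return not same_source   # omit the source only when nothing changed


def source_prefix(msg: Message) -> str:
    """Hypothetical wording for the announced source."""
    if msg.group is None:
        return f"From contact {msg.sender}"
    return f"From {msg.sender} in {msg.group} on {msg.platform}"
```

Comparing `group` as well as `sender` is what catches the "same Tom, different conversation" case described above, because a private chat has `group=None`.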

In one possible implementation, the information source determination performed by the electronic device can be implemented by an information source determination module.

In the contact source determination step, the electronic device can infer the user's familiarity with the sender from the characteristics of their interactions with the user, and decide whether to announce the contact's label name, or announce a simplified version of it according to rules.

Figure 22 is a flowchart of contact source determination according to an embodiment of this application. In this process, the electronic device can look up the contact who sent the message and the characteristics of that contact's interactions with the user, infer the user's familiarity with the contact, and thereby determine whether the target contact is one the user is familiar with. If so, the device may skip announcing the target contact's label name (for example, the contact's name); if not, the device should announce it. In one example, the label name may be long, for instance containing many characters, in which case the device can simplify it according to rules to keep the broadcast as efficient as possible.

As an example, suppose the contact name is "Tom-人机交互项目-0522", where "Tom" is the contact's name, "人机交互项目" (Human-Computer Interaction Project) is the contact's work group, and "0522" may be the contact's employee number. This long name can be simplified to keep only the contact's name, or the name plus a shortened work group, for example "Tom" or "Tom-人机交互" (Tom-Human-Computer Interaction). The employee number "0522", which is of little use in everyday communication, can be dropped. In subsequent broadcasts, the electronic device can then announce only the simplified name.
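One way to realize the "Tom-人机交互项目-0522" example is a rule-based simplifier like the sketch below. The rules are assumptions chosen to reproduce the two outputs given above: segments are separated by "-", purely numeric segments are treated as employee numbers and dropped, and a generic trailing suffix such as "项目" (project) is trimmed from the work group. The suffix list is hypothetical.

```python
# Hypothetical generic suffixes: "项目" (project), "小组" (team), "组" (group).
GENERIC_SUFFIXES = ("项目", "小组", "组")


def simplify_contact_name(name: str, keep_group: bool = False) -> str:
    segments = [s.strip() for s in name.split("-") if s.strip()]
    segments = [s for s in segments if not s.isdigit()]   # drop IDs such as "0522"
    if not segments:
        return name                                       # nothing to simplify
    person = segments[0]
    if not keep_group or len(segments) < 2:
        return person
    group = segments[1]
    for suffix in GENERIC_SUFFIXES:
        if group.endswith(suffix) and len(group) > len(suffix):
            group = group[: -len(suffix)]                 # "人机交互项目" -> "人机交互"
            break
    return f"{person}-{group}"
```

Usage: `simplify_contact_name("Tom-人机交互项目-0522")` gives `"Tom"`, and passing `keep_group=True` gives `"Tom-人机交互"`. Splitting on "-" assumes the user's naming convention uses it as a field separator; names with hyphens inside a field would need a different rule.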

In one possible implementation, the electronic device can determine whether the target contact is a familiar contact of the current user based on the characteristics of their interactions. These interaction characteristics may include chat behavior features, that is, features derived from the chat behavior between the current user and the target contact.

As an example, whether the target contact is a familiar contact of the current user can be determined by scoring features along different dimensions, in the following steps S1 to S3.

S1: Obtain chat behavior features.

In this embodiment, multiple chat behavior features can be obtained. For example, the features obtained by the electronic device may include one or more of the following: how long the person has been a contact, chat frequency, conversation topics, reply delay, and initiative. Example scores for each feature are as follows.

F1: Duration as a contact

Less than 1 month: 0 points;

1 to 6 months: 3 points;

6 to 12 months: 6 points;

More than 12 months: 10 points.

F2: Chat frequency

Daily: 10 points;

Several times a week: 8 points;

Several times a month: 5 points;

Once every few months: 2 points;

Less often: 0 points.

F3: Conversation topics

Mainly personal life and feelings: 10 points;

Both personal life and work: 7 points;

Mainly work and courtesy exchanges: 4 points;

Rarely touches on personal topics: 0 points.

F4: Reply delay

Instant reply (within minutes): 10 points;

Fairly quick reply (within 1 hour): 7 points;

Slower reply (within a few hours): 4 points;

Rarely replies (more than a day): 0 points.

F5: Initiative

Both sides take the initiative: 10 points;

Both sides take the initiative most of the time: 7 points;

Only one side takes the initiative: 4 points;

One side almost never takes the initiative: 0 points.

S2: Score and aggregate all contacts who have voice features, and sort them in descending order.

In this embodiment, each chat behavior feature can carry a different weight. A contact's familiarity score is computed from the score of each feature combined with its weight.

In one example, all features carry the same weight, that is, each of F1 to F5 has a weight of 20%, so the contact's familiarity score is 0.2×F1 + 0.2×F2 + 0.2×F3 + 0.2×F4 + 0.2×F5. Because each feature scores between 0 and 10, the familiarity score also ranges from 0 to 10.

After computing familiarity scores for all contacts, the electronic device can sort the contacts in descending order of score.

S3: Take the top-scoring contacts as "familiar contacts".

In this embodiment, from the contacts sorted in descending order of familiarity score, the electronic device can take the top-scoring contacts as the current user's familiar contacts, for example the top 7. All other contacts are treated as unfamiliar contacts of the current user.
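Steps S1 to S3 can be sketched with the example score tables and equal 20% weights. This is an illustrative sketch only: the dictionary keys are invented labels for the categories listed under F2 to F5, the F1 bucket boundaries are assumed to be half-open (the example ranges "1-6 months" and "6-12 months" overlap at their ends), and `top_k=7` follows the example.

```python
from typing import Dict, Set

# Score tables for F2-F5 from the example; string keys are illustrative labels.
FREQUENCY = {"daily": 10, "several_per_week": 8, "several_per_month": 5,
             "every_few_months": 2, "less": 0}
TOPIC = {"personal": 10, "mixed": 7, "work_or_courtesy": 4, "rarely_personal": 0}
REPLY_DELAY = {"minutes": 10, "within_1h": 7, "within_hours": 4, "over_a_day": 0}
INITIATIVE = {"both": 10, "mostly_both": 7, "one_side": 4, "almost_never": 0}


def duration_score(months: float) -> int:
    """F1; bucket boundaries are assumed to be half-open."""
    if months < 1:
        return 0
    if months < 6:
        return 3
    if months < 12:
        return 6
    return 10


def familiarity_score(months, frequency, topic, reply_delay, initiative,
                      weights=(0.2, 0.2, 0.2, 0.2, 0.2)) -> float:
    """Weighted sum 0.2*F1 + ... + 0.2*F5 with the default equal weights."""
    scores = (duration_score(months), FREQUENCY[frequency], TOPIC[topic],
              REPLY_DELAY[reply_delay], INITIATIVE[initiative])
    return sum(w * s for w, s in zip(weights, scores))


def familiar_contacts(features: Dict[str, dict], top_k: int = 7) -> Set[str]:
    """S2 + S3: score every contact, sort in descending order, keep the top_k."""
    ranked = sorted(features, key=lambda c: familiarity_score(**features[c]),
                    reverse=True)
    return set(ranked[:top_k])
```

Keeping the weights as a parameter leaves room for the unequal-weight variant mentioned in the text without changing the scoring tables.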

In one possible implementation, when the electronic device receives a message, it can determine whether the sender is a contact the user is familiar with. For example, upon receiving a message, the device can look up the corresponding target contact and the user's chat behavior features with that contact, and then follow the steps in the example above to decide whether the target contact is familiar. For a familiar contact, the device can broadcast the converted target information directly in a voice that is the same as or similar to the contact's voice, without announcing the contact information. For an unfamiliar contact, the device can first announce the contact information and then the message content.

With contact source determination, for familiar contacts the user can tell who sent a message from the voice alone, so omitting the contact announcement shortens the overall broadcast and improves the efficiency of information acquisition. For unfamiliar contacts, announcing the contact source before the target information prevents misunderstandings that arise when the user cannot identify the sender by voice, which ensures accurate information delivery. In addition, simplifying an unfamiliar contact's label information (such as a remark name) further improves the efficiency of information acquisition.

In one possible implementation, the contact source determination performed by the electronic device can be implemented by a contact source determination module.

In the information simplification determination step, the electronic device can decide whether to simplify a message further based on the expected broadcast duration of its text content, or of the text converted from it.

Figure 23 is a flowchart of information simplification determination according to an embodiment of this application. In this flow, the electronic device can estimate the broadcast duration of the text content (or of the information converted into text) and decide whether to simplify the information further. For example, if the device estimates that broadcasting the currently received message by voice would take more than 20 seconds, it can simplify the message, for instance by reducing prepositions, restructuring sentences into subject-verb-object form, replacing phrases with more concise wording, deleting introductory text, and de-emphasizing content that is clear from context.
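The duration check in Figure 23 only needs a rough estimate of speaking time. The sketch below assumes a fixed speaking rate for Chinese characters and for English words; both rates are illustrative assumptions rather than values from the application, and a real implementation would more likely derive them from the TTS engine's configured speed. The 20-second threshold is the example value above.

```python
import re

# Assumed speaking rates; real values depend on the TTS voice and speed setting.
CJK_CHARS_PER_SECOND = 4.5   # roughly 270 Chinese characters per minute
WORDS_PER_SECOND = 2.5       # roughly 150 English words per minute
MAX_BROADCAST_S = 20.0       # the 20-second example threshold

_CJK = re.compile(r"[\u4e00-\u9fff]")
_WORD = re.compile(r"[A-Za-z0-9']+")


def estimate_duration_s(text: str) -> float:
    """Estimate how long the text would take to speak, in seconds."""
    cjk = len(_CJK.findall(text))
    words = len(_WORD.findall(text))
    return cjk / CJK_CHARS_PER_SECOND + words / WORDS_PER_SECOND


def needs_simplification(text: str, threshold_s: float = MAX_BROADCAST_S) -> bool:
    return estimate_duration_s(text) > threshold_s
```

Counting Chinese characters and English words separately keeps the estimate usable for mixed-language chat messages.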

As an example, the original message received by the electronic device may be as follows:

"Hey, old friend, long time no see! How have you been? I went to Yunnan a while ago, and it was absolutely beautiful! We visited Lijiang, Dali, and Shangri-La, and each place has its own unique charm. The ancient town of Lijiang made me never want to leave, and Erhai Lake in Dali was beautiful too. We rented bicycles and rode around the lake; it was a bit tiring, but seeing the azure water and the surrounding countryside made it all worthwhile.

Oh, and there's something else important I need to tell you: I'm planning to change jobs. My old job was fine, but I felt there wasn't much room for growth. I recently interviewed with an internet company for a product manager position, and the work sounds both challenging and interesting.

Also, do you remember our class reunion? It's set for the first weekend of next month, and everyone is really looking forward to seeing you. You absolutely have to come this time!"

Through the information simplification determination, the electronic device determines that this message exceeds the preset broadcast duration and that its content should be simplified. Simplification can include summarizing the content, keeping the key or important information from the original, and removing connectives and repeated or meaningless words. After simplification according to such rules, the message may read as follows:

"Hey, old friend, long time no see! How have you been? I just went to Yunnan. Lijiang, Dali, and Shangri-La were all beautiful, especially Erhai Lake in Dali; the scenery cycling around the lake was amazing.

By the way, I'm planning to change jobs. I recently interviewed with an internet company for a product manager position and hope to get good news.

Also, our class reunion is on the first weekend of next month. Everyone is looking forward to seeing you, so you absolutely have to come this time!"

In one possible implementation, when the information broadcast by the electronic device is not the original message, the device can play a prompt sound effect before broadcasting the simplified information, so that the user knows in advance that what follows is not the original content received but information obtained after processing. The information simplification determination avoids overly long broadcasts that make the user impatient and hold the user's attention for too long; replacing non-self-explanatory information in the message avoids bothering the user with a direct broadcast of meaningless content; and the prompt sound effect lets the user know that the received information has been modified.

In one possible implementation, the information simplification determination of the electronic device can be implemented by an information simplification determination module.

In the non-text information processing step, the electronic device can interpret the content according to the type of non-text information and decide how to convert it into text.

Figure 24 is a flowchart of non-text information processing according to an embodiment of this application. In this flow, the electronic device can interpret the content according to the type of non-text information and determine the text conversion method.

The non-text information shown in Figure 24 can include links, images, files, stickers, article posts, mini-programs, cards, and any other information that is not plain text. For received non-text information, the electronic device can select a transition phrase according to the information type; a transition phrase is a sentence suited to a particular type of non-text information. For example, when the received non-text information is an article post, a suitable transition phrase can be a subject-verb-object sentence (subject + verb + object), such as "I shared an article post with you", or a verb-object sentence (verb + object), such as "Shared an article post with you". Compared with the subject-verb-object form, the verb-object form omits the subject. In one possible implementation, the verb-object form without a subject can be used where the subject is already clear, for example in a one-on-one private chat between the user and a contact. In multi-person chats such as group chats, the subject-verb-object form can be used so that the user can quickly identify the source of the information. For example, if contact Tom shares an article post in a group chat, after processing this non-text information the electronic device can use a subject-verb-object transition phrase to produce the voice message "Tom shared a post with you all". Besides the subject-verb-object and verb-object forms, transition phrases with other sentence structures can be used as needed, such as subject-verb sentences (subject + verb) or subject-verb-object-complement sentences (subject + verb + object + complement); this embodiment does not limit this.

As shown in Figure 24, when processing received non-text information, the electronic device can add an appropriate addressee word according to the context of the message. For example, depending on whether the message is in a group chat, it can add "你" (you) or "你们" (you all): "你们" for group chat messages and "你" for private chats. This yields voice messages such as "Shared an article post with you" or "Tom shared a post with you all" in the examples above.
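The choice of sentence form and addressee word described above can be sketched as one function. It is a sketch under assumptions: the content-type labels and English object phrases are illustrative, and the English "you" / "you all" stand in for the Chinese "你" / "你们"; the group/private rule (subject-verb-object with the sender named in group chats, subject-less verb-object in private chats) follows the text.

```python
# Object phrases per content type; the labels and wording are illustrative.
CONTENT_PHRASES = {
    "article": "an article post",
    "link": "a link",
    "file": "a file",
    "mini_program": "a mini-program",
}


def transition_phrase(sender: str, content_type: str, is_group: bool) -> str:
    obj = CONTENT_PHRASES.get(content_type, "a message")
    addressee = "you all" if is_group else "you"   # stands in for "你们" vs "你"
    if is_group:
        # Group chat: subject-verb-object, so the listener hears who shared it.
        return f"{sender} shared {obj} with {addressee}"
    # Private chat: the sender is obvious, so use verb-object and drop the subject.
    return f"Shared {obj} with {addressee}"
```

Usage: `transition_phrase("Tom", "article", is_group=True)` yields "Tom shared an article post with you all", while the private-chat call yields "Shared an article post with you".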

In one possible implementation, the types of non-text information may include shared links, files, mini-programs, transaction information, and so on. Example transition phrases and conversion strategies for these types are shown in Table 1 below.

Table 1. Examples of transition phrases and corresponding conversion strategies for various types of non-text information

Non-text information can be a file or a link. For such information, the electronic device can obtain a summary of the content the link points to and use it as information suitable for voice broadcast; or, following rules associated with the file extension, simulate opening the file, extract its content, and summarize it for voice broadcast; or simulate opening the file or link and summarize its content in combination with the conversation context and the file name, again producing information suitable for voice broadcast.
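The per-type strategies can be organized as a dispatcher. In this sketch, "simulated opening" is assumed to have already produced the item's text in `content` (or `None` if nothing could be extracted), `summarize` is a caller-supplied placeholder for whatever summarization is used, and the extension table and type labels are hypothetical. It falls back to the file name when no content is available, matching the two file examples below.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical extension -> spoken description table.
FILE_KINDS = {".doc": "Word document", ".docx": "Word document",
              ".pdf": "PDF file", ".xlsx": "spreadsheet"}


@dataclass
class NonTextItem:
    kind: str                      # assumed labels: "link", "file", "mini_program", ...
    name: str                      # URL, file name, or title
    content: Optional[str] = None  # text extracted by "simulated opening", if any


def describe_non_text(item: NonTextItem, summarize: Callable[[str], str]) -> str:
    if item.kind == "link":
        if item.content:
            return f"a link about {summarize(item.content)}"
        return "a link"
    if item.kind == "file":
        stem, ext = os.path.splitext(item.name)
        kind = FILE_KINDS.get(ext.lower(), "file")
        if item.content:
            return f"a {kind} whose main content is {summarize(item.content)}"
        return f"a {kind} named {stem}"   # fall back to the file name
    return "a " + item.kind.replace("_", "-")
```

The returned phrase is meant to be slotted into a transition phrase such as "Tom sent ...", so the dispatcher only produces the object part.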

As an example, if the non-text information is a link, the electronic device can simulate opening the corresponding webpage, extract or summarize its content, generate broadcastable information, and broadcast it in a voice that is the same as or similar to the contact's real voice. For example, for a received link, the voice information broadcast after this processing could be: "Jack shared a voting link for the National Postgraduate Innovation Competition. The link includes the innovative works that made the national top ten, and voting closes at 24:00 on July 10."

Referring to the examples in Table 1, when the received information contains link-type non-text information, the electronic device can obtain a summary of the content the link points to while processing the information. For example, suppose the device receives: "I found a relevant paper: doi:10.xxx/3491102.xxxx, I think it's quite relevant to your research, see if it helps." The link-type non-text information here is the "doi:10.xxx/3491102.xxxx" part. When processing this message, the device can substitute a summary of the content the link points to, so the resulting target information could be: "I found a relevant paper: the title is 'Comparison of Voice Assistant Response Behaviors', I think it's quite relevant to your research, see if it helps." That is, the device has replaced the link with "the title is 'Comparison of Voice Assistant Response Behaviors'".

In another example, if the non-text information is a file, such as a Word document, the electronic device can extract the file name and announce it, for example "Tom sent a Word document named 'Attendance Instructions'". Alternatively, the device can simulate opening the file, extract a summary or its main content, and announce that, for example "Tom sent a Word document; its main content is the latest revised company attendance instructions".

In one possible implementation, so that the user knows the broadcast content has been rewritten and is not the original content sent by the contact, the electronic device can play a prompt sound effect before broadcasting the rewritten part. For example, when broadcasting the rewritten message "I found a relevant paper: the title is 'Comparison of Voice Assistant Response Behaviors', I think it's quite relevant to your research, see if it helps.", the device can play a prompt sound effect right before "the title is 'Comparison of Voice Assistant Response Behaviors'" to signal that what follows is rewritten content.

In another possible implementation of the embodiments of this application, the prompt sound can differ according to the type of non-text information that was rewritten. For example, the prompt sound played before a rewritten shared link differs from the prompt sound played before rewritten file or mini-program content.
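The prompt-sound behaviour described in the two paragraphs above can be sketched as building a playback list. In this list, every rewritten segment is preceded by a type-specific prompt sound. The enum values and earcon identifiers below are hypothetical. The text only states that the sounds may differ by type.

```python
from enum import Enum
from typing import Optional


class RewriteKind(Enum):
    LINK = "link"
    FILE = "file"
    MINI_PROGRAM = "mini_program"


# Hypothetical prompt-sound identifiers, one per kind of rewritten content.
EARCONS = {
    RewriteKind.LINK: "earcon_link",
    RewriteKind.FILE: "earcon_file",
    RewriteKind.MINI_PROGRAM: "earcon_mini_program",
}


def build_playlist(
    segments: list[tuple[str, Optional[RewriteKind]]],
) -> list[tuple[str, str]]:
    """Turn (text, kind) segments into ("sound" | "speech", payload) playback items.

    kind is None for original text and a RewriteKind for rewritten text.
    """
    playlist: list[tuple[str, str]] = []
    for text, kind in segments:
        if kind is not None:
            # Play the type-specific prompt sound before the rewritten segment.
            playlist.append(("sound", EARCONS[kind]))
        playlist.append(("speech", text))
    return playlist
```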

When announcing the processed target information, the electronic device can use a voice that is the same as or similar to the real voice of the contact who sent the message. For example, it can announce the voice message "Tom shared a post with you all" in a voice that is the same as or similar to contact Tom's voice.

Figure 25 shows an example of non-text information processing according to an embodiment of this application. In this example, contact Tom sends a shared article post, namely post 2501 shown in Figure 25(a), titled "Highlights of the 2024 Speaker and Headphone Exhibition". The electronic device processes the post into corresponding text suitable for voice broadcast. The announced content can also vary with the conversation scenario. In a one-on-one (private) chat, the announcement generated from the contact's post could be "I shared a post with you". In a multi-person (group) chat, it could be "I shared a post with you all".

Accordingly, in the example shown in Figure 25, processing post 2501 in Figure 25(a) can produce the announceable text shown in Figure 25(b). If post 2501 was received in a private conversation, the processed information can be message 2502 in Figure 25(b): "I shared a post with you: Highlights of the 2024 Speaker and Headphone Exhibition". If post 2501 was received in a group conversation (group chat), the processed information can be message 2503 in Figure 25(b): "I shared a post with you all: Highlights of the 2024 Speaker and Headphone Exhibition".
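The context-dependent transitional phrase can be sketched as a single function. It only changes the addressee between a private chat and a group chat, mirroring messages 2502 and 2503. The function name and the boolean flag are illustrative assumptions.

```python
def add_transition(title: str, is_group_chat: bool) -> str:
    """Prefix a shared post's title with a first-person transitional phrase.

    "you" is used in a private chat and "you all" in a group chat, as in
    messages 2502 and 2503.
    """
    audience = "you all" if is_group_chat else "you"
    return f"I shared a post with {audience}: {title}"
```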

By processing non-text information in this way, the electronic device adds transitional phrases to it. This reduces the abruptness of reading the content directly in the contact's voice, supplies background information, and improves the user's understanding of the message. The phrase also takes into account the context in which the contact sent the message, for example a private chat or a group chat. This makes the phrase fit that context and avoids mismatches between the two.

In the real-time message merging step, the electronic device can decide whether to merge multiple messages, and how to merge them. The decision is based on the time interval between adjacent messages, the number of messages, and whether a message is currently being announced.

Figure 26 is a schematic diagram of a real-time message merging flow according to an embodiment of this application. In this flow, when the electronic device receives a new message, it first determines whether an earlier message is still being announced. If no previously received message is being announced, the electronic device announces the new message through the normal broadcast flow. The normal broadcast flow can include the flows described in the foregoing embodiments, such as determining the target contact, retrieving voice features and managing permissions, and generating the broadcast control policy, as shown in Figure 1.

Suppose instead that the electronic device is still announcing a previously received message when the new message arrives. In that case, it can determine whether the new message has anything in common with the message being announced. For example, it can check whether they come from the same contact, or whether their content is identical or similar. If there is nothing in common, the electronic device announces the new message through the normal broadcast flow. If there is something in common, the electronic device aggregates the messages not yet announced. It then passes them through an information condensing and judgment flow to obtain the information to be announced. The messages not yet announced include both the newly received message and any previously received messages whose announcement has not been completed.
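The decision flow of Figure 26 can be sketched as a routing function. This is only a sketch under assumptions: "having something in common" is approximated here as the same sender or identical text after trimming and lower-casing. A real system would more plausibly use a semantic-similarity model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    sender: str
    text: str


def has_common_ground(new: Message, playing: Message) -> bool:
    """Assumed criterion: same sender, or identical text ignoring case and edges."""
    same_sender = new.sender == playing.sender
    same_text = new.text.strip().lower() == playing.text.strip().lower()
    return same_sender or same_text


def route_new_message(
    new: Message,
    playing: Optional[Message],
    pending: list[Message],
) -> tuple[str, list[Message]]:
    """Decide how a newly received message is handled.

    playing: the message currently being announced, or None if idle.
    pending: earlier messages whose announcement has not been completed.
    Returns ("normal", [new]) or ("merge", messages to condense together).
    """
    if playing is None or not has_common_ground(new, playing):
        return "normal", [new]
    return "merge", pending + [new]
```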

Figures 27 and 28 show two examples of real-time message merging. In the group chat scenario shown in Figure 27, several contacts send near-identical messages. These messages can be merged into a single message, from which the corresponding voice broadcast is generated.

Specifically, in the example shown in Figure 27(a), contact Tom sent red envelope 2701 in the work communication group. Several other contacts then sent messages. Contact Mike sent message 2702, "Thanks for the red envelope, Tom". Contacts Lucy and Bella each sent a sticker (messages 2703 and 2704 in Figure 27) to thank Tom for red envelope 2701. The electronic device can determine that the messages from Mike, Lucy, and Bella all express thanks for Tom's red envelope 2701. These contacts can therefore be treated as associated contacts. After evaluating messages 2702, 2703, and 2704, the electronic device can merge the messages from these associated contacts into message 2705 shown in Figure 27(b): "Everyone is thanking Tom for the red envelope."

In the chat scenario shown in Figure 28(a), the same contact, Lucy, sends several messages, 2801 to 2804, within a short period. The electronic device can summarize, condense, or merge these messages into one complete voice broadcast message. An example is message 2805 in Figure 28(b), which is obtained by merging and summarizing messages 2801 to 2804 in Figure 28(a).
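The "same contact, short period" grouping can be sketched as collapsing consecutive bursts. A burst here is a run of messages from one sender whose gaps do not exceed a time window. The window value is an assumed parameter. Plain concatenation stands in for the summarisation that would produce a message like 2805.

```python
def merge_bursts(
    messages: list[tuple[str, str, float]], window_s: float
) -> list[tuple[str, str]]:
    """Group consecutive messages from the same sender whose gaps are <= window_s.

    messages: (sender, text, timestamp_seconds), sorted by timestamp.
    Returns one (sender, merged_text) pair per burst; concatenation is a
    placeholder for a real summarisation step.
    """
    bursts: list[dict] = []
    for sender, text, ts in messages:
        last = bursts[-1] if bursts else None
        if last and last["sender"] == sender and ts - last["last_ts"] <= window_s:
            # Same sender, close in time: extend the current burst.
            last["texts"].append(text)
            last["last_ts"] = ts
        else:
            bursts.append({"sender": sender, "texts": [text], "last_ts": ts})
    return [(b["sender"], " ".join(b["texts"])) for b in bursts]
```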

The same or associated contacts sometimes send several messages within a short period. Aggregating such messages avoids the time spent playing notification sounds between them, which improves the efficiency of information acquisition. Grouping messages from the same or associated contacts also helps the user understand the information as a whole. It reduces the impact of missed messages on comprehension.

In a possible implementation of the embodiments of this application, the electronic device can perform information retrieval and summary aggregation on non-real-time information. It extracts information-need features from the user's instruction and generates a retrieval query to filter the information. It then implicitly completes any missing features and announces a summary of the matching information. It can also trigger proactive questions.

Figure 29 is a schematic diagram of an information retrieval and summary aggregation flow according to an embodiment of this application. It shows the steps for processing non-real-time information. As shown in Figure 29, the flow can be triggered by a user instruction requesting information. In one example, the instruction is a voice command: the user speaks to the electronic device to request the relevant information. After receiving the instruction, the electronic device extracts need features from the request. These features describe the specific content the user wants. The electronic device then proceeds according to whether the extracted need features are complete. If they are complete, the electronic device searches the corresponding scope and filters out one or more pieces of information that satisfy them. If they are incomplete, information cannot be retrieved and filtered from them directly. In that case, the electronic device can complete the missing features from other relevant information. For example, it can use recently received information to complete the missing features. It then searches the corresponding scope with the completed features and filters out one or more pieces of information that satisfy them.
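Extracting need features, completing them, and filtering stored messages can be sketched as below. This rests on several assumptions. The features are reduced to three optional fields. A feature left as `None` imposes no constraint, so a missing conversation widens the search to all conversations. A missing contact is filled in from the most recently received message, which is only one plausible completion heuristic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NeedFeatures:
    contact: Optional[str] = None       # who sent the messages
    conversation: Optional[str] = None  # None = search every conversation
    since_ts: Optional[float] = None    # lower time bound, in seconds


@dataclass
class StoredMessage:
    contact: str
    conversation: str
    ts: float
    text: str


def complete_features(f: NeedFeatures, recent: list[StoredMessage]) -> NeedFeatures:
    """Fill a missing contact from the most recently received message (assumed heuristic)."""
    contact = f.contact
    if contact is None and recent:
        contact = max(recent, key=lambda m: m.ts).contact
    return NeedFeatures(contact, f.conversation, f.since_ts)


def retrieve(f: NeedFeatures, store: list[StoredMessage]) -> list[StoredMessage]:
    """Return the stored messages that satisfy every specified feature."""
    return [
        m for m in store
        if (f.contact is None or m.contact == f.contact)
        and (f.conversation is None or m.conversation == f.conversation)
        and (f.since_ts is None or m.ts >= f.since_ts)
    ]
```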

As shown in Figure 29, after filtering out one or more pieces of information, the electronic device can determine whether they include non-text information. If they do, it converts that information into text. For the conversion process, refer to the related descriptions in the foregoing embodiments of this application; details are not repeated here.

The electronic device can aggregate the filtered text information together with the converted text and produce a summary. It can also determine whether the received non-real-time information contains important information associated with the summary. If it does, the electronic device can proactively ask the user whether to announce the associated important information as well, and act on the user's answer. After announcing the information, the electronic device can also accept follow-up questions from the user about it. This enables interaction between the device and the user.

Figure 30 shows an example of information retrieval and summary aggregation according to an embodiment of this application. In this example, messages within a specific scope are retrieved based on semantic understanding and then summarized.

Specifically, Figure 30 shows a group chat scenario. In the work communication group shown in Figure 30(a), contact Tom sent messages 3001 to 3003. Messages 3001 and 3002 are two posts forwarded by Tom, and message 3003 is a text message from Tom. After the electronic device receives these messages, the user can proactively ask it what Tom said. For example, as shown in Figure 30(b), the user can ask by voice, "What did Tom say in the work communication group?" From this request, the electronic device extracts the relevant need features: the contact (Tom) and the conversation (the work communication group). Based on these features, it searches the relevant scope, that is, the messages Tom sent in the work communication group. This search returns messages 3001 to 3003 shown in Figure 30(a). The electronic device then processes these messages into the target information to be announced, shown as message 3005 in Figure 30(b). It announces message 3005 in a target voice that is the same as or similar to Tom's real voice.

In a possible implementation of the embodiments of this application, the need features that the electronic device extracts from the user's request may be incomplete. In that case, the electronic device can complete them.

As an example, the user can proactively ask the electronic device, "What did Jack say this morning?" After processing the request, the electronic device knows that the user wants the information contact Jack sent this morning. The request is incomplete because it does not say which conversations to search. By completing the need features, the electronic device can determine that the search scope covers both group chats and private chats. The information retrieved with the completed features can therefore include two messages. The first is Jack's message in the Saturday team-building group, "The destination for Saturday's team-building is Dameisha". The second is the message Jack sent to the user privately, "Are you free for dinner tonight?" From these two messages, the electronic device can produce the following target information for announcement: "I told everyone in the Saturday team-building group that the destination is Dameisha. I also sent you a separate message asking if you're free for dinner tonight." This text can be announced in a voice that is the same as or similar to Jack's real voice.
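Composing retrieved messages into one first-person text, as in the Jack example, can be sketched with a simple template per conversation type. The sketch assumes that a separate summarisation step has already reduced each retrieved message to a short gist. The sentence templates are illustrative, not the document's wording rules.

```python
def compose_first_person(items: list[tuple[str, bool, str]]) -> str:
    """Turn retrieved gists into one first-person text spoken in the contact's voice.

    items: (conversation_name, is_private_chat, gist); the gists are assumed to
    come from a separate summarisation step.
    """
    sentences = []
    for conversation, is_private, gist in items:
        if is_private:
            sentences.append(f"I sent you a separate message {gist}.")
        else:
            sentences.append(f"In the {conversation} group, I told everyone {gist}.")
    return " ".join(sentences)
```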

For non-real-time information, the electronic device uses semantic understanding to determine the scope of information the user is interested in. It then aggregates the information within that scope. This avoids time-consuming one-by-one announcements and tedious manual searching and browsing. As a result, the user obtains information more efficiently.

Applying this application reduces announcements of unnecessary information sources and contact names, which shortens broadcast time without introducing ambiguity. Several techniques improve the efficiency and accuracy of information delivery and the user's listening experience: aggregating multiple messages, adding transitional phrases, replacing meaningless content, and condensing lengthy messages. For non-real-time information, retrieving and aggregating the information the user needs avoids time-consuming one-by-one announcements and tedious manual browsing. This makes information acquisition more efficient. Announcing information in a voice that is the same as or similar to the contact's, wherever possible, narrows the psychological distance between the user and the information. It also helps the user understand the information more accurately and increases user satisfaction.

As a specific application example of the embodiments of this application, the method can also be applied to accessible reading scenarios. For example, the electronic device may receive a message from one of a user's contacts. It can then execute the method provided in the embodiments of this application to convert the message into target information. It announces this information in a voice that is the same as or similar to the contact's voice, obtained by voice synthesis or cloning. This lets the user obtain information promptly and recognize the sender by voice. From the voice used in the announcement, the user can intuitively determine the source of the information and quickly identify which contact sent it. In one example, such users include blind users. In another example, the electronic device can process the received information according to the broadcast control policy described above when converting it into target information. This makes the announcement more efficient.

As another specific application example of the embodiments of this application, the method can also be applied to smart home scenarios.

In one example of a smart home scenario, the user of the electronic device can be a child or a minor, and the method can be applied to controlling the electronic device in child mode. For example, a child user can use the electronic device for learning, games, or other entertainment. Information about the child's use of the device in child mode can be sent to the child's parents or other guardians, who can send messages to the child as needed. For example, once the child has used the device for longer than a set period, the parents or guardians receive a corresponding notification. They can then send the child a message reminding them to stop using the device. The electronic device that receives the message executes the method provided in the embodiments of this application. It announces the message in a voice that is the same as or similar to that of the parent or guardian who sent it, reminding the child to stop using the device.

In another possible implementation, the relevant prompt messages can be built into the electronic device. In child mode, the child's use of the device may meet the corresponding prompt condition. When it does, the electronic device can automatically retrieve the relevant prompt and announce it in a voice that is the same as or similar to that of a parent or another guardian of the child.
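The built-in child-mode prompt can be sketched as a usage-limit check. The check fires once and returns the guardian voice to use together with the prompt text. The minute-based limit, the voice identifier, and the default prompt text are all hypothetical. The document does not specify the prompt condition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChildModePolicy:
    limit_minutes: int
    guardian_voice_id: str  # hypothetical ID of a voice the guardian has authorised
    prompt_text: str = "Time's up, please stop using the device."


def check_usage(
    policy: ChildModePolicy, used_minutes: int, already_prompted: bool
) -> Optional[tuple[str, str]]:
    """Return (voice_id, text) to announce once the limit is reached, else None."""
    if already_prompted or used_minutes < policy.limit_minutes:
        return None
    return policy.guardian_voice_id, policy.prompt_text
```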

In another example of a smart home scenario, the electronic device can be any smart home device with a voice broadcast function. Such a device may receive a message sent by one of the user's contacts. It can then execute the method provided in the embodiments of this application to convert the message into voice information. It announces that information in a voice that is the same as or similar to the contact's. For example, while a user is watching a video on a smart TV at home, messages from family members can be announced through the smart TV. The announcement uses a voice that is the same as or similar to the sending family member's. For instance, if a wife is watching TV at home, a message her husband sends to her mobile phone can be announced through the smart TV. An example is the voice message "I'll be home in 10 minutes, please make me a pot of tea." Alternatively, after cooking rice in a smart rice cooker, a user can send the rice cooker an instruction by voice or by pressing a button. The instruction asks it to remind the other family members to come to the dining room to eat. The smart rice cooker sends this "come to the dining room to eat" message to each family member. On receiving it, each family member's electronic device executes the method of the embodiments of this application to convert the message into a voice message. It announces the message in a voice that is the same as or similar to that of the user who cooked. In this example, the family members' electronic devices can be of different types. Family member A is outdoors, so a mobile phone can execute the method and announce the message. Family member B is watching TV in the living room, so the smart TV can announce it. Family member C is in a room indoors, so a smartwatch or smart band worn by C, or a smart speaker in that room, can execute the method and announce the message to C.

In the above example of reminding family members to come to the dining room, the entire process can also be implemented with smart speakers. For example, after preparing the meal, the user can simply say to a smart speaker, "Call everyone to come and eat." The smart speaker processes this speech and sends the processed information to the smart speakers in the other rooms of the home. Those speakers then remind everyone to come to the dining room in a voice that is the same as or similar to the user's. Some family members cannot be reached through other smart speakers, such as family member A, who is outdoors in the above example. For them, the processed information can be sent to other smart electronic devices associated with that member, such as a mobile phone or smartwatch. These associated devices then execute the method and remind A to come to the dining room in a voice that is the same as or similar to the user's.
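The smart-speaker fallback can be sketched as a per-member delivery plan. A member in a room with a smart speaker is reached through that speaker. Everyone else falls back to the first personal device associated with them. How member locations and device associations are obtained is not described in the text, so those inputs are assumptions.

```python
from typing import Optional


def plan_delivery(
    locations: dict[str, Optional[str]],
    rooms_with_speakers: set[str],
    personal_devices: dict[str, list[str]],
) -> dict[str, Optional[str]]:
    """Pick an output device for each family member.

    locations: member -> room at home, or None if the member is away.
    Returns member -> device identifier, or None if no device is available.
    """
    plan: dict[str, Optional[str]] = {}
    for member, room in locations.items():
        if room is not None and room in rooms_with_speakers:
            plan[member] = f"speaker:{room}"
        else:
            # Fall back to an associated personal device (phone, watch, ...).
            devices = personal_devices.get(member, [])
            plan[member] = devices[0] if devices else None
    return plan
```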

In yet another example of a smart home scenario, the electronic device can be a smart alarm clock. It can remind the user at a set time, or according to another user's instruction, in a voice that is the same as or similar to a particular contact's. For example, a student can use the smart alarm clock as a wake-up call. At 6:00 AM it can say "Time to get up!" in the student's father's or mother's voice. The smart alarm clock can also provide reminders in other scenarios that require timing. For example, when a student is taking a homework test, it can use the teacher's voice to say "You may start answering" or "Time is up."

In the embodiments of this application, the electronic device can be divided into functional modules according to the above method examples. For example, each function can correspond to a separate functional module, or one or more functions can be integrated into one functional module. An integrated module can be implemented in hardware or as a software functional module. Note that the module division in the embodiments of this application is illustrative and is only a logical functional division. Other divisions are possible in actual implementations. The following description uses the case in which each function corresponds to one functional module as an example.

Corresponding to the foregoing embodiments, Figure 31 shows a structural block diagram of an information processing apparatus according to an embodiment of this application. The apparatus can be applied to the electronic device in the foregoing embodiments. It can specifically include the following modules: a contact determination module 3101, a voice message retrieval module 3102, a target voice generation module 3103, a target information generation module 3104, and a voice broadcast module 3105, where:

the contact determination module 3101 is configured to determine, in response to received first information, the target contact who sent the first information;

the voice message retrieval module 3102 is configured to retrieve voice messages of the target contact;

the target voice generation module 3103 is configured to generate, based on the voice messages, a target voice similar to the voice of the target contact, where the voice messages include voice messages in a first page;

the target information generation module 3104 is configured to generate target information corresponding to the first information; and

the voice broadcast module 3105 is configured to announce the target information by voice using the target voice.
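The five modules of Figure 31 can be sketched as one object that wires them together in the order the text lists them. Each module is modelled as an injected callable. The callable signatures and the dictionary form of the "first information" are assumptions made only to keep the sketch self-contained.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InformationProcessingApparatus:
    # Each field stands for one module of Figure 31; the signatures are assumptions.
    determine_contact: Callable[[dict], str]        # contact determination module 3101
    retrieve_voice_messages: Callable[[str], list]  # voice message retrieval module 3102
    generate_target_voice: Callable[[list], str]    # target voice generation module 3103
    generate_target_info: Callable[[dict], str]     # target information generation module 3104
    broadcast: Callable[[str, str], None]           # voice broadcast module 3105

    def handle(self, first_info: dict) -> None:
        """Run the modules in order for one piece of received first information."""
        contact = self.determine_contact(first_info)
        voice_messages = self.retrieve_voice_messages(contact)
        target_voice = self.generate_target_voice(voice_messages)
        target_info = self.generate_target_info(first_info)
        self.broadcast(target_voice, target_info)
```

Injecting the modules as callables mirrors the note that the division into modules is only logical. Any of them could be merged or implemented in hardware without changing the overall flow.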

The apparatus can be the electronic device in the foregoing embodiments. Alternatively, it can be a unit or component in that electronic device that implements the corresponding functions.

Note that all relevant content of the steps in the above method embodiments applies to the functional descriptions of the corresponding functional modules; details are not repeated here.

An embodiment of this application further provides an electronic device. It includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the information processing method in the foregoing embodiments is implemented.

An embodiment of this application further provides a computer-readable storage medium storing computer instructions. When the computer instructions run on an electronic device, the electronic device performs the related method steps above. This implements the information processing method in the foregoing embodiments.

An embodiment of this application further provides a computer program product. When the computer program product runs on a computer, the computer performs the related steps above. This implements the information processing method in the foregoing embodiments.

An embodiment of this application further provides a chip, which can be a processor or can include a processor. The processor can be a general-purpose or special-purpose processor. It is configured to support the foldable-screen device in performing the related steps above, which implements the information processing method in the foregoing embodiments.

Optionally, the chip further includes a transceiver. The transceiver operates under the control of the processor and supports the electronic device in performing the related steps described above, thereby implementing the information processing methods in the foregoing embodiments.

Optionally, the chip may further include a storage medium.

The chip may be implemented using the following circuits or devices: one or more field-programmable gate arrays (FPGAs), programmable logic devices (PLDs), controllers, state machines, gate logic, discrete hardware components, any other suitable circuits, or any combination of circuits capable of performing the various functions described throughout this application.

Finally, it should be noted that the foregoing descriptions are merely specific implementations of this application, and the protection scope of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application.

Claims (25)

1. An information processing method, comprising:
in response to received first information, determining a target contact who sent the first information;
retrieving voice messages of the target contact, and generating, based on the voice messages, a target voice similar to the voice of the target contact, wherein the voice messages include voice messages in a first page;
generating target information corresponding to the first information; and
broadcasting the target information by voice using the target voice.

2. The method according to claim 1, wherein the retrieving voice messages of the target contact comprises:
displaying a first page associated with the target contact, the first page including a conversation page of the target contact; and
retrieving, in the first page, voice messages sent by the target contact.

3. The method according to claim 2, wherein the first page further includes a historical conversation page of the target contact, and the displaying a first page associated with the target contact comprises:
displaying the historical conversation page of the target contact in response to a first operation.

4. The method according to claim 1, wherein the voice messages further include voice messages in a second page, and the retrieving voice messages of the target contact comprises:
in response to a second operation, displaying a second page associated with the target contact, the second page including a chat-history search page; and
retrieving, in the second page, voice messages sent by the target contact.

5. The method according to any one of claims 1 to 4, wherein the generating, based on the voice messages, a target voice similar to the voice of the target contact comprises:
extracting voice features of the target contact from the voice messages of the target contact; and
synthesizing sound data based on the voice features and the received first information, to obtain the target voice similar to the voice of the target contact.

6. The method according to claim 5, wherein the extracting voice features of the target contact from the voice messages of the target contact comprises:
tapping to play the retrieved voice messages of the target contact;
capturing audio data while the voice messages of the target contact are being played; and
extracting the voice features of the target contact from the audio data.

7. The method according to any one of claims 1 to 6, wherein the voice messages further include voice messages pre-recorded by the target contact, or recorded and stored during a call with the target contact, and the retrieving voice messages of the target contact further comprises:
retrieving, based on contact information of the target contact, the voice messages of the target contact from a database storing the voice messages of the target contact.

8. The method according to any one of claims 1 to 7, further comprising, after retrieving the voice messages of the target contact:
if the retrieved voice messages contain the voices of multiple contacts, determining factors for voice filtering, the factors including at least one of: spectral information, sound intensity information, or duration information of a contact's voice; and
extracting, according to the factors, the voice belonging to the target contact from the voice messages.

9. The method according to any one of claims 1 to 8, wherein the generating target information corresponding to the first information comprises:
determining an information source; and
generating the target information based on the information source and the content of the received first information.

10. The method according to claim 9, wherein the determining an information source comprises:
determining differences between the first information and the immediately preceding message, the differences including the reception time interval, platform, session type, group, and contact; and
determining, based on the differences, the specific content of the information source to be broadcast.

11. The method according to claim 10, wherein the determining, based on the differences, the specific content of the information source to be broadcast comprises:
if the reception time interval between the first information and the immediately preceding message is greater than a preset interval, determining that the information source to be broadcast includes the complete information source; and
if the reception time interval is less than or equal to the preset interval, checking in turn whether the platform, session type, group, and contact have changed between the first information and the immediately preceding message, and determining the specific content of the information source to be broadcast based on those changes.

12. The method according to claim 11, wherein the determining the specific content of the information source to be broadcast based on those changes comprises:
if any of the platform, session type, group, or contact has changed between the first information and the immediately preceding message, determining that the information source to be broadcast includes the corresponding changed item.

13. The method according to any one of claims 10 to 12, wherein the determining, based on the differences, the specific content of the information source to be broadcast further comprises:
determining the degree of familiarity between the current user and the target contact; and
determining, based on the degree of familiarity, the specific form of the target contact's name included in the information source to be broadcast.

14. The method according to claim 13, wherein the determining the degree of familiarity between the user and the target contact comprises:
obtaining interaction behavior features between the current user and the target contact; and
determining the degree of familiarity between the user and the target contact based on the interaction behavior features.

15. The method according to claim 13 or 14, wherein the determining, based on the degree of familiarity, the specific form of the target contact's name included in the information source to be broadcast comprises:
if the target contact is determined, based on the degree of familiarity, to be a familiar contact of the current user, determining that the name of the target contact may be omitted from the information source to be broadcast; and
if the target contact is determined, based on the degree of familiarity, to be an unfamiliar contact of the current user, simplifying the name of the target contact and determining that the information source to be broadcast includes the simplified name.

16. The method according to any one of claims 9 to 15, wherein the generating the target information based on the information source and the content of the received first information comprises:
estimating the duration of a voice broadcast of the first information;
if the duration exceeds a preset value, condensing the first information; and
generating the target information based on the information source and the content of the condensed first information.

17. The method according to any one of claims 9 to 16, wherein the first information further includes non-text information, and the generating the target information based on the information source and the content of the received first information further comprises:
determining the information type of the non-text information;
converting the non-text information into text according to the information type; and
generating the target information based on the information source and the first information after text conversion.

18. The method according to claim 17, wherein the non-text information includes link information, and the converting the non-text information into text according to the information type comprises:
determining the link content corresponding to the link information, and summarizing the link content to obtain a summary in text form.

19. The method according to claim 17 or 18, further comprising, after converting the non-text information into text according to the information type:
determining the session type in which the non-text information was received; and
adding a transitional phrase to the text-converted first information according to the session type.

20. The method according to any one of claims 1 to 19, wherein the first information includes multiple messages sent within a preset time period, and the generating target information corresponding to the first information further comprises:
merging the multiple messages sent within the preset time period; and
generating the target information corresponding to the merged messages.

21. The method according to claim 20, wherein the merging the multiple messages sent within the preset time period comprises:
determining the session types of the multiple messages sent by the target contact within the preset time period; and
merging, separately for each session type, the messages that the target contact sent within the preset time period and that belong to the same session type.

22. The method according to claim 20, wherein the first information includes multiple group-conversation messages sent by multiple associated contacts within the preset time period, and the merging the multiple messages sent within the preset time period comprises:
determining the content of each of the messages sent by the associated contacts within the preset time period; and
merging messages with similar content sent by the associated contacts within the preset time period.

23. The method according to any one of claims 1 to 22, wherein the first information further includes non-real-time information, and the method further comprises:
in response to a user instruction, retrieving one or more pieces of non-real-time information corresponding to the user instruction; and
generating target information corresponding to the one or more pieces of non-real-time information, and broadcasting the target information by voice.

24. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the information processing method according to any one of claims 1 to 23.

25. A computer program product, wherein, when the computer program product runs on a computer, the computer is caused to perform the information processing method according to any one of claims 1 to 23.
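Claims 10 to 15 decide how much of the message source to speak before the message itself. The following is a minimal, non-authoritative Python sketch of that decision, assuming a 60-second "preset interval", a `Message` record with the four source fields named in claim 10, and a first-word name-simplification rule. The interval value, the `Message`, `simplify_name`, and `source_to_announce` names, and the handling of a missing previous message are all hypothetical; the claims do not specify them, nor how familiarity is scored (here it is simply passed in as a boolean).

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical value: the claims only speak of a "preset interval".
PRESET_INTERVAL_S = 60.0


@dataclass(frozen=True)
class Message:
    received_at: float      # reception time, in seconds
    platform: str           # app the message arrived on
    session_type: str       # e.g. "private" or "group"
    group: Optional[str]    # group name for group sessions, else None
    contact: str            # sender's display name


def simplify_name(name: str) -> str:
    """Hypothetical simplification rule: keep only the first word of the name."""
    parts = name.split()
    return parts[0] if parts else name


def source_to_announce(current: Message, previous: Optional[Message], familiar: bool) -> List[str]:
    """Return the source items to speak before the message body (claims 10 to 15)."""
    # Items are checked in the order listed in claim 11.
    fields = [
        ("platform", current.platform),
        ("session_type", current.session_type),
        ("group", current.group),
        ("contact", current.contact),
    ]
    if previous is None or current.received_at - previous.received_at > PRESET_INTERVAL_S:
        # Long gap (or no previous message): announce the complete source.
        chosen = [(key, value) for key, value in fields if value is not None]
    else:
        # Short gap: announce only the items that changed since the previous message.
        chosen = [
            (key, value)
            for key, value in fields
            if value is not None and value != getattr(previous, key)
        ]

    spoken = []
    for key, value in chosen:
        if key == "contact":
            if familiar:
                continue  # familiar contact: the name may be omitted (claim 15)
            value = simplify_name(value)  # unfamiliar contact: simplified name
        spoken.append(value)
    return spoken
```

For example, if "Bob Li" posts in the same group 30 seconds after "Alice Wang", only `["Bob"]` is announced for an unfamiliar contact and nothing for a familiar one; after a gap longer than the interval, the full source (platform, session type, group, simplified name) is announced again.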
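Claims 16, 20, and 21 describe estimating how long a broadcast will take, condensing overly long text, and merging messages that one contact sends within a preset time period in the same session type. A rough Python sketch of those steps follows, under stated assumptions: a 120-second merge window measured from the first message of each batch, a speaking rate of 2.5 words per second, and an 8-second duration limit. All of these values and the `Incoming`, `merge_within_window`, `estimate_duration_s`, and `condense` names are hypothetical, and plain truncation stands in for condensing because the claims do not say how the text is shortened (a summarizer would be a more realistic choice).

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

# Hypothetical values; the claims only speak of a "preset time period"
# and a "preset value" for the broadcast duration.
MERGE_WINDOW_S = 120.0
WORDS_PER_SECOND = 2.5   # assumed TTS speaking rate (about 150 words per minute)
MAX_DURATION_S = 8.0


@dataclass(frozen=True)
class Incoming:
    sent_at: float      # send time, in seconds
    contact: str
    session_type: str   # e.g. "private" or "group"
    text: str


def merge_within_window(messages: List[Incoming], window_s: float = MERGE_WINDOW_S) -> List[Incoming]:
    """Merge messages from the same contact and session type (claim 21).

    A message joins the open batch for its (contact, session type) pair if it
    was sent within window_s of the first message in that batch.
    """
    merged: List[Incoming] = []
    open_batch: Dict[Tuple[str, str], int] = {}  # pair -> index into merged
    for msg in sorted(messages, key=lambda m: m.sent_at):
        key = (msg.contact, msg.session_type)
        idx = open_batch.get(key)
        if idx is not None and msg.sent_at - merged[idx].sent_at <= window_s:
            head = merged[idx]
            # Keep the batch start time so the window is measured from it.
            merged[idx] = replace(head, text=head.text + " " + msg.text)
        else:
            open_batch[key] = len(merged)
            merged.append(msg)
    return merged


def estimate_duration_s(text: str) -> float:
    """Estimate the spoken duration from a word count (claim 16)."""
    return len(text.split()) / WORDS_PER_SECOND


def condense(text: str, max_s: float = MAX_DURATION_S) -> str:
    """Condense text whose estimated duration exceeds max_s.

    Truncation is a stand-in; the claims do not say how condensing is done.
    """
    if estimate_duration_s(text) <= max_s:
        return text
    budget = int(max_s * WORDS_PER_SECOND)
    return " ".join(text.split()[:budget])
```

With these assumptions, "Are you free" at t = 0 s and "tonight?" at t = 30 s from the same private chat are spoken as one message, while a third message from that chat at t = 300 s starts a new batch. Measuring the window from the batch start, rather than from the latest message, keeps a steady stream of messages from being merged indefinitely.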
PCT/CN2025/106282 2024-07-10 2025-06-30 Information processing method and apparatus, and electronic device Pending WO2026012240A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202410926760.9 2024-07-10
CN202410926760 2024-07-10
CN202411562369.1A CN121334292A (en) 2024-07-10 2024-11-01 Information processing method and device and electronic equipment
CN202411562369.1 2024-11-01

Publications (1)

Publication Number Publication Date
WO2026012240A1 (en) 2026-01-15

Family

ID=98350286

Country Status (2)

Country Link
CN (1) CN121334292A (en)
WO (1) WO2026012240A1 (en)

Also Published As

Publication number Publication date
CN121334292A (en) 2026-01-13
