
CN119296534A - Interaction method, intelligent terminal and storage medium

Info

Publication number: CN119296534A
Authority: CN (China)
Prior art keywords: voice, user, text content, response, preset
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202411397740.3A
Other languages: Chinese (zh)
Inventor: 陈薪
Current Assignee: Shenzhen Transsion Holdings Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Shenzhen Transsion Holdings Co Ltd
Application filed by Shenzhen Transsion Holdings Co Ltd
Priority to CN202411397740.3A
Publication of CN119296534A

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 - Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract


The present application proposes an interaction method, an intelligent terminal and a storage medium. The interaction method includes: S11, in response to a preset event, determining a target role and/or a response voice of the voice assistant. On this basis, a preset event can trigger the voice assistant to switch roles automatically during the interaction; for example, as the user or the text content of the voice changes, the voice assistant switches to an adapted target role, which makes voice interaction more personalized and improves the user experience.

Description

Interaction method, intelligent terminal and storage medium
Technical Field
The application relates to the field of voice recognition, in particular to an interaction method, an intelligent terminal and a storage medium.
Background
When a voice assistant interacts with a user, the same intelligent terminal, or the same voice assistant performing the interaction, usually has only one role, and interaction with a single role easily becomes monotonous.
In the course of conceiving and implementing the present application, the inventor found at least the following problems. During interaction, users have a need to switch the role of the voice assistant: different users require interaction with different corresponding roles, and even the same user, in different dialogue scenes, may well expect the voice assistant to switch automatically to a different role as needed. However, existing voice assistants do not switch roles according to the different text content of the user or of the voice, so the user experience is poor. When a user expects the voice assistant to switch roles, there are mainly two approaches: first, the user explicitly issues a role-switching instruction to make the voice assistant switch to the role the user has selected; second, feature information of the user is obtained and matched to the voice assistant role corresponding to that feature information. However, both approaches determine the role to be switched to only before the interaction and then make the voice assistant switch to it; roles cannot be switched automatically during the interaction, which limits the personalization of voice interaction and affects the user experience.
The foregoing description is provided for general background information and does not necessarily constitute prior art.
Disclosure of Invention
To address the above technical problems, the present application provides an interaction method, an intelligent terminal and a storage medium, which enable the voice assistant to switch roles automatically during the interaction, improve the personalization of voice interaction and enhance the user experience.
The application provides an interaction method, applied to an intelligent terminal, comprising the following steps:
S11, in response to a preset event, determining a target role and/or a response voice of the voice assistant.
Optionally, the preset event includes at least one of the following:
a pause point occurring in the text content of the user voice;
a switch from a first user to a second user uttering the user voice;
a voice call event;
a target scene and/or a target object having a preset association relationship with the user voice.
Optionally, the preset association relationship includes at least one of the following:
has a text logic relationship with the text content of the user's voice;
A preset mapping relation between text content of user voice and a target scene;
a preset mapping relation between text content of user voice and a target object;
a preset mapping relation between a user and a target scene;
a preset mapping relation between a user and a target object.
Optionally, the determining of the target role of the voice assistant includes at least one of the following:
determining the role with the highest matching degree as the target role according to the text content of the user voice;
determining the target role of the voice assistant according to a user instruction;
determining the target role of the voice assistant according to current scene information and/or historical interaction information.
Optionally, the determining of the response voice includes:
determining feature information of the response voice according to at least one of the working mode of the intelligent terminal, the user role, historical interaction information and preset features of the user voice;
determining the text content of the response; and
generating the response voice according to the feature information of the response voice and the text content.
Optionally, the determining of the text content of the response includes:
acquiring industry spoken auxiliary words of the target role through a preset AI model; and
generating the text content of the response according to the industry spoken auxiliary words.
Optionally, the generating of the text content of the response according to the industry spoken auxiliary words includes at least one of the following:
inserting the industry spoken auxiliary words at pause points of the text content of the response;
learning, through a preset AI model, the sentence intervals when the text content of the response is converted into voice, and inserting the industry spoken auxiliary words into sentence intervals exceeding a preset threshold.
Optionally, the method further comprises:
and labeling the user according to the text content of the user voice and/or the historical interaction information of the user and the intelligent terminal.
Optionally, the method further comprises:
and carrying out backup storage on the output result of the target role and/or the response voice and the user voice.
Optionally, the method further comprises:
creating a schedule in response to schedule information recognized from the output result of the target role and/or the response voice and from the user voice.
The application also provides an intelligent terminal, which comprises a memory and a processor, wherein the memory stores an interaction program, and the interaction program, when executed by the processor, implements the steps of any of the interaction methods described above.
The application also provides a storage medium storing a computer program which, when executed by a processor, implements the steps of any of the interaction methods described above.
As described above, the technical solution of the application comprises: in response to a preset event, determining a target role and/or a response voice of the voice assistant. With the application, the voice assistant can switch roles automatically during the interaction through the triggering of a preset event; for example, the voice assistant switches to an adapted target role as the user or the text content of the voice differs, so that the personalization of voice interaction is improved and the user experience is enhanced.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic diagram of a hardware structure of a mobile terminal implementing various embodiments of the present application;
Fig. 2 is a schematic diagram of a communication network system according to an embodiment of the present application;
Fig. 3 is a flow chart of an interaction method according to a first embodiment of the present application;
Figs. 4(a) and 4(b) are schematic diagrams showing the voice assistants of two APPs of different intelligent terminals on the operation interface of one APP;
Fig. 5a is an interaction schematic diagram based on a called event according to the first embodiment of the present application;
Fig. 5b is a schematic diagram of another interaction based on a called event according to the first embodiment of the present application;
Fig. 6 is a flowchart of determining a response voice according to the first embodiment of the present application;
Fig. 7 is a flow chart of an interaction method according to a second embodiment of the present application;
Fig. 8 is a flow chart of another interaction method according to the second embodiment of the present application;
Figs. 9(a) and 9(b) are schematic views showing two voice assistants of different smart terminals on the interface of one of the smart terminals.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments. Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes that element. The same terms in different embodiments of the application have the same meaning, unless a specific meaning is given in a particular embodiment or is further determined by the context of that embodiment.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope herein. The term "if" as used herein may be interpreted as "upon" or "when" or "in response to determining", depending on the context. Furthermore, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including" specify the presence of stated features, steps, operations, elements, components, items, categories, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, categories, and/or groups. The terms "or", "and/or", "including at least one of", and the like, as used herein, may be construed as inclusive, or mean any one or any combination. For example, "including at least one of A, B, C" means "any of A; B; C; A and B; A and C; B and C; A and B and C", and as another example, "A, B or C" or "A, B and/or C" means "any of A; B; C; A and B; A and C; B and C; A and B and C". An exception to this definition will occur only when a combination of elements, functions, steps or operations is in some way inherently mutually exclusive.
It should be understood that, although the steps in the flowcharts in the embodiments of the present application are shown in order as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the order of the steps is not strictly limited and they may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and their order of execution is not necessarily sequential; they may be performed in turns or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The words "if", as used herein, may be interpreted as "at" or "when" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event), depending on the context.
It should be noted that step numbers such as S22 and S23 are used herein for the purpose of describing the corresponding content more clearly and concisely, and do not constitute a substantive limitation on the order; those skilled in the art may execute S23 first and then S22 when implementing the application, which is within the scope of protection of the application.
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In the following description, suffixes such as "module", "part" or "unit" for representing elements are used only for facilitating the description of the present application, and have no specific meaning per se. Thus, "module," "component," or "unit" may be used in combination.
The intelligent terminal may be implemented in various forms. For example, the smart terminals described in the present application may include smart terminals such as mobile phones, tablet computers, notebook computers, palm computers, personal digital assistants (Personal Digital Assistant, PDA), portable media players (Portable Media Player, PMP), navigation devices, wearable devices, smart bracelets, pedometers, and fixed terminals such as digital TVs, desktop computers, and the like.
The following description will be given taking a mobile terminal as an example, and those skilled in the art will understand that the configuration according to the embodiment of the present application can be applied to a fixed type terminal in addition to elements particularly used for a moving purpose.
Referring to fig. 1, which is a schematic diagram of a hardware structure of a mobile terminal for implementing various embodiments of the present application, the mobile terminal 100 may include an RF (Radio Frequency) unit 101, a WiFi module 102, an audio output unit 103, an a/V (audio/video) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, and a power supply 111. Those skilled in the art will appreciate that the mobile terminal structure shown in fig. 1 is not limiting of the mobile terminal and that the mobile terminal may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
The following describes the components of the mobile terminal in detail with reference to fig. 1:
The radio frequency unit 101 may be used for receiving and transmitting signals during the process of receiving and transmitting information or communication, specifically, receiving downlink information of a base station, processing the downlink information by the processor 110, and transmitting uplink data to the base station. Typically, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol including, but not limited to, GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplexing-Long Term Evolution), TDD-LTE (Time Division Duplexing-Long Term Evolution), 5G, 6G, and the like.
WiFi belongs to a short-distance wireless transmission technology, and a mobile terminal can help a user to send and receive e-mails, browse web pages, access streaming media and the like through the WiFi module 102, so that wireless broadband Internet access is provided for the user. Although fig. 1 shows a WiFi module 102, it is understood that it does not belong to the necessary constitution of a mobile terminal, and can be omitted entirely as required within a range that does not change the essence of the invention.
The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the WiFi module 102 or stored in the memory 109 into an audio signal and output as sound when the mobile terminal 100 is in a call signal reception mode, a talk mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like. Also, the audio output unit 103 may also provide audio output (e.g., a call signal reception sound, a message reception sound, etc.) related to a specific function performed by the mobile terminal 100. The audio output unit 103 may include a speaker, a buzzer, and the like.
The A/V input unit 104 is used to receive an audio or video signal. The A/V input unit 104 may include a graphics processor (Graphics Processing Unit, GPU) 1041 and a microphone 1042, the graphics processor 1041 processing image data of still pictures or video obtained by an image capturing device (e.g. a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or other storage medium) or transmitted via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 can receive sound (audio data) in a phone call mode, a recording mode, a voice recognition mode, and the like, and can process such sound into audio data. In the case of a telephone call mode, the processed audio (voice) data may be converted into a format that can be transmitted to the mobile communication base station via the radio frequency unit 101 and output. The microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the course of receiving and transmitting the audio signal.
The mobile terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Optionally, the light sensor includes an ambient light sensor and a proximity sensor, optionally, the ambient light sensor may adjust the brightness of the display panel 1061 according to the brightness of ambient light, and the proximity sensor may turn off the display panel 1061 and/or the backlight when the mobile terminal 100 moves to the ear. The accelerometer sensor can detect the acceleration in all directions (generally three axes), can detect the gravity and the direction when the accelerometer sensor is static, can be used for identifying the gesture of a mobile phone (such as transverse and vertical screen switching, related games, magnetometer gesture calibration), vibration identification related functions (such as pedometer and knocking), and the like, and can be configured as other sensors such as fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors and the like, which are not repeated herein.
The display unit 106 is used to display information input by a user or information provided to the user. The display unit 106 may include a display panel 1061, and the display panel 1061 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), or the like.
The user input unit 107 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the mobile terminal. Alternatively, the user input unit 107 may include a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on the touch panel 1071 or thereabout by using any suitable object or accessory such as a finger, a stylus, etc.) and drive the corresponding connection device according to a predetermined program. The touch panel 1071 may include two parts of a touch detection device and a touch controller. Optionally, the touch detection device detects the touch orientation of the user, detects a signal caused by the touch operation, and transmits the signal to the touch controller, and the touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, and then transmits the touch point coordinates to the processor 110, and can receive and execute a command sent by the processor 110. Further, the touch panel 1071 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. The user input unit 107 may include other input devices 1072 in addition to the touch panel 1071. Alternatively, other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc., as specifically not limited herein.
Alternatively, the touch panel 1071 may overlay the display panel 1061, and when the touch panel 1071 detects a touch operation thereon or thereabout, the touch panel 1071 is transferred to the processor 110 to determine the type of touch event, and the processor 110 then provides a corresponding visual output on the display panel 1061 according to the type of touch event. Although in fig. 1, the touch panel 1071 and the display panel 1061 are two independent components for implementing the input and output functions of the mobile terminal, in some embodiments, the touch panel 1071 may be integrated with the display panel 1061 to implement the input and output functions of the mobile terminal, which is not limited herein.
The interface unit 108 serves as an interface through which at least one external device can be connected with the mobile terminal 100. For example, the external devices may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the mobile terminal 100 or may be used to transmit data between the mobile terminal 100 and an external device.
Memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a storage program area that may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), etc., and a storage data area that may store data created according to the use of the cellular phone (such as audio data, a phonebook, etc.), etc. In addition, memory 109 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The processor 110 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by running or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby performing overall monitoring of the mobile terminal. The processor 110 may include one or more processing units, and preferably the processor 110 may integrate an application processor and a modem processor, the application processor optionally processing primarily an operating system, user interface and application programs, etc., the modem processor processing primarily wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The mobile terminal 100 may further include a power source 111 (e.g., a battery) for supplying power to the respective components, and preferably, the power source 111 may be logically connected to the processor 110 through a power management system, so as to perform functions of managing charging, discharging, and power consumption management through the power management system.
Although not shown in fig. 1, the mobile terminal 100 may further include a bluetooth module or the like, which is not described herein.
In order to facilitate understanding of the embodiments of the present application, a communication network system on which the mobile terminal of the present application is based will be described below.
Referring to fig. 2, fig. 2 is a schematic diagram of a communication network system according to an embodiment of the present application, where the communication network system is an LTE system of a general mobile communication technology, and the LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an IP service 204 of an operator that are sequentially connected in communication.
Alternatively, the UE201 may be the terminal 100 described above, which is not described here again.
The E-UTRAN202 includes eNodeB2021 and other eNodeB2022, etc. Alternatively, the eNodeB2021 may connect with other enodebs 2022 over a backhaul (e.g., X2 interface), the eNodeB2021 is connected to the EPC203, and the eNodeB2021 may provide access for the UE201 to the EPC 203.
The EPC 203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving Gateway) 2034, a PGW (PDN Gateway, packet data network gateway) 2035, a PCRF (Policy and Charging Rules Function) 2036, and so on. Optionally, the MME 2031 is a control node that handles signaling between the UE 201 and the EPC 203, providing bearer and connection management. The HSS 2032 is used to provide registers such as a home location register (not shown) and to hold user-specific information about service characteristics, data rates, etc. All user data may be sent through the SGW 2034; the PGW 2035 may provide IP address allocation and other functions for the UE 201; the PCRF 2036 is a policy and charging control policy decision point for traffic data flows and IP bearer resources, which selects and provides available policy and charging control decisions for a policy and charging enforcement function (not shown).
IP services 204 may include the internet, intranets, IMS (IP Multimedia Subsystem ), or other IP services, etc.
Although the LTE system is described above as an example, it should be understood by those skilled in the art that the present application is not limited to LTE systems, but may be applied to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, 5G, and future new network systems (e.g., 6G), etc.
Based on the above-mentioned mobile terminal hardware structure and communication network system, various embodiments of the present application are presented.
First embodiment
Fig. 3 is a schematic diagram of an interaction method according to a first embodiment of the present application. The execution body of the interaction method may be at least one of an intelligent terminal such as a mobile phone or a wearable device, a voice assistant with a voice interaction function, a processor, a storage medium, and the like. The voice assistant includes, but is not limited to, the operating-system-based voice interaction programs of various intelligent terminals, such as Siri on Apple intelligent terminals, the built-in voice assistants of Android-based and HarmonyOS-based intelligent terminals, the Zebra voice assistant of various domestic in-vehicle systems, and the like, as well as the voice interaction programs of the various APPs installed on the intelligent terminal.
Optionally, the voice assistant of each APP is only used for interaction between the user and that APP, or performs linked voice interaction with the voice assistants of other APPs according to user settings. For example, if the user sets a linkage relationship between the voice assistants of a first APP and a second APP, the user can directly issue, through the voice assistant of the first APP (referred to as the "first voice assistant"), a voice instruction for controlling the second APP to the voice assistant of the second APP (referred to as the "second voice assistant"). As shown in fig. 4(a) and fig. 4(b), the first APP can directly display the icon of the "second voice assistant", and the voice instruction issued by the user on the first APP can be received through the "second voice assistant"; when the icon of the "second voice assistant" is displayed, the icon of the "first voice assistant" may be displayed at the same time or not displayed, and/or the "first voice assistant" may continue to listen for and recognize voice instructions, or may not. For example, when the first APP and the second APP are installed on two different intelligent terminals, the user may issue a voice instruction through only one of the intelligent terminals; as shown in fig. 4(b), the icons of the "first voice assistant" and the "second voice assistant" may be displayed simultaneously on the operation interface of the first APP. If the voice instruction includes content that needs to be executed by the first APP as well as content that needs to be executed by the second APP, the first APP on one intelligent terminal performs the corresponding operation and the second APP on the other intelligent terminal performs the corresponding operation.
The user can set a linkage relationship between the two intelligent terminals by setting a preset relationship between them, where the preset relationship includes at least one of: a connection relationship established based on a near-field communication technology such as Bluetooth, a connection relationship established based on a far-field communication technology such as a cellular network, a connection relationship established based on authentication, a connection relationship of being located in the same local area network, a connection group of the same Internet of Things, and a master-slave control or information transceiving relationship formed according to a user-defined rule.
The voice command may be embodied as a sound uttered by a user speaking, or as a sound in the form of an analog signal uttered by a tool such as an electronic device, for example, a synthetic sound.
For ease of illustration and understanding, the present application is described below in terms of a scenario in which a user interacts with an intelligent terminal based on a voice assistant onboard the operating system, unless specifically indicated otherwise.
As shown in fig. 3, the interaction method includes step S11.
S11, in response to a preset event, determining a target role and/or a response voice of the voice assistant.
In an example, the preset event may be embodied as one or more events and may be a dedicated event for implementing the present application, different from existing events used to trigger the intelligent terminal, so that such an event at least avoids conflicting with existing events that perform other functions (including personalized events set by the user). For example, step S11 may be performed only when a preset mode is entered and/or a preset function option is turned on, so as to avoid a conflict with existing events that perform other functions; for example, only when the preset mode is enabled through the setting interface is the preset event considered valid and the corresponding step triggered. In this case, the user can perform different functions more conveniently and quickly by carrying out the same event, i.e. by multiplexing an event that is used to perform other functions. For example, when the preset mode has been entered, an incoming call to the user (i.e. the preset event) may trigger execution of step S11, whereas when the preset mode has not been entered, the operation is performed only in the conventional manner, e.g. the call hangs up automatically after ringing unanswered for a long time, and the function associated with the preset event is not carried out.
For example, for a scenario applicable to an intelligent terminal, this example is equivalent to adding a quick message view function to the intelligent terminal. The implementation of this function includes, but is not limited to, the following. Taking a smartphone as an example, a script or an application program is written in advance and installed in the operating system of the smartphone, so that a quick message view option is added in the setting interface of the operating system and/or the setting interface of the voice assistant; the function can then be turned on or off accordingly by sliding a slider to enable or disable the option. Alternatively, the intelligent terminal may provide a separate APP on the main interface, and this APP can turn the corresponding control function on and off and perform detailed settings for any one or more events, including the preset event. After the function option is turned on, the intelligent terminal can allow the user to set the event on the display interface, so that it can subsequently determine whether a detected event is consistent with the set event: if they are consistent, execution of step S11 is triggered; if they are inconsistent, execution of step S11 is not triggered.
In another example, the preset event is not an existing event that performs other functions, but only a dedicated event of the present application. Optionally, the application combines the preset event with the current interface to determine whether to trigger execution of step S11. For example, if the current interface is an already opened interface of the voice assistant (referred to simply as the "operation interface"), execution of step S11 is triggered once the preset event, being an existing event for performing other functions, is detected; if the current interface is not an operation interface of the voice assistant, execution of step S11 is not triggered even if such a preset event is detected, and the other function corresponding to the existing event is performed instead; and if the detected preset event is a dedicated event of the present application, execution of step S11 is triggered directly, regardless of whether the current interface is an operation interface of the voice assistant.
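By way of illustration only, the gating described in the two examples above might be combined into the following minimal sketch (Python); the event names, sets and flags are hypothetical and not part of the application:

```python
def should_trigger_s11(event, preset_mode_on, on_assistant_interface,
                       dedicated_events, reused_events):
    """Decide whether a detected event should trigger step S11.

    A dedicated event of this method always triggers S11; an event reused from
    an existing function triggers S11 only when the preset mode / function
    option is enabled and the voice assistant's operation interface is open.
    """
    if event in dedicated_events:            # special-purpose event
        return True
    if event in reused_events:               # event multiplexed from another function
        return preset_mode_on and on_assistant_interface
    return False

# Hypothetical usage:
# should_trigger_s11("unanswered_incoming_call", preset_mode_on=True,
#                    on_assistant_interface=True,
#                    dedicated_events={"assistant_role_event"},
#                    reused_events={"unanswered_incoming_call"})
```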
The specific form of any event, including preset events, is not limiting. The following describes, as an example, that the preset event includes at least one of the events 1 to 4.
Event 1: a pause point occurs in the text content of the user voice.
A pause point can be understood as a sentence-break mark in a sentence, and the application can identify pause points through natural language processing (NLP) technology. In an actual scenario, each time a pause point is detected, it may indicate that the user has been recognized as having finished a sentence, which may trigger determination of the target role and/or the response voice of the voice assistant.
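As a minimal sketch of this idea (Python), a pause point in the incrementally recognized text could be approximated by sentence-final punctuation; a real system would rely on an NLP model rather than punctuation alone:

```python
import re

SENTENCE_END = re.compile(r"[。！？.!?]\s*$")

def detect_pause_point(recognized_text: str) -> bool:
    """Return True when the recognized text appears to end a sentence.

    This stands in for the NLP-based pause-point recognition described above;
    here a trailing sentence-final punctuation mark is treated as the pause point.
    """
    return bool(SENTENCE_END.search(recognized_text.strip()))

# Each time detect_pause_point(...) returns True, step S11 could be triggered
# to (re)determine the target role and/or the response voice.
```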
Event 2: a switch from a first user to a second user uttering the user voice.
The application can determine whether the speaking user has changed by identifying the voiceprint feature information of the current voice; if the user uttering the voice has changed, determination of the target role and/or the response voice of the voice assistant is triggered.
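A minimal sketch of such a speaker-change check follows (Python), assuming voiceprint feature vectors are produced by some speaker-embedding model that the application does not specify; the similarity threshold is likewise a hypothetical value:

```python
import numpy as np

def speaker_changed(prev_voiceprint: np.ndarray, cur_voiceprint: np.ndarray,
                    threshold: float = 0.75) -> bool:
    """Rough check of whether the speaker changed between two utterances.

    Cosine similarity of the two voiceprint vectors below the (hypothetical)
    threshold is treated as a change from the first user to a second user.
    """
    cos = float(np.dot(prev_voiceprint, cur_voiceprint) /
                (np.linalg.norm(prev_voiceprint) * np.linalg.norm(cur_voiceprint)))
    return cos < threshold
```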
Event 3: a voice call event.
The voice call event includes, but is not limited to, at least one of an incoming call event and a called event of a call-type or chat-type APP. For example, as shown in fig. 5a and fig. 5b, when the mobile phone receives a call and the user does not answer within a preset time period, the intelligent terminal automatically activates and runs the voice assistant, and triggers determination of the target role and/or the response voice of the voice assistant. Optionally, after the call is answered, the user voice (i.e. what the calling party says) and the text content of the response voice are displayed on the interface of the intelligent terminal (i.e. the call interface) shown in fig. 5a and fig. 5b in the chronological order of the conversation. If the user actively answers the call within the preset time period, the call can proceed in the traditional manner; optionally, during the call, the voice assistant can also run and monitor the call in real time and intervene according to the user's instructions, for example by looking up other information such as the schedule and reading it out.
The method can operate in an offline mode: functions such as answering on the user's behalf and automatic replies can be implemented through a voice assistant based on an offline large model, which significantly improves call efficiency. At the same time, offline processing can protect the privacy of users and avoid the privacy issues that may arise in real-time calls, providing better practice for applying offline large-model voice assistants on the device side and ensuring the stability and scalability of the terminal system.
Event 4: a target scene and/or a target object having a preset association relationship with the user voice.
Through the target scene, the interaction of the voice assistant can be associated with different scenes, and through the target object, the interaction of the voice assistant can be associated with different users, thereby improving the accuracy of voice interaction.
For example, the text content of the user voice is "xxx buying house now ready for decoration" a few days before, although two words of "decoration" appear, the voice assistant has difficulty in determining whether the user is about to ask for a decoration class problem, at this time, a target scene is introduced through the event 4, for example, the intelligent terminal can automatically start the 3D radar scanning function of the rear camera to scan for the current scene, if the current position in the house is detected, which indicates that the user is about to ask for a decoration class problem, the voice assistant is triggered to determine the corresponding target role, namely, a decoration advisor.
For another example, when a segment of sound is detected (this segment of sound being the "user voice"), the voiceprint feature information of the sound can be checked to determine whether it was uttered by the target user (i.e. the "target object"). If so, the voice assistant is triggered to determine the target role and/or the response voice; if not, the segment of sound is ignored, for example if it is recognized that the sound came from a television or was uttered by user A rather than the target user B. In this way, the accuracy and privacy security of voice interaction can be improved.
Optionally, the preset association relationship includes at least one of the following:
Relation 1: a text logic relationship with the text content of the user voice.
For example, the text content of the user voice is "how should the decoration of a house be planned"; the word "decoration" has a text logic relationship with the house scene, so the intelligent terminal can automatically start the 3D radar scanning function of the rear camera to scan the current scene, and if it detects that the current position is inside the house, which indicates that the user wants to ask a decoration-related question, the voice assistant determines the corresponding target role, namely a decoration consultant.
Unlike Relation 1, which is determined automatically based on natural language processing technology, the following Relations 2 to 5 may be manually preset relationships.
Relation 2: a preset mapping relationship between the text content of the user voice and the target scene.
Relation 3: a preset mapping relationship between the text content of the user voice and the target object.
For example, the text content of the user voice is "come home for dinner on Tuesday", the voice assistant detects that the user voice is uttered by "father", and "father" has a preset mapping relationship with the home scene, so the voice assistant determines the target role and/or the response voice.
Relation 4: a preset mapping relationship between the user and the target scene.
Relation 5: a preset mapping relationship between the user and the target object.
For example, when a user voice is detected, if the user uttering the voice is the target object (i.e. the preset mapping relationship between the user and the target object is met), and/or if 3D radar scanning by the rear camera determines that the current scene is inside the house where the user uttering the voice lives (i.e. the preset mapping relationship between the user and the scene is met), the voice assistant is automatically triggered and run, and the target role and/or the response voice are determined.
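The manually preset Relations 2 to 5 can be thought of as simple lookup tables; the sketch below (Python) is an illustration only, and its table entries and identifiers are hypothetical rather than taken from the application:

```python
# Hypothetical mapping tables for Relations 2-5.
TEXT_TO_SCENE = {"decoration": "indoor_room", "dinner": "home"}
TEXT_TO_OBJECT = {"father": "family_member"}
USER_TO_SCENE = {"user_b": "user_b_home"}
USER_TO_OBJECT = {"user_b": "user_b"}

def has_preset_association(text_keywords, speaker_id, detected_scene, detected_object):
    """Return True if the user voice matches any of the preset mapping relations."""
    if any(TEXT_TO_SCENE.get(k) == detected_scene for k in text_keywords):
        return True                                             # Relation 2
    if any(TEXT_TO_OBJECT.get(k) == detected_object for k in text_keywords):
        return True                                             # Relation 3
    if USER_TO_SCENE.get(speaker_id) == detected_scene:
        return True                                             # Relation 4
    return USER_TO_OBJECT.get(speaker_id) == detected_object    # Relation 5
```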
It should be noted that, for a given technical feature, the embodiments of the present application include multiple cases and multiple possible implementations; where nothing is stated to the contrary, the corresponding technical feature may be implemented by a combination of any subset of these manners. For example, Relation 1 above may be combined with any one of Relations 2 to 5. Through such combinations, the corresponding technical feature can be implemented more accurately and/or intelligently, thereby further improving the accuracy of the implementation and the user experience.
It should be appreciated that the preset event need not necessarily contain user voice; that is, the target role and/or the response voice of the voice assistant need not necessarily be determined based on detected user voice. For example, when the preset event is a called event, once the called event is detected, the voice assistant can determine the target role and/or the response voice according to the calling party of the called event. Taking an incoming call from a takeaway deliverer as an example, the voice assistant can automatically answer the call and, according to historical interaction information (i.e. the previously recognized call voice of the present intelligent terminal user, identification of the incoming number, and the like), determine the target role to be the present intelligent terminal user; the role could of course also be an ordinary intelligent robot. Note that the deliverer does not need to utter any user voice indicating identity at this point, and the voice assistant can automatically answer "Hello, please put it in the delivery locker on the first floor". Taking an incoming call from the user's father as another example, when the call has not been answered after a preset duration, the voice assistant can automatically answer the call, determine the target role to be the present intelligent terminal user according to historical interaction information (i.e. the previously recognized call voice of the present intelligent terminal user), and personalize the reply according to the calling party's (i.e. the father's) calling habits, tone and communication language habits. Such personalized replies may draw on more mature dialogue techniques in the field of voice interaction and are not described in detail here.
Each role of the voice assistant has expertise and knowledge corresponding to that role, and can provide operations and dialogue matching the role. The voice assistant may be an AI (Artificial Intelligence) assistant, and the roles it contains may include existing professional roles, such as traditional decoration consultants, decoration designers and intellectual property technicians, and may also include custom-added roles, such as a persona of the present intelligent terminal user synthesized by the voice assistant through imitation learning, a relationship expert, a fairy-tale teller, and the like. In step S11, the manner of determining the target role of the voice assistant may include at least one of the following:
Mode 1: determining the role with the highest matching degree as the target role according to the text content of the user voice.
The user voice is parsed through natural language processing technology to obtain text content, which is mainly used to determine the user's intention; keywords in the text content are therefore extracted so that the corresponding roles can be determined from the keywords and taken as candidate roles.
When only one candidate role is determined, that candidate role is directly taken as the target role.
When multiple candidate roles are determined, for example when the text content is "the house to be decorated has three bedrooms and one living room, and I usually like watching movies and playing games at home" and, according to the keywords "decoration", "watching movies" and "playing games", three candidate roles are determined, namely a decoration consultant, a movie expert and an e-sports player, the candidate role with the highest matching degree can be determined according to a preset rule. The preset rule may be the degree of contextual association of the user voice; for example, if the previous sentence exchanged between the user and the voice assistant contains "decoration", then "watching movies" and "playing games" are determined merely to indicate the leisure and entertainment needs to be taken into account during decoration, and the decoration consultant has the highest matching degree at this time, i.e. the decoration consultant is the target role (a minimal sketch of this keyword-based matching is given after Mode 3 below).
In other examples, when multiple candidate roles are determined, the candidate roles can be taken in turn as target roles of the voice assistant in descending order of matching degree; optionally, the target roles can play their response voices in turn in descending order of matching degree. Still taking the foregoing three roles as an example, after the decoration consultant finishes speaking, the movie expert can follow up with "the recently released movie xxx is worth watching".
Mode 2: determining the target role of the voice assistant according to a user instruction.
When multiple candidate roles are determined, the user selects at least one of them as the target role.
Mode 3: determining the target role of the voice assistant according to current scene information and/or historical interaction information.
For example, when the determined candidate roles are a decoration consultant and a decoration designer, the intelligent terminal can automatically start the 3D radar scanning function of the rear camera; if the current indoor layout information can be scanned, "decoration designer" is selected as the target role of the voice assistant, and if it cannot, "decoration consultant" is selected as the target role of the voice assistant.
For another example, when an incoming call from a takeaway deliverer is detected, the target role of the voice assistant may be determined based on historical interaction information (e.g. the tone of the last call with the deliverer, etc.). For another example, if the historical interaction information shows that an intelligent robot role was used in the last call with the deliverer, the current target role is the intelligent robot; if the last time the role was the present intelligent terminal user, the current target role is still the present intelligent terminal user.
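The keyword-based matching of Mode 1, with a contextual tie-break as described above, might be sketched as follows (Python); the role names and keyword table are illustrative assumptions only:

```python
ROLE_KEYWORDS = {
    # Illustrative keyword table; real roles and keywords are not specified by the application.
    "decoration_consultant": {"decoration", "decorate", "renovation"},
    "movie_expert": {"movie", "film"},
    "esports_player": {"game", "gaming"},
}

def pick_target_role(utterance_keywords, context_keywords=()):
    """Score candidate roles by keyword hits in the user voice (Mode 1), then
    break ties by contextual association with the previous dialogue turn."""
    utterance_keywords = set(utterance_keywords)
    context_keywords = set(context_keywords)
    scores = {role: len(utterance_keywords & kws) for role, kws in ROLE_KEYWORDS.items()}
    candidates = [r for r, s in scores.items() if s > 0]
    if not candidates:
        return None                           # fall back to the default assistant
    if len(candidates) == 1:
        return candidates[0]
    in_context = [r for r in candidates if ROLE_KEYWORDS[r] & context_keywords]
    if in_context:
        return in_context[0]                  # e.g. the previous turn mentioned "decoration"
    return max(candidates, key=lambda r: scores[r])

# pick_target_role({"decoration", "movie", "game"}, context_keywords={"decoration"})
# -> "decoration_consultant"
```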
On this basis, the voice assistant can switch roles automatically during the interaction when triggered by a preset event, which improves the personalization of voice interaction and enhances the user experience.
In one scenario, the intelligent terminal starts the default assistant: "I am your assistant, what do you need help with?" The user voice is "I want to decorate a house, how should I plan it?"; the target role is determined to be a decoration assistant at this point, and the response voice is "I am a decoration consultant. How big is the house? You can describe it to me." The user voice is then "Wait a moment, it's almost noon, first help me order braised chicken delivered to the company"; the target role is determined to be a takeaway assistant at this point, and the takeaway assistant places the order and feeds back to the user "the takeaway has been ordered". Then the user voice is "three bedrooms and one living room, I usually like watching movies and playing games at home"; the target role is determined to be the decoration consultant again at this point, and the response voice is "xxxx".
In another scenario, the intelligent terminal starts the default assistant: "I am your assistant, what do you need help with?" At this point two voice assistant roles can be determined: one determined target role is a relationship expert, whose response voice is "I suggest calming down first, and then we can look for a solution"; the other determined target role is a story-generation assistant, whose response voice is "How about a little story to cheer you up?".
The voice assistant can switch roles automatically and perform the corresponding operations and dialogue without the user manually selecting a role or giving detailed switching instructions, thereby providing the user with a more personalized and efficient experience and more appropriate services. In addition, the user can conveniently chat with multiple target roles in a single dialogue, which further improves the personalization and efficiency of the interaction and results in higher user satisfaction.
In addition, unlike the traditional approach, in which only the text content is determined and then turned into a segment of response voice with fixed voice feature information (i.e. the commonly understood speech rate, intonation, emotion, and so on), in step S11 the feature information of the response voice changes adaptively.
In an example, as shown in fig. 6, the method for determining the response voice includes:
S111, determining feature information of the response voice according to at least one of the working mode of the intelligent terminal, the user role, historical interaction information and preset features of the user voice;
S112, determining the text content of the response; and
S113, generating the response voice according to the feature information of the response voice and the text content.
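A minimal sketch of this S111-S113 pipeline is given below (Python). The concrete rules, the `tts` callable and its parameters are assumptions for illustration and are not specified by the application:

```python
from dataclasses import dataclass

@dataclass
class VoiceFeatures:
    speed: float      # relative speech rate
    pitch: float      # relative intonation
    emotion: str      # e.g. "neutral", "warm"

def determine_voice_features(work_mode, user_role, history, user_voice_features):
    """S111 sketch: derive the feature information of the response voice.
    `history` is available for further personalization but unused in this sketch."""
    if work_mode == "meeting":
        # e.g. a preset intelligent-robot voice in meeting mode
        return VoiceFeatures(speed=1.0, pitch=1.0, emotion="neutral")
    if user_role == "son":
        # e.g. answering father's call: respectful, slightly playful tone
        return VoiceFeatures(speed=0.9, pitch=1.05, emotion="warm")
    # otherwise roughly mirror the user's own speech rate and intonation
    return VoiceFeatures(speed=user_voice_features.get("speed", 1.0),
                         pitch=user_voice_features.get("pitch", 1.0),
                         emotion=user_voice_features.get("emotion", "neutral"))

def generate_response_voice(features: VoiceFeatures, text: str, tts):
    """S113 sketch: hand the determined features and the response text (S112) to
    some TTS engine; `tts` is a hypothetical callable."""
    return tts(text, speed=features.speed, pitch=features.pitch, emotion=features.emotion)
```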
In step S111, the preset features may include at least one factor such as the text content, voiceprint feature information, speech rate, intonation and emotion feature information of the user voice. The user voice is analyzed through natural language processing technology to extract these preset features, and the feature information of the response voice is determined accordingly.
The working mode of the intelligent terminal includes, but is not limited to, at least one of a meeting mode, a do-not-disturb mode, an outdoor mode, and the like. Taking the meeting mode as an example, when an incoming call is detected, the intelligent terminal can answer it automatically, and the voice assistant determines the target role to be an intelligent robot of a preset gender, so that the voice feature information corresponding to the intelligent robot is used as the feature information of the response voice. The preset gender may be recognized from historical interaction information (i.e. the previously learned gender of the present intelligent terminal user).
The user role may be the role of the user who utters the user voice, for example, the role of the user of this intelligent terminal. For example, when an incoming call from "father" is detected, the user role is determined to be "son", so the feature information of the response voice is determined to be biased toward a respectful and playful tone.
The preset features of the user's voice include, but are not limited to, at least one factor of text content, voiceprint feature information, speech speed, intonation, emotion feature information, etc. of the user's voice.
In an example, step S112 may obtain industry spoken auxiliary words of the target role through a preset AI model, and generate the text content of the response according to the industry spoken auxiliary words.
Taking the speech speed as an example of the preset feature, the intelligent terminal can detect the current speech speed of the user voice, analyze in real time the difference between the current speech speed and the speech speed of the AI voice assistant (i.e., the target role), and compare the difference with corresponding preset thresholds. If the difference exceeds a first preset threshold, it indicates that the speech speed of the AI voice assistant is too slow, so the speech speed of the AI voice assistant is increased. If the difference is smaller than a second preset threshold, which is smaller than the first preset threshold, it indicates that the speech speed of the AI voice assistant is too fast, so the speech speed of the AI voice assistant can be reduced, or the AI voice assistant inserts common industry spoken auxiliary words into the dialogue to slow down its delivery, thereby forming the text content of the response determined in step S112. The industry spoken auxiliary words include, but are not limited to, spoken expressions common in the industry where the target role is located; they can be learned by the AI model from daily communication between industry personnel and consumer groups, and may be, for example, "boss", "sister" or "excellent taste".
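For illustration, the speed-matching rule above can be sketched as follows; the threshold values, adjustment factors and helper-word list are assumptions rather than values given by this application.

```python
# Illustrative sketch only: thresholds, adjustment factors and helper words are assumed.
FIRST_THRESHOLD = 0.3    # user noticeably faster than the assistant -> assistant too slow
SECOND_THRESHOLD = -0.3  # smaller than FIRST_THRESHOLD -> assistant too fast
SPOKEN_HELPERS = ["boss", "sister"]  # industry spoken auxiliary words learned by the AI model


def adapt_speed(user_speed, assistant_speed, response_text):
    """Return the adjusted assistant speech speed and (possibly padded) response text."""
    diff = user_speed - assistant_speed
    if diff > FIRST_THRESHOLD:
        # Assistant too slow: raise its speed toward the user's pace.
        assistant_speed = min(user_speed, assistant_speed * 1.2)
    elif diff < SECOND_THRESHOLD:
        # Assistant too fast: lower its speed, or pad the reply with a spoken
        # auxiliary word so the delivery feels slower.
        assistant_speed = max(user_speed, assistant_speed * 0.8)
        response_text = SPOKEN_HELPERS[0] + ", " + response_text
    return assistant_speed, response_text
```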
The manner of generating the text content of the response based on the industry spoken auxiliary words may include at least one of the following:
Mode 1: inserting the industry spoken auxiliary words at pause points of the text content of the response.
For example, the AI voice assistant may first identify the pause points of the text content of the response and insert the industry spoken auxiliary words at the corresponding pause points. The AI voice assistant may determine the pause points by recognizing punctuation marks in the text content, for example inserting an industry spoken auxiliary word after a period or exclamation mark when one is recognized, or by learning, through a preset AI model, the pauses or sentence intervals that occur when the text content of the response is converted into speech.
Mode 2: learning, through a preset AI model, the sentence intervals that occur when the text content of the response is converted into speech, and inserting the industry spoken auxiliary words at sentence intervals exceeding a preset threshold.
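For illustration, both modes can be sketched roughly as follows; the punctuation set, the interval threshold and the helper functions are assumptions.

```python
import re

PAUSE_MARKS = ".!?"        # assumed punctuation treated as pause points (Mode 1)
INTERVAL_THRESHOLD = 0.8   # assumed minimum sentence gap in seconds (Mode 2)


def insert_at_pause_points(text, helper_word):
    """Mode 1: insert the industry spoken auxiliary word after interior pause marks."""
    parts = re.split(r"([.!?])", text)
    out = ""
    for i, piece in enumerate(parts):
        out += piece
        more_text_follows = i + 1 < len(parts) and parts[i + 1].strip()
        if piece in tuple(PAUSE_MARKS) and more_text_follows:
            out += " " + helper_word + ","
    return out


def insert_at_long_intervals(sentences, gaps, helper_word):
    """Mode 2: gaps[i] is the learned pause (in seconds) after sentences[i] when the
    response text is converted to speech; pad every gap above the threshold."""
    out = []
    for sentence, gap in zip(sentences, gaps):
        out.append(sentence)
        if gap > INTERVAL_THRESHOLD:
            out.append(helper_word + ",")
    return " ".join(out)
```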
In this way, the feature information of the response voice can be changed adaptively; for example, as the working mode of the intelligent terminal, the user role, the historical interaction information and/or the user voice differ, the voice assistant responds with an adapted intonation and emotion, which improves the personalization of the voice interaction and the user experience.
Second embodiment
Referring to fig. 7, the interaction method of the second embodiment includes the following steps:
S21, in response to a preset event, determining a target role and/or a response voice of the voice assistant;
S22, backing up and storing the output result of the target role and/or the response voice, together with the user voice;
S23, labeling the user according to the text content of the user voice and/or the historical interaction information between the user and the intelligent terminal.
Step S21 may correspond to step S11 of the foregoing embodiment, so for features of this embodiment that are the same as those of the foregoing embodiment, reference may be made to the foregoing description, which is not repeated here. In addition, only one of steps S22 and S23 may be performed, and when both are performed, their order may be interchanged.
In step S22, as shown in fig. 5a and 5b, a corresponding AI mark may be set for the output result of the voice assistant. For example, after the user comes out of the meeting, the user may find the missed-call records stored in the backup through the AI mark, and may click to play back the call record of the corresponding person.
In step S23, if the call content of the caller (i.e., the text content of the user voice) is "You have won 500w, please pay attention to xxx", the unknown number is automatically marked as suspected fraud. If the call content of the caller is "We have a children's insurance product, do you need it? Let me introduce it", and the historical interaction information between the user and the intelligent terminal, obtained by learning the user's habits, shows no interest in insurance, the fixed number and the current voice interaction event are marked as harassment; if the historical interaction information instead shows that the user has recently researched related insurance products, the fixed number and the current voice interaction event are not marked, or are marked as normal.
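For illustration, this labeling rule can be sketched as follows; the keyword lists, the label names and the form of the learned history are assumptions used only to show the flow.

```python
# Illustrative sketch only: keyword lists, label names and the history fields are assumed.
FRAUD_KEYWORDS = ["you have won", "claim your prize", "transfer a fee"]
MARKETING_KEYWORDS = ["insurance product", "let me introduce"]


def label_caller(call_text, user_history):
    """Label an unknown number from the call transcript and learned user habits."""
    text = call_text.lower()
    if any(k in text for k in FRAUD_KEYWORDS):
        return "suspected fraud"
    if any(k in text for k in MARKETING_KEYWORDS):
        # Learned habits decide whether the pitch is unwanted or currently relevant.
        if user_history.get("recently_researched_insurance"):
            return "normal"      # or leave the number unmarked
        return "harassment"
    return "unmarked"
```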
By identifying nuisance calls and the latest fraud scenarios, the application can build a zero-nuisance, fraud-proof call environment, which helps to protect the privacy of the user's terminal use.
After step S22, as shown in fig. 8, the method may further include S24, creating a schedule. That is, the intelligent terminal recognizes, through natural language processing technology, text information in the backup-stored result from which a schedule can be created, and the schedule can be extracted and created automatically. For example, "xx asks you to come home at 12 o'clock to eat noodles" in the backup store can be extracted as the key information, and "xx" can be the related party of the schedule.
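For illustration, the extraction step might look roughly like the sketch below; the time pattern and schedule fields are assumptions, and a real implementation would rely on the natural language processing model mentioned above.

```python
import re

# Illustrative sketch only: the pattern and schedule fields are assumed; a real
# implementation would use the natural language processing model mentioned above.
TIME_PATTERN = re.compile(r"(\d{1,2})\s*(?:o'clock|:00)")


def extract_schedule(backup_text, related_party):
    """Pull a simple schedule entry out of a backed-up call transcript."""
    match = TIME_PATTERN.search(backup_text)
    if not match:
        return None
    return {
        "time": f"{match.group(1)}:00",
        "event": backup_text.strip(),
        "related_party": related_party,  # e.g. "xx", the caller who proposed the plan
    }
```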
Through backup storage, the application can provide users with an intelligent full-scenario experience, including risk identification, summary generation and further derived personalized services.
In other examples, the schedule may be created directly without backup storage, in response to recognizing schedule information from the output result of the target role and/or the response voice, and from the user voice. For example, when the working mode of the intelligent terminal is the conference mode and an incoming call from the father is received, the voice assistant answers automatically and replies that the user is in a meeting; the father reminds the user not to forget the family dinner that evening, and the voice assistant automatically replies in acknowledgment. Optionally, after the call is hung up, the intelligent terminal may display on a preset interface (such as the current conversation interface) that a schedule for the day's dinner has been created, or play a response voice stating that the schedule for the day's dinner has been created.
Third embodiment
On the basis of any one of the foregoing embodiments, the difference is that the preset event of the third embodiment includes receiving a control instruction and/or a user voice from an associated device. For convenience of description, the associated device is referred to as the second intelligent terminal, and the present intelligent terminal is referred to as the first intelligent terminal.
The user can set a preset relationship between the two intelligent terminals to establish a linkage between them. The preset relationship includes at least one of the following: a connection established based on a near-field communication technology such as Bluetooth, a connection established based on a far-field communication technology such as a cellular network, a connection established based on authentication, being located in the same local area network, belonging to the same Internet-of-Things connection group, and a master-slave control or information transceiving relationship formed according to a user-defined rule.
In an example, after the linkage is set, the user may issue a user voice through the voice assistant of the first intelligent terminal (referred to as the "first voice assistant"), and the user voice is also received by the voice assistant of the second intelligent terminal (referred to as the "second voice assistant"). As shown in fig. 9 (a) and 9 (b), the first intelligent terminal may directly display the icon of the "second voice assistant", so that the "second voice assistant" receives the voice instruction issued through the first intelligent terminal. While the icon of the "second voice assistant" is displayed, the icon of the "first voice assistant" may or may not be displayed, and/or the "first voice assistant" may or may not continue to monitor and recognize voice instructions. For example, the user may issue a user voice only through the first intelligent terminal; as shown in fig. 9 (b), the icons of the "first voice assistant" and the "second voice assistant" may be displayed simultaneously on the operation interface of the first intelligent terminal. If the user voice contains content that needs to be executed by the first intelligent terminal as well as content that needs to be executed by the second intelligent terminal, the first intelligent terminal and the second intelligent terminal execute the corresponding operations respectively. Here, the finally determined second voice assistant has the same target role as the first voice assistant, and although the text content of their response voices differs, the feature information of the response voices may be the same.
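For illustration, one way such a user voice might be split between the two linked terminals is sketched below; the keyword-based split and the example keywords are purely illustrative assumptions.

```python
# Illustrative sketch only: the keyword-based split and the example keywords are assumed.
def dispatch_user_voice(text, second_terminal_keywords):
    """Split one user utterance into parts for the first and second voice assistants."""
    sentences = [s.strip() for s in text.replace(";", ".").split(".") if s.strip()]
    first_tasks, second_tasks = [], []
    for sentence in sentences:
        if any(k in sentence for k in second_terminal_keywords):
            second_tasks.append(sentence)  # executed by the second intelligent terminal
        else:
            first_tasks.append(sentence)   # executed by the first intelligent terminal
    return first_tasks, second_tasks


# Example usage with assumed keywords for the second terminal (e.g. a linked TV):
first, second = dispatch_user_voice(
    "Play a movie on the TV. Set an alarm for 7 am on my phone.",
    second_terminal_keywords=["TV"],
)
```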
The embodiment of the application also provides an intelligent terminal which comprises a memory and a processor, wherein the memory stores an interaction program, and the interaction program is executed by the processor to realize the steps of the interaction method in any embodiment.
The embodiment of the application also provides a storage medium, and the storage medium stores an interaction program which realizes the steps of the interaction method in any embodiment when being executed by a processor.
The embodiments of the intelligent terminal and the storage medium provided by the application can contain all technical characteristics of any of the embodiments of the interaction method, so that the method has corresponding beneficial effects, and the expansion and explanation contents of the description are basically the same as those of each embodiment of the method, and are not repeated herein.
Embodiments of the present application also provide a computer program product comprising computer program code which, when run on a computer, causes the computer to perform the method as in the various possible embodiments described above.
The embodiment of the application also provides a chip, which comprises a memory and a processor, wherein the memory is used for storing a computer program, and the processor is used for calling and running the computer program from the memory, so that the device provided with the chip executes the method in the various possible implementation manners.
It can be understood that the above scenario is merely an example, and does not constitute a limitation on the application scenario of the technical solution provided by the embodiment of the present application, and the technical solution of the present application may also be applied to other scenarios. For example, as one of ordinary skill in the art can know, with the evolution of the system architecture and the appearance of new service scenarios, the technical solution provided by the embodiment of the present application is also applicable to similar technical problems.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The units in the device of the embodiment of the application can be combined, divided and deleted according to actual needs.
In the present application, the same or similar term concept, technical solution and/or application scenario description will be generally described in detail only when first appearing and then repeatedly appearing, and for brevity, the description will not be repeated generally, and in understanding the present application technical solution and the like, reference may be made to the previous related detailed description thereof for the same or similar term concept, technical solution and/or application scenario description and the like which are not described in detail later.
In the present application, the descriptions of the embodiments are emphasized, and the details or descriptions of the other embodiments may be referred to.
The technical features of the technical scheme of the application can be arbitrarily combined, and all possible combinations of the technical features in the above embodiment are not described for the sake of brevity, however, as long as there is no contradiction between the combinations of the technical features, the application shall be considered as the scope of the description of the application.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a controlled terminal, or a network device, etc.) to perform the method of each embodiment of the present application.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable devices. The computer instructions may be stored in a storage medium or transmitted from one storage medium to another storage medium, for example, from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line) or wireless (e.g., infrared, wireless, microwave, etc.) means. The storage media may be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that contains an integration of one or more available media. Usable media may be magnetic media (e.g., floppy disks, storage disks, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., a Solid State Disk (SSD)), etc.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the application, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (10)

1. An interaction method, applied to an intelligent terminal, characterized by comprising the step of:
S11, in response to a preset event, determining a target role and/or a response voice of a voice assistant.
2. The method according to claim 1, characterized in that the preset event comprises at least one of the following:
a pause occurring in the text content of the user voice;
the user voice switching from being uttered by a first user to being uttered by a second user;
a voice call event;
a target scene and/or target object having a preset association relationship with the user voice.
3. The method according to claim 2, characterized in that the preset association relationship comprises at least one of the following:
a textual logical relationship with the text content of the user voice;
a preset mapping relationship between the text content of the user voice and a target scene;
a preset mapping relationship between the text content of the user voice and a target object;
a preset mapping relationship between the user and a target scene;
a preset mapping relationship between the user and a target object.
4. The method according to claim 1, characterized in that determining the target role of the voice assistant comprises at least one of the following:
determining, according to the text content of the user voice, the role with the highest matching degree as the target role;
determining the target role of the voice assistant according to a user instruction;
determining the target role of the voice assistant according to current scene information and/or historical interaction information.
5. The method according to any one of claims 1 to 4, characterized in that determining the response voice comprises:
determining feature information of the response voice according to at least one of the working mode of the intelligent terminal, the user role, historical interaction information and preset features of the user voice;
determining the text content of the response; and
generating the response voice according to the feature information of the response voice and the text content.
6. The method according to claim 5, characterized in that determining the text content of the response comprises:
obtaining industry spoken auxiliary words of the target role by sampling through a preset AI model;
generating the text content of the response according to the industry spoken auxiliary words.
7. The method according to claim 6, characterized in that generating the text content of the response according to the industry spoken auxiliary words comprises at least one of the following:
inserting the industry spoken auxiliary words at pause points of the text content of the response;
learning, through a preset AI model, the sentence intervals that occur when the text content of the response is converted into speech, and inserting the industry spoken auxiliary words at sentence intervals exceeding a preset threshold.
8. The method according to any one of claims 1 to 4, characterized by further comprising:
labeling the user according to the text content of the user voice and/or historical interaction information between the user and the intelligent terminal.
9. An intelligent terminal, characterized by comprising: a memory and a processor, wherein a message display program is stored in the memory, and when the message display program is executed by the processor, the steps of the interaction method according to any one of claims 1 to 8 are implemented.
10. A storage medium, characterized in that a computer program is stored on the storage medium, and when the computer program is executed by a processor, the steps of the interaction method according to any one of claims 1 to 8 are implemented.
CN202411397740.3A 2024-10-08 2024-10-08 Interaction method, intelligent terminal and storage medium Pending CN119296534A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411397740.3A CN119296534A (en) 2024-10-08 2024-10-08 Interaction method, intelligent terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411397740.3A CN119296534A (en) 2024-10-08 2024-10-08 Interaction method, intelligent terminal and storage medium

Publications (1)

Publication Number Publication Date
CN119296534A true CN119296534A (en) 2025-01-10

Family

ID=94160645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411397740.3A Pending CN119296534A (en) 2024-10-08 2024-10-08 Interaction method, intelligent terminal and storage medium

Country Status (1)

Country Link
CN (1) CN119296534A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119583703A (en) * 2025-02-05 2025-03-07 深圳市泰衡诺科技有限公司 Processing method, intelligent terminal and storage medium


Similar Documents

Publication Publication Date Title
US12374367B2 (en) Enhancing audio using multiple recording devices
US11670302B2 (en) Voice processing method and electronic device supporting the same
KR101726945B1 (en) Reducing the need for manual start/end-pointing and trigger phrases
US9111538B2 (en) Genius button secondary commands
WO2020024885A1 (en) Speech recognition method, and speech segmentation method and apparatus
CN112292724B (en) Dynamic and/or context-specific hotwords used to invoke the auto attendant
US8144939B2 (en) Automatic identifying
KR20220024557A (en) Detection and/or registration of hot commands to trigger response actions by automated assistants
CN108648754B (en) Voice control method and device
CN111738100B (en) Voice recognition method based on mouth shape and terminal equipment
CN107644646A (en) Method of speech processing, device and the device for speech processes
CN109151186B (en) Theme switching method and apparatus, electronic device, computer-readable storage medium
CN109462697A (en) Method of speech processing, device, mobile terminal and storage medium
US12008988B2 (en) Electronic apparatus and controlling method thereof
CN119296534A (en) Interaction method, intelligent terminal and storage medium
CN109256151A (en) Call voice regulates and controls method, apparatus, mobile terminal and readable storage medium storing program for executing
CN108133708A (en) A kind of control method of voice assistant, device and mobile terminal
CN109788423A (en) Phonetic incepting method, apparatus and computer readable storage medium
WO2019242415A1 (en) Position prompt method, device, storage medium and electronic device
CN114065168A (en) Information processing method, intelligent terminal and storage medium
CN109167880A (en) Double-sided screen terminal control method, double-sided screen terminal and computer readable storage medium
CN112700783A (en) Communication sound changing method, terminal equipment and storage medium
CN110321201A (en) A kind of background program processing method, terminal and computer readable storage medium
CN115469949A (en) Information display method, intelligent terminal and storage medium
CN114093357A (en) Control method, intelligent terminal and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination