
WO2021165990A1 - A system and method for transmitting user specific data to devices - Google Patents


Info

Publication number
WO2021165990A1
WO2021165990A1 · PCT/IN2021/050155
Authority
WO
WIPO (PCT)
Prior art keywords
user
data
transmitting
specific data
transmission system
Prior art date
Application number
PCT/IN2021/050155
Other languages
French (fr)
Inventor
Zubair Ahmed
Mio AHMED
Michael Schmitz
Mohammad Fayadan HOSSAIN
Original Assignee
Raghavendra Prasad
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Raghavendra Prasad filed Critical Raghavendra Prasad
Priority to JP2022535133A priority Critical patent/JP7719508B2/en
Priority to PH1/2022/552112A priority patent/PH12022552112A1/en
Publication of WO2021165990A1 publication Critical patent/WO2021165990A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Definitions

  • the present invention generally relates to a system and a method for user-specific data transmission and more particularly the present invention relates to a method of connecting / interfacing communication devices to display devices and transmitting the user-specific data over a display.
  • Personal technology includes various electronic devices and the electronic devices include various services that cater to individual needs, likes, and dislikes on kind and quantity of data as per the need of the user.
  • An increase in number and types of personal tech devices has made it possible to individually identify the requirements and deliver output data / products that meet those requirements. This has opened doors to new age requirements where individual requirement patterns are identified and intuitive requirement suggestions are provided based on the user data such as age, sex, location, buying history, time of the month, internet search history, medical history, and the like.
  • Personal tech has made it possible to obtain the search history, understand the user requirements, buying trends, store the data for future reference, and constantly study the user behavior to establish a pattern useful for accurate advertising.
  • VUIs are the modules that enable users to input vocal commands to input their query and obtain the search results. For example, if the user is asking a product / requirement query over the phone, VUI displays the product / requirement and explains the features of the product / requirement directly to the user or if the user is remotely located, VUI suggests the nearest possible location of the product / requirement.
  • However, it is challenging for the user of a VUI to understand its capabilities and maximize the benefits of using it.
  • Moreover, similar benefits are not available if the user has access only to an ordinary landline phone or less sophisticated devices.
  • VUIs are integrated with a display and the user needs to focus his attention on such displays. A robust system is still required where irrespective of the input means, user data is collected, stored for future use, and reproduced when requested by the user.
  • the present invention discloses a system and a method for user-specific data transmission and more particularly the present invention relates to a method of connecting / interfacing communication devices to display devices and transmitting the user-specific data over a display.
  • the system comprises of a service request means for inputting a request by a user and a cross-modal voice interface engine configured to receive the user request over a telecommunication network.
  • the cross-modal voice interface engine comprises of an IVR system configured to receive the user request and a natural language understanding component configured to receive the user request from the IVR system and translate the user request into the machine-readable transcript.
  • the system further comprises of a dialogue and media planner.
  • the dialogue and media planner is connected to the IVR system and is configured to plan and schedule the transmission of data to the user based on the user request parameters calculated by the dialogue and media planner.
  • the present invention provides a method of connecting / interfacing communication devices to display devices and transmitting the user-specific data over a display device.
  • the method comprises the steps of receiving a user request over a network.
  • the method further comprises processing the user request by a cross-modal voice interface engine and planning and transmitting data to the user based on the user request parameters calculated by the cross-modal voice interface engine.
  • the system is made independent of the type of the personal technology devices using a telecommunication network and the user-specific data is displayed over a display device as per the requirements of the user.
  • Figure 1 illustrates the overall system setup according to one embodiment of the disclosure.
  • Figure 2 illustrates the cross-modal voice interface engine overview according to an embodiment of the disclosure.
  • Figure 3 illustrates the conventional phone plus smart device hardware setup according to an embodiment of the disclosure.
  • Figure 4 illustrates the smartphone hardware setup according to an embodiment of the disclosure.
  • Figures 5A-5C illustrate the method of delivering services to the user according to an embodiment of the disclosure.
  • Figure 6 illustrates a setup for data transmission using URL sent via SMS using a feature phone.
  • Described herein are systems and methods for delivering user-specific data over the telecommunication network and provide efficient means to the user for displaying such data.
  • the systems and methods are described with respect to figures and such figures are intended to be illustrative rather than limiting to facilitate explanation of the exemplary systems and methods according to embodiments of the invention.
  • the term “network” refers to any form of a communication network that carries data and used to connect communication devices (e.g. phones, smartphones, computers, servers) with each other.
  • the data includes at least one of processed and unprocessed data.
  • Such data includes data obtained through automated data processing, data obtained through manual data processing, or unprocessed data.
  • the data according to the present invention is in the form of text, an image, an audio file, or a video file.
  • artificial intelligence refers to a set of executable instructions stored on a server and generated using machine learning techniques.
  • hub device refers to an electronic device comprising a processor and capable of being connected to other electronic / computing devices, displays, and sensors over a network.
  • VUI refers to voice user interface modules that enable users to input vocal commands to input their request / query and obtain a set of result(s) corresponding to their request / query.
  • IVR refers to an interactive voice response system that provides a response to incoming call queries of the users.
  • FIG. 1 shows a system (100) according to one embodiment of the disclosure.
  • a user or set of users (101) may dial a call using a device (102) over a network (105) of a telecom operator.
  • the device (102) may be a telecommunication device, smartphone or an ordinary landline phone and the user may input his service request through this device (102).
  • the device (102) acts as a service request means for inputting the user request.
  • the input voice of the user is processed using a system and method for speech to text translation of any incoming voice of any user. Artificial intelligence techniques are employed to collect data from several users or an individual user to extract key terms indicative of user preferences.
  • the call is further connected to a cross-modal voice interface engine (106).
  • the cross-modal voice interface engine (106) is made of different components, is configured to process the call, and is explained in detail in figure 2.
  • a hub device (103) is communicably connected to the cross-modal voice interface engine
  • the hub device (103) may be a computing device or a microcontroller capable of being connected over a network or a plurality of networks and to a plurality of devices (104) or sensors (107); the hub device (103) displays a list of the devices (104) or sensors (107) to which it is connected.
  • the hub device (103) may be connected to the devices (104) and the cross-modal voice interface engine (106) using the internet, Bluetooth, or any other connecting/interfacing medium for transmission of data from the hub device (103) to at least one of the devices (104) and sensors (107).
  • the data transmitting system for transmitting user (101) specific data comprises of a smart device with a sensor (107) connected to the hub device (103), which is capable of providing the surrounding information to the cross-modal voice interface engine (106).
  • the cross-modal voice interface engine (106) is configured to detect the presence of at least one of the devices (104) and sensors (107) nearby the user (101) using the hub device (103) and send the user a signal indicative of the presence of the device (104).
  • by nearby devices, it is understood that the devices are communicating over the same network or over different networks.
  • It shall be further noted that the transmission of the data to the user based on the user request is the same as the communication of data to the user; the data is ultimately transmitted to an electronic device, and the user receives such data by accessing that electronic device.
  • the engine (106) provides an option to the user for selecting one of the devices (104) for displaying the output or for communicating the data to the device (104).
  • the hub device (103) receives the command and data for displaying the data from the engine (106) over one or more available devices (104,107) as selected by the user.
  • FIG. 2 describes the components of the cross-modal voice interface engine (106) in detail.
  • An interactive voice response (IVR) (201) system receives a call from the user (101).
  • the IVR (201) is configured to handle multiple calls from different users simultaneously.
  • the IVR (201) is capable of connecting to different sub systems for processing the incoming calls.
  • a natural language understanding component (202) is connected to IVR (201) which receives the user input and translates the natural language input into machine-readable information.
  • the natural language understanding component (202) generates transcribed content, intent, entities, and metadata of the caller.
  • the machine-readable information generated by the natural language understanding component (202) defines the user requirement, and a system logic generates a plan to answer the user queries and provide possible display options.
  • a dialogue and media planner (203) is provided that acts as a central logic to decide further steps based on the available input and output devices.
  • a user-specific knowledge database (208) is connected to the media planner (203) to store the data specific to a user for future reference. The data of the database (208) include current and past interactions of the user with the system (100).
  • a global knowledge database (207) is connected to a user specific knowledge database (208) to provide additional relevant information that might be useful for the user (101).
  • a service repository (210) is connected to the planner (203) to retrieve and provide all the possible services to the user (101) based on the user (101) intent.
  • the data transmitting system for transmitting user (101) specific data comprises of a third-party service repository connected to the service repository (210) to provide relevant third-party services to the user.
  • a third-party service module (212) is integrated with the service repository (210) to search for and provide additional relevant services based on the intent of user (101). Relevant data indicative of the available service options is also obtained from sensors (107) of the registered smart devices and provided to the planner (203).
  • the planner (203) is capable of communicating with the user-specific knowledge database (208), the global knowledge database (207), the service repository (210), the third-party service module (212), and the sensors (107) of the registered smart devices to retrieve various available parameters, such as the intent of the current user, and prepare a presentation plan for the user (101).
  • the presentation plan includes an audio and/or visual output and the sequence in which the plan is presented to the user.
  • a natural language generation component (211) creates speech output for presenting it to the user (101).
  • the visual output is sent to available display devices using the hub device (103). The system not only enables the user (101) to see and listen to all the available options specific to the requirement but also enables her to see other relevant information that enriches the quality of the answers sought by the user (101) and make more comprehensive decisions.
  • Figure 3 describes the integration of a conventional phone and smart device hardware setup according to an embodiment of the disclosure.
  • An internet of things (IoT) device (103) containing a hub software is connected to the cross-modal voice interface engine (106) via an internet connection and the device (103) is connected to a display such as a local smart television (104) using any known connecting means such as Bluetooth.
  • the display device (104) is registered with the dialogue and media planner (203), the dialogue and media planner generates visual display commands for showing the results. Accordingly, the embodiment of figure 3 will generate audio and visual results in a manner similar to the embodiment of figure 2 even when the input device is an ordinary landline phone with no networking capability.
  • Example 1: The user (101) calls up the Cross-Modal Voice Interface Engine (106); the user (101) wants to travel to the seaside in the south of Bangladesh and wants to book a holiday home.
  • the system (100) would then explain the options and at the same time display several options on the nearby screen (104).
  • the user (101) refers to one of them by saying "Please tell me more about the second one" or by touching the second house on the screen (104), to which the system (100) responds.
  • FIG. 4 describes a smartphone hardware setup according to an embodiment of the disclosure where the smartphone (102) hosts the hub application (103), contains a plurality of sensors (107), and a small display screen.
  • An external display (104) may be connected to the smartphone
  • the smartphone (102) is registered with the planner (203) and when the user (101) calls up the cross-modal voice interface engine (106), the IVR (201) processes the incoming call in a manner as explained with respect to figure 2 and provides a dynamic output for further user selection.
  • the user (101) may further select as many options from the results provided by the cross-modal voice interface engine (106) and request the cross-modal voice interface engine (106) to serve the selected options.
  • Example 2 The user (101) calls up the Cross-modal Voice Interface Engine (106), and wants to find out about the new shopping mall.
  • the system (100) responds by explaining what shops she will find there and currently available special offers.
  • the system (100) displays images and videos of the location and its stores on a large display (104).
  • the system (100) responds by explaining how to get there and showing an overview map on the large display (104), offering to guide her to the location via her smartphone (102).
  • the display (104) will be switched off and the system (100) displays navigation instructions on the smartphone (102) while retrieving GPS data from it to dynamically guide her until she reaches the location.
  • This system (100) of the above embodiments may be used to promote the data transmission to the user (101).
  • the cross-modal voice interface engine (106) may decide if the appropriate data is available for a given user that is relevant to the user requirement. If the appropriate data relevant for a user (101) is already available with the system (100), it may initiate the transmission of data during the call and interact with the user to explain any additional requirements for the data. If the relevant data is not available with the system (100), the system may find appropriate data from the network and initiate a fresh call with the user (101) to transmit the data. The user (101) will have the option of continuing with the provided data or disconnecting. The selection of the user (101) is analyzed to identify the relevance of the data for the user (101), and it is stored for future reference.
  • Figures 5A-5C describe a method (500) for transmitting data to the user based on the user requirement according to an embodiment of the present invention.
  • the method (500) includes a step of receiving (510) a user request over a network.
  • the method further comprises the step of processing (520) the user request by a cross-modal voice interface engine (106).
  • the method (500) further includes the step of planning and transmitting (530) of the data to the user based on the user request parameters calculated by the cross-modal voice interface engine.
  • the step of receiving (510) the user request comprises receiving the user request over a Smartphone, an ordinary phone, or an internet of things (IoT) device.
  • the method (500) further includes storing the user request data in a user-specific knowledge database, and the user-specific data is obtained using transcribed contents of speech-based interactions in a machine-readable format, analysis of the auditory properties of the speech, meta-data of the interaction, and personal data of the user.
  • processing (520) the user request by a cross-modal voice interface engine includes the steps of receiving (521) the user request over an IVR system, translating (522) the request into the machine-readable transcript by a natural language understanding component, and generating (523) a data transmitting plan by a dialogue and media planner.
  • the step of generating (523) the data transmission plan includes the step of obtaining (524) inputs from user-specific knowledge database, global knowledge database, service repository, third-party services, sensors, generating (525) an audio output using natural language generation component and generating (526) a visual output based on the available visuals to be displayed over the available display devices.
  • the step of transmitting (530) the data to the user comprises delivering the relevant data in audio or visual form to the user and displaying the visual information includes using a registered display device for visualizing the data.
  • Figure 6 describes the user (101) using a feature phone (102) to engage in a call over the telecommunication network (105) to interact with the system (106).
  • the feature phone (102) hosts standard software for common tasks, including a web browser (103) to render webpages.
  • the cross-modal voice interface engine (106) dynamically generates a URL that points to additional media, e.g. a picture, which is published on the World Wide Web on the fly by the system.
  • this URL is sent via the SMS protocol, such that the user accesses it through the SMS application on the feature phone (102) while on the call or after the call.
  • a click on the URL automatically opens the web browser (103) and directs it to the aforementioned dynamically generated URL and shows the image, which the Cross-modal Voice Interface Engine (106) intended to show to the caller.
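The SMS-URL fallback described above can be sketched as follows. This is a hypothetical illustration only: the function names (`publish_media`, `send_sms`), the base URL, the placeholder phone number, and the in-memory media store are assumptions for the sketch, not part of the disclosed system.

```python
# Sketch of the Figure 6 fallback: the engine publishes a media asset at a
# freshly generated URL ("on the fly") and sends that URL to the caller via SMS.
import secrets

# Stands in for media "published on the World Wide Web on the fly".
MEDIA_STORE: dict[str, bytes] = {}

def publish_media(asset: bytes, base_url: str = "https://media.example.org") -> str:
    """Store the asset under a random token and return its public URL."""
    token = secrets.token_urlsafe(8)
    MEDIA_STORE[token] = asset
    return f"{base_url}/{token}"

def send_sms(msisdn: str, text: str) -> str:
    """Placeholder for an operator SMS gateway; here it only formats the message."""
    return f"To {msisdn}: {text}"

url = publish_media(b"<jpeg bytes>")
message = send_sms("+8801XXXXXXXXX", f"Tap to view the picture we discussed: {url}")
```

Clicking the URL on the feature phone would then open the browser on exactly the asset the engine intended to show, as the bullet above describes.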

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A system and method for delivering content services to the user based on the user requirement is provided. The method comprises receiving a user request over a network. The method further comprises processing the user request by a cross-modal voice interface engine and planning and delivering of the content services to the user based on the user request parameters calculated by the cross-modal voice interface engine. Hence, by using the above system and method of data transmission, the system is made independent of the type of the personal technology devices using a telecommunication network and the user-specific data is displayed over a display device as per the requirements of the user.

Description

A SYSTEM AND METHOD FOR TRANSMITTING USER SPECIFIC DATA TO
DEVICES
FIELD OF THE INVENTION
The present invention generally relates to a system and a method for user-specific data transmission and more particularly the present invention relates to a method of connecting / interfacing communication devices to display devices and transmitting the user-specific data over a display.
BACKGROUND OF THE INVENTION
The field of data transmission has undergone phenomenal changes over the last few years. Marketing teams across the globe face a constant challenge of reaching the target customer and successfully selling their product or services (hereinafter “Product”). Data transmission serves an important purpose since the content, instance, and duration of data transmission help in decision making at various stages of data processing. At the same time, different users may be in need of different types of products, and the users very often face the challenge of finding the data according to their needs. Various modes of data transmission constantly undergo innovative changes to enable the reach of the data to the target population. These forms of data transmission come with their own limitations in terms of personal data privacy and do not guarantee the availability of output / products related to the corresponding data at all locations.
Personal technology (Personal Tech) includes various electronic devices and the electronic devices include various services that cater to individual needs, likes, and dislikes on kind and quantity of data as per the need of the user. An increase in number and types of personal tech devices has made it possible to individually identify the requirements and deliver output data / products that meet those requirements. This has opened doors to new age requirements where individual requirement patterns are identified and intuitive requirement suggestions are provided based on the user data such as age, sex, location, buying history, time of the month, internet search history, medical history, and the like. Personal tech has made it possible to obtain the search history, understand the user requirements, buying trends, store the data for future reference, and constantly study the user behavior to establish a pattern useful for accurate advertising. Voice user interfaces (VUIs) are the modules that enable users to input vocal commands to input their query and obtain the search results. For example, if the user is asking a product / requirement query over the phone, the VUI displays the product / requirement and explains the features of the product / requirement directly to the user, or if the user is remotely located, the VUI suggests the nearest possible location of the product / requirement. However, it is challenging for the user of the VUI to understand the capabilities of the VUI and maximize the benefits of using it. Moreover, similar benefits are not available if the user has access only to an ordinary landline phone or less sophisticated devices. In present devices, VUIs are integrated with a display and the user needs to focus his attention on such displays. A robust system is still required where irrespective of the input means, user data is collected, stored for future use, and reproduced when requested by the user.
Available systems and methods have their limitations and an integrated approach is required that will intuitively obtain and store the user-specific data for future use and provide flexible display options thereby enabling the user to select/decide the display option.
BRIEF DESCRIPTION OF THE INVENTION
The present invention discloses a system and a method for user-specific data transmission and more particularly the present invention relates to a method of connecting / interfacing communication devices to display devices and transmitting the user-specific data over a display. The system comprises of a service request means for inputting a request by a user and a cross-modal voice interface engine configured to receive the user request over a telecommunication network. The cross-modal voice interface engine comprises of an IVR system configured to receive the user request and a natural language understanding component configured to receive the user request from the IVR system and translate the user request into the machine-readable transcript. The system further comprises of a dialogue and media planner. The dialogue and media planner is connected to the IVR system and is configured to plan and schedule the transmission of data to the user based on the user request parameters calculated by the dialogue and media planner.
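The composition described in the paragraph above — an IVR front end handing the request to a natural language understanding component, whose machine-readable transcript drives a dialogue and media planner — can be sketched structurally as below. All class names, method signatures, and the toy intent rule are illustrative assumptions, not the disclosed implementation.

```python
# Structural sketch of the claimed pipeline: IVR -> NLU -> dialogue/media planner.
from dataclasses import dataclass, field

@dataclass
class Transcript:
    """Machine-readable transcript: text, intent, extracted entities."""
    text: str
    intent: str
    entities: dict = field(default_factory=dict)

class NLUComponent:
    def understand(self, utterance: str) -> Transcript:
        # A real system would run speech recognition and intent classification;
        # this keyword rule is a stand-in for illustration.
        intent = "book_holiday_home" if "holiday" in utterance else "unknown"
        return Transcript(text=utterance, intent=intent)

class DialogueMediaPlanner:
    def plan(self, t: Transcript, displays: list[str]) -> dict:
        # Always schedule an audio step; add a visual step if a display is registered.
        steps = [("audio", f"answering intent '{t.intent}'")]
        if displays:
            steps.append(("visual", displays[0]))
        return {"intent": t.intent, "steps": steps}

class IVRSystem:
    def __init__(self, nlu: NLUComponent, planner: DialogueMediaPlanner):
        self.nlu, self.planner = nlu, planner

    def handle_call(self, utterance: str, displays: list[str]) -> dict:
        return self.planner.plan(self.nlu.understand(utterance), displays)

plan = IVRSystem(NLUComponent(), DialogueMediaPlanner()).handle_call(
    "I want to book a holiday home", ["smart-tv"])
```

The point of the sketch is the data flow, not the logic: the planner never sees raw audio, only the transcript the NLU component produces, which matches the claimed separation of components.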
In another embodiment of the present invention, the present invention provides a method of connecting / interfacing communication devices to display devices and transmitting the user-specific data over a display device. The method comprises the steps of receiving a user request over a network. The method further comprises processing the user request by a cross-modal voice interface engine and planning and transmitting data to the user based on the user request parameters calculated by the cross-modal voice interface engine.
Hence, by using the above system and method of data transmission, the system is made independent of the type of the personal technology devices using a telecommunication network and the user-specific data is displayed over a display device as per the requirements of the user.
BRIEF DESCRIPTION OF THE ACCOMPANIED DRAWINGS
The invention will now be described in detail with reference to drawings in which:
Figure 1 illustrates the overall system setup according to one embodiment of the disclosure.
Figure 2 illustrates the cross-modal voice interface engine overview according to an embodiment of the disclosure.
Figure 3 illustrates the conventional phone plus smart device hardware setup according to an embodiment of the disclosure.
Figure 4 illustrates the smartphone hardware setup according to an embodiment of the disclosure.
Figures 5A-5C illustrate the method of delivering services to the user according to an embodiment of the disclosure.
Figure 6 illustrates a setup for data transmission using URL sent via SMS using a feature phone.
DETAILED DESCRIPTION OF THE ACCOMPANIED DRAWINGS
Described herein are systems and methods for delivering user-specific data over the telecommunication network and for providing efficient means to the user for displaying such data. The systems and methods are described with respect to figures, and such figures are intended to be illustrative rather than limiting, to facilitate explanation of the exemplary systems and methods according to embodiments of the invention.
The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the appended claims.
As used herein, the term “network” refers to any form of a communication network that carries data and is used to connect communication devices (e.g. phones, smartphones, computers, servers) with each other. According to an embodiment of the present invention, the data includes at least one of processed and unprocessed data. Such data includes data obtained through automated data processing, data obtained through manual data processing, or unprocessed data. The data according to the present invention is in the form of text, an image, an audio file, or a video file.
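One way to model the data forms just enumerated — processed or unprocessed content carried as text, an image, an audio file, or a video file — is a tagged payload type. This is a minimal sketch under assumed field names; the specification does not define such a structure.

```python
# Hypothetical tagged payload for the four data forms named in the definition.
from dataclasses import dataclass

ALLOWED_KINDS = {"text", "image", "audio", "video"}

@dataclass(frozen=True)
class Payload:
    kind: str        # one of ALLOWED_KINDS
    processed: bool  # True for automated/manual processing, False for raw data
    content: bytes

    def __post_init__(self):
        # Reject anything outside the forms the definition allows.
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported kind: {self.kind}")

p = Payload(kind="image", processed=True, content=b"...")
```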
As used herein, the term “artificial intelligence” refers to a set of executable instructions stored on a server and generated using machine learning techniques.
As used herein, the term “hub device” refers to an electronic device comprising a processor and capable of being connected to other electronic / computing devices, displays, and sensors over a network.
As used herein, the term “VUI” refers to voice user interface modules that enable users to input vocal commands to input their request / query and obtain a set of result(s) corresponding to their request / query.
As used herein, the term “IVR” refers to an interactive voice response system that provides a response to incoming call queries of the users.
Figure 1 shows a system (100) according to one embodiment of the disclosure. A user or set of users (101) may dial a call using a device (102) over a network (105) of a telecom operator. The device (102) may be a telecommunication device, a smartphone, or an ordinary landline phone, and the user may input his service request through this device (102). The device (102) acts as a service request means for inputting the user request. The input voice of the user is processed using a system and method for speech to text translation of any incoming voice of any user. Artificial intelligence techniques are employed to collect data from several users or an individual user to extract key terms indicative of user preferences. The call is further connected to a cross-modal voice interface engine (106). The cross-modal voice interface engine (106) is made of different components, is configured to process the call, and is explained in detail in figure 2. A hub device (103) is communicably connected to the cross-modal voice interface engine (106). According to an embodiment of the present invention, the hub device (103) may be a computing device or a microcontroller capable of being connected over a network or a plurality of networks and to a plurality of devices (104) or sensors (107); the hub device (103) displays a list of the devices (104) or sensors (107) to which it is connected. In one embodiment of the present invention, the hub device (103) may be connected to the devices (104) and the cross-modal voice interface engine (106) using the internet, Bluetooth, or any other connecting/interfacing medium for transmission of data from the hub device (103) to at least one of the devices (104) and sensors (107). The data transmitting system for transmitting user (101) specific data comprises of a smart device with a sensor (107) connected to the hub device (103), which is capable of providing the surrounding information to the cross-modal voice interface engine (106). The cross-modal voice interface engine (106) is configured to detect the presence of at least one of the devices (104) and sensors (107) nearby the user (101) using the hub device (103) and send the user a signal indicative of the presence of the device (104). By nearby devices, it is understood that the devices are communicating over the same network or over different networks. It shall be further noted that the transmission of the data to the user based on the user request is the same as the communication of data to the user; the data is ultimately transmitted to an electronic device, and the user receives such data by accessing that electronic device.
As the call progresses, the engine (106) provides an option to the user to select one of the devices (104) for displaying the output or for communicating the data to the device (104). The hub device (103) receives from the engine (106) the command and the data to be displayed over one or more available devices (104, 107) as selected by the user.
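By way of illustration only, the device registration and routing performed by the hub device (103) may be sketched as follows. This is a minimal, hypothetical model (all class and device names are assumptions of this sketch, not part of the claimed system): the hub tracks connected devices (104), reports them to the engine (106) as display options, and forwards the engine's payload to the device the user selects.

```python
# Illustrative sketch of a hub device (103) that tracks connected
# devices (104) and routes display data to the user-selected device.
class HubDevice:
    def __init__(self):
        self.devices = {}  # device_id -> descriptive label

    def register(self, device_id, label):
        """Record a device (104) or sensor (107) reachable from the hub."""
        self.devices[device_id] = label

    def list_devices(self):
        """Names the hub reports to the engine (106) as nearby options."""
        return sorted(self.devices)

    def route(self, device_id, payload):
        """Forward the engine's display payload to the selected device."""
        if device_id not in self.devices:
            raise KeyError(f"unknown device: {device_id}")
        # A real hub would transmit over Bluetooth or IP here.
        return {"target": self.devices[device_id], "data": payload}

hub = HubDevice()
hub.register("tv-1", "Living-room smart TV")
hub.register("phone-1", "Smartphone display")
result = hub.route("tv-1", {"type": "image", "url": "holiday_home_2.jpg"})
```

In this sketch the user's selection simply picks the `device_id` passed to `route`; the actual transport (Bluetooth, internet or another interfacing medium) is abstracted away.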
Figure 2 describes the components of the cross-modal voice interface engine (106) in detail. An interactive voice response (IVR) system (201) receives a call from the user (101). According to the present invention, the IVR (201) is configured to handle multiple calls from different users simultaneously. Further, the IVR (201) is capable of connecting to different subsystems for processing the incoming calls. A natural language understanding component (202) is connected to the IVR (201); it receives the user input and translates the natural language input into machine-readable information. The natural language understanding component (202) generates the transcribed content, intent, entities, and metadata of the caller.
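The output of the natural language understanding component (202) — transcript, intent, entities and metadata — can be illustrated with a deliberately naive sketch. The keyword-to-intent table and the entity heuristic below are assumptions of this example only; a real component (202) would use trained models rather than string matching.

```python
# Illustrative sketch of an NLU step (202): turn a transcribed
# utterance into intent, entities and metadata.
import re

def understand(transcript, caller_id):
    # Hypothetical keyword-to-intent mapping (assumption of this sketch).
    intents = {
        "book": "book_accommodation",
        "find": "find_location",
        "travel": "plan_travel",
    }
    intent = next((v for k, v in intents.items() if k in transcript.lower()),
                  "unknown")
    # Very naive entity spotting: capitalised words as candidate places.
    entities = re.findall(r"\b[A-Z][a-z]+\b", transcript)
    return {
        "transcript": transcript,
        "intent": intent,
        "entities": entities,
        "metadata": {"caller": caller_id},
    }

result = understand("I want to book a holiday home in Bangladesh", "user-101")
```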
The machine-readable information generated by the natural language understanding component (202) defines the user requirement, and a system logic generates a plan to answer the user queries and provide possible display options. A dialogue and media planner (203) is provided that acts as a central logic to decide further steps based on the available input and output devices. A user-specific knowledge database (208) is connected to the media planner (203) to store the data specific to a user for future reference. The data of the database (208) include current and past interactions of the user with the system (100). A global knowledge database (207) is connected to the user-specific knowledge database (208) to provide additional relevant information that might be useful for the user (101). A service repository (210) is connected to the planner (203) to retrieve and provide all the possible services to the user (101) based on the user (101) intent. In yet another embodiment of the present invention, the data transmitting system for transmitting user (101) specific data comprises a third-party service repository connected to the service repository (210) to provide relevant third-party services to the user. According to an embodiment of the present invention, a third-party service module (212) is integrated with the service repository (210) to search for and provide additional relevant services based on the intent of the user (101). Relevant data indicative of the available service options is also obtained from the sensors (107) of the registered smart devices and provided to the planner (203). The planner (203) is capable of communicating with the user-specific knowledge database (208), the global knowledge database (207), the service repository (210), the third-party service module (212) and the sensors (107) of the registered smart devices to retrieve the various available parameters, such as the intent of the current user, and to prepare a presentation plan for the user (101).
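The planner's aggregation of knowledge sources can be sketched as a simple merge: each responding source contributes candidate items, which the planner (203) orders into a plan. The source names and the fixed priority order below are assumptions of this sketch, not the patented logic.

```python
# Illustrative sketch of the dialogue and media planner (203) merging
# inputs from the user-specific database (208), global database (207),
# service repository (210), third-party module (212) and sensors (107).
def plan(intent, sources):
    """sources: mapping of source name -> list of candidate items."""
    items = []
    # Hypothetical priority order (assumption of this sketch).
    for name in ("user_specific", "global", "services", "third_party", "sensors"):
        for item in sources.get(name, []):
            items.append({"source": name, "item": item})
    return {"intent": intent, "steps": items}

presentation = plan(
    "book_accommodation",
    {
        "user_specific": ["previously viewed: Cox's Bazar cottage"],
        "services": ["holiday-home booking"],
        "sensors": ["living-room TV available"],
    },
)
```

Sources that do not respond (here `global` and `third_party`) simply contribute nothing, mirroring the description's "at least one of" phrasing.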
According to one embodiment of the present invention, the presentation plan includes an audio and/or visual output and the sequence in which the plan is presented to the user. A natural language generation component (211) creates speech output for presentation to the user (101). The visual output is sent to available display devices using the hub device (103). The system not only enables the user (101) to see and listen to all the available options specific to the requirement but also enables her to see other relevant information that enriches the quality of the answers sought by the user (101) and supports more comprehensive decisions.
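Splitting the plan into a spoken narration (the role of the NLG component (211)) and a visual payload for the hub (103) can be sketched as follows. The sentence template and field names are illustrative assumptions, not the component's actual output format.

```python
# Illustrative sketch: one presentation plan yields both an audio
# narration (211) and a visual payload routed via the hub device (103).
def render(options):
    audio = "I found {} options: {}.".format(
        len(options), "; ".join(o["name"] for o in options))
    visual = [{"title": o["name"], "image": o.get("image")} for o in options]
    return audio, visual

audio, visual = render([
    {"name": "Beach cottage", "image": "cottage.jpg"},
    {"name": "Hillside villa", "image": "villa.jpg"},
])
```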
Figure 3 describes the integration of a conventional phone and a smart device hardware setup according to an embodiment of the disclosure. In a situation where the user (101) connects to the IVR system (201) over an ordinary landline phone (102) that has no networking capability, both audio and visual output may still be provided to the user according to the current embodiment. An internet of things (IoT) device (103) running hub software is connected to the cross-modal voice interface engine (106) via an internet connection, and the device (103) is connected to a display such as a local smart television (104) using any known connecting means such as Bluetooth. Once the display device (104) is registered with the dialogue and media planner (203), the dialogue and media planner generates visual display commands for showing the results. Accordingly, the embodiment of figure 3 generates audio and visual results in a manner similar to the embodiment of figure 2 even when the input device is an ordinary landline phone with no networking capability.
Example 1: The user (101) calls up the cross-modal voice interface engine (106) because she wants to travel to the seaside in the south of Bangladesh and book a holiday home. The system (100) would then explain the options and at the same time display several options on the nearby screen (104). The user (101) refers to one of them by saying "Please tell me more about the second one" or by touching the second house on the screen (104), which the system (100) can understand and respond to by showing the details about the user-selected option.
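The cross-modal selection in Example 1 — a spoken ordinal and a touch resolving to the same displayed option — can be sketched as below. The ordinal word list is an assumption of this sketch; the point is only that voice and touch converge on one option.

```python
# Illustrative sketch: a spoken ordinal ("the second one") and a touch
# on the screen (104) resolve to the same displayed option.
ORDINALS = {"first": 0, "second": 1, "third": 2}

def resolve(options, utterance=None, touch_index=None):
    if touch_index is not None:
        return options[touch_index]
    if utterance:
        for word, idx in ORDINALS.items():
            if word in utterance.lower():
                return options[idx]
    return None

homes = ["Beach cottage", "Hillside villa", "River lodge"]
by_voice = resolve(homes, utterance="Please tell me more about the second one")
by_touch = resolve(homes, touch_index=1)
```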
Figure 4 describes a smartphone hardware setup according to an embodiment of the disclosure, where the smartphone (102) hosts the hub application (103), contains a plurality of sensors (107) and has a small display screen. An external display (104) may be connected to the smartphone (102) to provide an additional display option. The smartphone (102) is registered with the planner (203); when the user (101) calls up the cross-modal voice interface engine (106), the IVR (201) processes the incoming call in the manner explained with respect to figure 2 and provides a dynamic output for further user selection. The user (101) may select any number of options from the results provided by the cross-modal voice interface engine (106) and request the cross-modal voice interface engine (106) to serve the selected options.
Example 2: The user (101) calls up the cross-modal voice interface engine (106) and wants to find out about the new shopping mall. The system (100) responds by explaining what shops she will find there and the currently available special offers. At the same time, the system (100) displays images and videos of the location and its stores on a large display (104). When the user (101) announces that she wants to go there, the system (100) responds by explaining how to get there and showing an overview map on the large display (104), offering to guide her to the location via her smartphone (102). When the user agrees verbally, the display (104) is switched off and the system (100) displays navigation instructions on the smartphone (102) while retrieving GPS data from it to dynamically guide her until she reaches the location.
This system (100) of the above embodiments may be used to promote the transmission of data to the user (101). When a user calls the cross-modal voice interface engine (106), the cross-modal voice interface engine (106) may decide whether appropriate data relevant to the user requirement is available for that user. If appropriate data relevant to the user (101) is already available with the system (100), it may initiate the transmission of the data during the call and interact with the user to explain any additional requirements for the data. If the relevant data is not available with the system (100), the system may find appropriate data from the network and initiate a fresh call with the user (101) to transmit the data. The user (101) then has the option of continuing with the provided data or disconnecting. The selection of the user (101) is analyzed to identify the relevance of the data for the user (101), and it is stored for future reference.
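The decision described in this passage — serve cached data during the call, otherwise fetch it and schedule a fresh call — can be sketched as a small dispatch function. The cache keying and callback mechanism are assumptions of this sketch only.

```python
# Illustrative sketch: transmit cached data in-call, otherwise fetch
# the data and initiate a fresh call back to the user (101).
def handle_request(user_id, topic, cache, fetch, call_back):
    if (user_id, topic) in cache:
        return {"action": "transmit_in_call", "data": cache[(user_id, topic)]}
    data = fetch(topic)       # locate appropriate data from the network
    call_back(user_id)        # schedule a fresh call to deliver it
    return {"action": "fresh_call", "data": data}

cache = {("user-101", "mall"): "mall offers"}
calls = []
first = handle_request("user-101", "mall", cache,
                       fetch=lambda t: f"fetched:{t}",
                       call_back=calls.append)
second = handle_request("user-101", "weather", cache,
                        fetch=lambda t: f"fetched:{t}",
                        call_back=calls.append)
```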
Figures 5A-5C describe a method (500) for transmitting data to the user based on the user requirement according to an embodiment of the present invention. As shown in figure 5A, the method (500) includes a step of receiving (510) a user request over a network. The method further comprises the step of processing (520) the user request by a cross-modal voice interface engine (106). The method (500) further includes the step of planning and transmitting (530) the data to the user based on the user request parameters calculated by the cross-modal voice interface engine.
The step of receiving (510) the user request comprises receiving the user request over a smartphone, an ordinary phone, or an internet of things (IoT) device. The method (500) further includes storing the user request data in a user-specific knowledge database, the user-specific data being obtained using transcribed contents of speech-based interactions in a machine-readable format, analysis of the auditory properties of the speech, meta-data of the interaction and personal data of the user.
As shown in figure 5B, processing (520) the user request by the cross-modal voice interface engine includes the steps of receiving (521) the user request over an IVR system, translating (522) the request into a machine-readable transcript by a natural language understanding component, and generating (523) a data transmission plan by a dialogue and media planner. As shown in figure 5C, the step of generating (523) the data transmission plan includes the steps of obtaining (524) inputs from the user-specific knowledge database, global knowledge database, service repository, third-party services and sensors, generating (525) an audio output using the natural language generation component, and generating (526) a visual output based on the available visuals to be displayed over the available display devices.
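The three method steps (510, 520, 530) can be sketched as a small pipeline. The component functions below are stand-ins for the subsystems of figure 2, not the patented implementations, and the intent rule is a placeholder assumption.

```python
# Illustrative pipeline for method (500): receive (510), process (520),
# plan and transmit (530).
def receive(raw_call):                      # step 510
    return {"transcript": raw_call.strip()}

def process(request):                       # step 520 (NLU stand-in)
    intent = "find_location" if "mall" in request["transcript"] else "unknown"
    return {**request, "intent": intent}

def plan_and_transmit(processed):           # step 530 (planner stand-in)
    return {"audio": f"Handling intent {processed['intent']}",
            "visual": ["map"] if processed["intent"] == "find_location" else []}

output = plan_and_transmit(process(receive("  where is the new mall ")))
```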
The step of transmitting (530) the data to the user comprises delivering the relevant data in audio or visual form to the user and displaying the visual information includes using a registered display device for visualizing the data.
According to yet another embodiment of the present invention, figure 6 describes the user (101) using a feature phone (102) to engage in a call over the telecommunication network (105) and interact with the system (106). The feature phone (102) hosts standardized software for common tasks, including a web browser (103) to render webpages. When the user interacts with one or more services over the phone, the cross-modal voice interface engine (106) dynamically generates a URL that points to additional media, e.g. a picture, which the system publishes on the World Wide Web on the fly. According to an embodiment of the present invention, this URL is sent via the SMS protocol, such that the user accesses the URL through the SMS application on the feature phone (102) while on the call or after the call. A click on the URL automatically opens the web browser (103) and directs it to the aforementioned dynamically generated URL, showing the image that the cross-modal voice interface engine (106) intended to show to the caller.
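The feature-phone path of figure 6 — publish a media item at a dynamically generated URL, then send that URL by SMS — can be sketched as follows. The domain, token scheme and message wording are illustrative assumptions; the engine (106) could generate the link in any manner.

```python
# Illustrative sketch of figure 6: generate a short media URL on the
# fly and compose the SMS text carrying it to the feature phone (102).
import hashlib

def publish_media(media_id, base="https://example.invalid/m"):
    # Derive a short, stable token from the media id (scheme is an
    # assumption of this sketch).
    token = hashlib.sha256(media_id.encode()).hexdigest()[:8]
    return f"{base}/{token}"

def sms_text(url):
    return f"Tap to view the image we discussed: {url}"

url = publish_media("holiday-home-2.jpg")
message = sms_text(url)
```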
The embodiments and examples discussed herein are for the purpose of illustration of the invention only. It is apparent that numerous other forms of the invention may be envisaged by those skilled in the art without departing from the scope of the invention. The claims are meant to cover all such forms and modifications that are within their scope, and the description of the embodiments should not be construed as limiting the scope of the claims.

Claims

WE CLAIM:
1. A data transmission system for transmitting user (101) specific data, the data transmission system comprises of: a service request means (102) for inputting a request by a user through a telecommunication network (105); a cross-modal voice interface engine (106) configured to receive the user request over the telecommunication network (105), wherein the cross-modal voice interface engine (106) comprises: an IVR system (201) configured to receive the user request; a natural language understanding component (202) configured to receive the user request from the IVR system (201) and translate the user request into a machine-readable transcript; and a dialogue and media planner (203) connected to the IVR system (201) and configured to plan and schedule the transmission of the data to the user based on the user request parameters calculated by the dialogue and media planner.
2. The data transmission system for transmitting user (101) specific data according to claim 1 wherein the data transmission system for transmitting user (101) specific data comprises of a user-specific knowledge database (208) connected to the IVR system (201) and configured to store and update the user request history.
3. The data transmission system for transmitting user (101) specific data according to claim 2 wherein the data transmission system for transmitting user (101) specific data comprises of a global knowledge database (207) connected to the user-specific knowledge database (208) for providing information based on relevant global trend analysis of similar requests.
4. The data transmission system for transmitting user (101) specific data according to claim 1 wherein the data transmission system for transmitting user (101) specific data comprises of a service repository (210) connected to the dialogue and media planner (203) and configured to provide services relevant to user intent.
5. The data transmission system for transmitting user (101) specific data according to claim 4 wherein the data transmitting system for transmitting user (101) specific data comprises of a third-party service repository connected to the service repository (210) to provide relevant third-party services to the user.
6. The data transmission system for transmitting user (101) specific data according to claim 1 wherein the data transmitting system for transmitting user (101) specific data comprises of a hub device (103) capable of being connected to the cross-modal voice interface engine (106) and of transmitting the data over a device (104).
7. The data transmission system for transmitting user (101) specific data according to claim 6 wherein the hub device (103) is an internet of things (IoT) device comprising a hub software.
8. The data transmission system for transmitting user (101) specific data according to claim 1 wherein the data transmitting system for transmitting user (101) specific data comprises of a smart device with a sensor (107) connected to the hub device (103) and is capable of providing the surrounding information to the cross-modal voice interface engine (106).
9. The data transmission system for transmitting user (101) specific data according to claim 1 or claim 3 or claim 5 wherein the user request parameters calculated by the dialogue and media planner (203) are based on the inputs of at least one of the user intent, user-specific knowledge database (208), global knowledge database (207), service repository (210), third party service module, and sensors (107).
10. The data transmission system for transmitting user (101) specific data according to claim 1 wherein the dialogue and media planner (203) is configured to prepare a presentation plan of the services to be rendered to the user.
11. The data transmission system for transmitting user (101) specific data according to claim 1 wherein the data transmission system comprises of a natural language generation module (211) connected to the dialogue and media planner (203) and capable of generating and transmitting data to the user.
12. The data transmission system for transmitting user (101) specific data according to claim 1 wherein the service request means comprises of a smartphone, ordinary phone, or any other input device that may be connected over the telecommunication network (105).
13. The data transmission system for transmitting user (101) specific data according to claim 11 wherein the data transmitted to the user comprises of an audio format data.
14. The data transmission system for transmitting user (101) specific data according to claim 11 wherein the data transmitted to the user comprises of data corresponding to at least one of the user request and system generated data generated through the natural language generator (211).
15. The data transmission system for transmitting user (101) specific data according to claim 11 wherein the data transmitted to the user is further transmitted to the device (104) connected to the IVR system (201) through the hub device (103).
16. A method for transmitting user (101) specific data, wherein the method of transmitting user (101) specific data comprises the steps of: receiving a user (101) request over a telecommunication network (105); processing the user request by a cross-modal voice interface engine (106); and planning and transmitting data to the user based on the user request parameters calculated by the cross-modal voice interface engine.
17. The method for transmitting user (101) specific data according to claim 16 wherein processing the user request by the cross-modal voice interface engine (106) comprises the steps of: receiving the user request over an IVR system (201); translating the request into a machine-readable transcript by a natural language understanding component (202); and generating a data transmission plan by a dialogue and media planner (203).
18. The method for transmitting user (101) specific data of claim 17 wherein generating a data transmission plan comprises the steps of: obtaining inputs from a user-specific knowledge database, global knowledge database, service repository, third-party services, and sensors; generating an audio output using natural language generation component; and generating a visual output based on the available visuals to be displayed over the available display devices.
19. The method for transmitting user (101) specific data according to claim 16 wherein receiving the user request comprises receiving the user request over a smartphone, an ordinary phone, or an internet of things (IoT) device.
20. The method for transmitting user (101) specific data according to claim 16 comprises storing the user request data in a user-specific knowledge database.
21. The method for transmitting user (101) specific data according to claim 20 wherein the user- specific data is obtained using transcribed contents of speech-based interactions in a machine- readable format, analysis of the auditory properties of the speech, meta-data of the interaction, and personal data of the user.
22. The method for transmitting user (101) specific data according to claim 16 wherein transmitting the user specific data to the user comprises transmitting the data in an audio or visual format to the user.
23. The method for transmitting user (101) specific data according to claim 16 comprises the step of displaying the visual information over a registered display device (104, 107) using a hub device (103).
PCT/IN2021/050155 2020-02-19 2021-02-18 A system and method for transmitting user specific data to devices WO2021165990A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022535133A JP7719508B2 (en) 2020-02-19 2021-02-18 System and method for transmitting user-specific data to a device
PH1/2022/552112A PH12022552112A1 (en) 2020-02-19 2021-02-18 A system and method for transmitting user specific data to devices

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
BD2020000060 2020-02-19
BDP/BD/2020/000060 2020-02-19

Publications (1)

Publication Number Publication Date
WO2021165990A1 true WO2021165990A1 (en) 2021-08-26

Family

ID=77390215

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2021/050155 WO2021165990A1 (en) 2020-02-19 2021-02-18 A system and method for transmitting user specific data to devices

Country Status (3)

Country Link
JP (1) JP7719508B2 (en)
PH (1) PH12022552112A1 (en)
WO (1) WO2021165990A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8924880B2 (en) * 2007-04-20 2014-12-30 Yp Interactive Llc Methods and systems to facilitate real time communications in virtual reality
US10530598B2 (en) * 2006-12-29 2020-01-07 Kip Prod P1 Lp Voice control of endpoint devices through a multi-services gateway device at the user premises

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH089053A (en) * 1994-06-15 1996-01-12 Casio Comput Co Ltd INFORMATION TRANSFER METHOD, INFORMATION TRANSFER SYSTEM, INFORMATION TRANSFER DEVICE AND INFORMATION RECEIVING TERMINAL USED FOR THE SAME
US8023624B2 (en) * 2005-11-07 2011-09-20 Ack Ventures Holdings, Llc Service interfacing for telephony
JP6926085B2 (en) * 2015-12-14 2021-08-25 アフェロ インコーポレイテッドAfero, Inc. Secure Things Internet of Things (IoT) Device Provisioning Systems and Methods
JP6509305B1 (en) * 2017-11-07 2019-05-08 ソフトバンク株式会社 Output control device and program


Also Published As

Publication number Publication date
JP2023518631A (en) 2023-05-08
PH12022552112A1 (en) 2024-01-29
JP7719508B2 (en) 2025-08-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21757616

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022535133

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21757616

Country of ref document: EP

Kind code of ref document: A1