GB2585184A

GB2585184A - Real-time voice communications system with user/counterparty verification

Info

Publication number: GB2585184A
Application number: GB1908952.3A
Authority: GB
Inventors: Douglas Blair Christopher; Laurence Heap Richard
Original assignee: Software Hothouse Ltd
Current assignee: Software Hothouse Ltd
Priority date: 2019-06-23
Filing date: 2019-06-23
Publication date: 2021-01-06
Also published as: GB201908952D0

Abstract

Several communications devices (1) are configured with the addresses of one or more mobile access points (2), via which communication sessions that contain an audio stream are configured with one or more counterparty devices (3, 4). Speaker verification is performed by taking a time-bounded sample of the audio stream and comparing an extracted set of characteristics with a previously measured reference set. In embodiments the communications device (1) is a thin client, with routing, call control, speaker verification and storage of personal information (to avoid data leakage) all managed by the mobile access point. This allows several identities (such as business and personal) to be represented by a single device. Speaker verification may be done repeatedly during the communication session, at random intervals, or after an energy envelope of the audio stream exceeds level and duration thresholds. Anti-spoofing may be provided using a predetermined library of energy envelopes.

Description

Real-time Voice Communications System with User/Counterparty Verification.

This invention provides means of verifying the current speaker on telephone calls by combining voiceprint verification with predetermined authorisation rights and real-time knowledge of the state of the communications equipment purported to be in use.

Background

When smartphones first appeared (circa 2007) they had what we would now consider extremely limited capabilities. They were a brand new genre of device with a new "touchscreen" paradigm, and their users were far less sophisticated than today. Speech recognition was not something that could be deployed on them. Their communications capabilities were almost exclusively telephony and short message service (SMS) texting - though, notably, at that time texting was not widely used in the United States.

Consequently, the user interface included separate "Phone" and "Message" applications ("apps" hereafter). Each of these was relatively simple. A decade on, these have barely changed yet the world in which the devices operate and the sophistication of their users has grown immensely.

There are a number of shortcomings with the "standard" telephony and messaging apps.

Recent data privacy scandals and the introduction of tighter regulations such as the European Union's General Data Protection Regulation (GDPR) mean that an increasing number of people are either carrying separate business and personal phones or are, reluctantly, allowing their employer to have significant control over their phone with the installation of "Mobile Device Management" (MDM) apps. To avoid the need for such intrusive control, an app would have to minimize the volume of personal information on the phone and keep business and personal contact details and history separated.

There are now many ways to communicate with others yet each -despite the best efforts of the "Unified Communications" advocates -still operates largely within its own silo for the majority of smartphone users. Seeing the complete history of how you have interacted with an individual or group often requires accessing phone, messaging, email and social media accounts. There is no combined trail that lets you instantly realise that someone is waiting for you to respond on some obscure app that you'd forgotten they were using to communicate with you.

The user interface of the standard smartphone apps for telephony and messaging was designed (over a decade ago) for a naïve user on a tiny screen. The "tabbed" interface with separate pages selectable for "Favourites", "Recents", "Contacts", "Keypad" and "Voicemail" is far from optimal. A tabbed interface brings the app up showing the same page you last used. However, successive uses of the "Phone" app are typically very different. The fact that you last looked up a contact's details is irrelevant if a voicemail has just been received, prompting you to open the app.

Users are increasingly at risk of being deceived by fraudsters -particularly when contacted by phone or text message. Often, the only identification one has to hand is the phone number of the calling party. This may be unknown to you -in which case it is cumbersome to check it (and common scams exist to thwart such attempts). If the number is in your "Contacts" and is shown as a match, this makes it likely you will believe that is who you are conversing with. However, such calling line identification (CLID) is easily "spoofed".

Increasingly, speaker verification systems -in which a speaker's voice is analysed and compared against a known prior sample of speech to confirm or reject the asserted identity -are being deployed. These are now used for even highly sensitive interactions such as personal banking calls.

Multi-factor Authentication (MFA) systems are therefore now widely used by businesses to identify individual customers securely. It is difficult, however, for a customer to obtain the same degree of certainty that they are indeed speaking to an authorized representative of the organization purporting to be calling them. So long as we all still have phone numbers and talk to each other "over the phone", these challenges will only get worse.

This invention builds on previous patent applications by the same inventor(s).

UK Patent Application GB1816697.5, "System and Method for Control of Business Telephone Calls over Cellular Networks" describes a system architecture in which business communications from mobile devices are routed via a "Mobile Access Point" (MAP) which completes and manages the remainder of the interaction with the remote party(ies). This component performs many of the same functions for communications as a Web Proxy Server does for browsers. Significantly, it provides access to the audio streams both to and from the end user's smartphone (or tablet, laptop, browser or computer running the app).

UK Patent Application GB1816863.3, "System and Method for Hands-free Advanced Control of Real-time Data Stream Interactions" describes how the architecture described above allows speech recognition to be used to enhance the productivity and satisfaction of both the calling and called parties, before, during and after an audio interaction.

Statement of Invention

This invention provides a highly accurate means of verifying (or rejecting) the identity of the current speaker on a telephone call. It does so by combining a rigorous and secure enrolment procedure with on-the-fly voiceprint verification during calls and, optionally, detailed real-time knowledge and/or inference of the state of the communications equipment purported to be in use.

Introduction to the Drawings

Figure 1 shows an exemplary network infrastructure allowing the invention to be deployed in a business setting.

Figure 2 shows an exemplary "Home" screen of the app that is in communication with the MAP.

Figure 3 shows an exemplary "Contact History" screen of the app that is in communication with the MAP.

Figure 4 shows the data structures used on the end user device and within the MAP to minimize the risks of personal data leakage via the end user device.

Detail of the Invention

Interactions are classified as "real-time" (telephony, video calling, desktop sharing, conference calls etc.) or "messaging" (SMS, Instant Messaging, Email etc.).

The key differentiator is that a real-time interaction spans a contiguous period of time within which the "called party" is expected to respond and engage in a (normally) two-way interaction with the initiator or "calling party". The actual exchange of information typically does not begin until the called party "answers".

A messaging interaction, by contrast, consists of one or more discrete exchanges of information, the transmission time of each being unaffected by actions on the part of the recipient.

The term "counterparty" is used to identify an entity or group of entities with whom interactions take place. A counterparty may be a person, a business, an application (such as a "bot") or a plurality of any combination of these.

The term "nickname" is used for the (preferably shortened) name by which the app user identifies a counterparty. For those they interact with frequently, these are normally short and of little meaning to someone who does not know the individual. Should someone access the app without their permission, for example, "kids", "direct reports", "team", "office", "Dad", "JB", "Bank", "Credit Card" are far less enlightening than their full names (as stored in an address book or "Contacts" would be).

Encouraging the use of such shortened names not only minimizes the space required on screen, it also reduces data leakage that can occur by someone else looking at, or photographing the phone's display.

The term "decoration" refers to any supplementary shape, text or image associated with a display element. Typical examples are the small red circles containing a number that are added to the corner of an icon to indicate how many messages are waiting.

The term "display attribute" refers to any visual aspect of how text or an image is displayed. This may include but is not limited to colour, size, shape, shading, texture, opacity, font, font-weight, italicization, underlining, strikethrough, motion, vibration, skew, speed of movement, flashing, rotation, reflection, scaling or arbitrary transform, decoration and so forth.

The term "notification attribute" refers to any combination of display attributes, sounds and/or haptic (touch/vibration) signals.

Figure 1 shows an exemplary architecture within which the invention may be deployed on a smartphone, tablet computer, laptop, desktop computer or similar (1) whose user communicates with a plurality of counterparties via their devices (3, 4) -which may or may not be running this same application. Typically there are many different ways in which this communication occurs: via telephone, chat, email, text messaging, instant messaging and so on.

Note that the discussion below primarily addresses a corporate employee using device (1) and for whom the application provides their business phone number, via the corporate's MAP (2). The same architecture may be deployed by a company offering such services to members of the public. In this case, a subscriber or customer of this company can be provided with the same capabilities over their personal phone number.

This example network includes a "Mobile Access Point" (MAP) (2) as described in UK Patent Application GB1816697.5. Whilst many of the features that follow may be deployed with an application entirely running within the end user's device (1), the routing of calls via said MAP (2) rather than direct to counterparties (3, 4) with whom the user is communicating allows a number of additional benefits -namely the ability to analyse and act on the media streams to and from device (1) and to reduce the amount of information held on device (1).

Device (1) communicates with MAP (2) via one or more network paths. For example, via a cellular base station (10) and cellular (voice) network (7); via a 4G cellular data network into the internet (7); via public (12) or corporate (11) Wi-Fi.

Mobile Access Point (2) may be a server within the corporate network (5) or hosted in a public data centre or "in the cloud" accessed via the internet.

Note that the end user's device (1) has no need to be aware of the actual addresses of counterparties. So long as it can ask the MAP (2) to establish a connection to a specific counterparty (3), it merely needs to be able to identify counterparty (3) to the MAP allowing the MAP to handle the onward routing of the connection.
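The division of knowledge described above can be illustrated with a minimal sketch. The class names, the `connect` method, the opaque identifier "cp-17" and the session handle format are all hypothetical; the specification does not define an interface between the app and the MAP. The point shown is only that the device holds a nickname and an opaque identifier while the address stays on the MAP.

```python
from dataclasses import dataclass, field


@dataclass
class MobileAccessPoint:
    """Hypothetical MAP: the only place where real addresses are stored."""
    directory: dict = field(default_factory=dict)  # counterparty id -> address

    def connect(self, counterparty_id: str) -> str:
        if counterparty_id not in self.directory:
            raise KeyError(f"unknown counterparty: {counterparty_id}")
        # A real MAP would route the onward call here. The sketch returns
        # an opaque session handle that does not reveal the address.
        return f"session-for-{counterparty_id}"


@dataclass
class ThinClient:
    """Hypothetical device-side app: nicknames and opaque ids only."""
    map_server: MobileAccessPoint
    nicknames: dict = field(default_factory=dict)  # nickname -> counterparty id

    def call(self, nickname: str) -> str:
        return self.map_server.connect(self.nicknames[nickname])


# Usage: the number below is made up and never reaches the client.
access_point = MobileAccessPoint(directory={"cp-17": "+00 0000 000000"})
client = ThinClient(map_server=access_point, nicknames={"Bank": "cp-17"})
handle = client.call("Bank")
```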

The dominant smartphone platforms offer integral telephone dialler and messaging apps that have changed little in the last decade. These typically keep separate logs of phone calls and messages despite the fact that the same phone number is used for both SMS messaging and actually speaking to people.

However, there is typically a single "Phone" app - which handles calls to and from all mobile phone numbers (possible where the phone has dual- or e-SIM capability). Where multiple contracts (and hence phone numbers) are provided in order to let the user appear as multiple personas (for example, private individual on one number; company employee on the other) a clear separation of calls for each number is required.

It thus makes more sense to have one (messaging capable) "Phone" application instance per phone number present on the device than it does to have a single (voice only but all phone numbers) "Phone" and separate (message only but all phone numbers) "Messages" application.

Most users also have one or more E-mail accounts accessible via their phone (1). Although the phone's integral "Contacts" app will associate an email address and phone number(s) with an individual, the emails exchanged with that individual are usually only visible in the email app.

Unified Communications apps and instant messaging apps abound - providing further channels and associated identifiers for the people and businesses one interacts with. Knowing which route your friends, colleagues, business contacts or groups thereof are using at any one time is increasingly difficult - and changes frequently.

Many "Unified Communications" applications have been developed but most are overkill for the basic telephone calls and text messages that still make up a substantial portion of the interactions with those outside one's own company. This invention therefore remains "phone centric" but allows gradual integration of other communications channels as the user becomes familiar with the application.

Rather than a separate app for each of the plethora of communications services, this invention can provide a single "hub" app that brings together as many as possible of the interaction channels to show a combined interaction history for each counterparty. This typically replaces the "Phone", "Contacts" and "Messages" functions which are normally considered as separate apps.

Where a single smartphone or tablet is used for both business and personal communications, it is increasingly important to keep data from the two domains separate even if (thanks to the use of the MAP) the phone is reachable via more than one public and/or private phone number. This is achieved by having two (or more) instances of the app on the device and, preferably, two separate telephone numbers: one a business number (often a private, internal number rather than a publicly dialable "DDI" or "DID" number), owned by the employer, and the other a personal number, owned by the individual. Each instance interacts with a specific MAP and presents the user as having a particular telephone number, keeping the interactions it enables separate from those on other instances.

The app on user device (1) operates as an extremely "thin" client with as much as possible of the business intelligence, routing, call control and -most importantly -personally identifiable information stored in a secure server -the "Mobile Access Point" or MAP (2) rather than on the mobile device (1) itself. Doing so allows businesses to deploy this app without requiring an MDM platform to be installed.

Preferably, the interaction between the app and the MAP is restricted to a single data stream, using a single User Datagram Protocol (UDP) socket for signalling, administrative and real-time communications. However, separate channels for these may also be used. For example, signalling may be via a Session Initiation Protocol (SIP) channel while audio is carried over Real-time Transport Protocol (RTP).
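One way to carry signalling, administrative and real-time traffic over a single datagram socket is to prefix each datagram with a small header naming the logical channel. The sketch below is illustrative only: the specification does not define a wire format, and the one-byte channel tag, two-byte sequence number and channel names are assumptions. The resulting bytes would be passed to an ordinary UDP socket; no socket is opened here.

```python
import struct
from enum import IntEnum


class Channel(IntEnum):
    """Hypothetical logical channels sharing one UDP socket."""
    SIGNALLING = 1
    ADMIN = 2
    MEDIA = 3


# Assumed header layout: channel tag (1 byte), sequence number (2 bytes),
# network byte order, no padding.
HEADER = struct.Struct("!BH")


def pack_datagram(channel: Channel, seq: int, payload: bytes) -> bytes:
    """Prefix the payload with the channel tag and a wrapped sequence number."""
    return HEADER.pack(channel, seq & 0xFFFF) + payload


def unpack_datagram(datagram: bytes):
    """Split a received datagram into (channel, sequence number, payload)."""
    channel, seq = HEADER.unpack_from(datagram)
    return Channel(channel), seq, datagram[HEADER.size:]
```

The receiver dispatches on the channel tag, so an audio frame and a call-control message can arrive interleaved on the same socket without being confused.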

When used in a personal capacity, some or all of the functionality of the MAP (2) may be running on the mobile device itself or the individual may subscribe to a publicly available MAP service. This affords the individual many of the same features and data security that businesses achieve using the MAP approach: for example, minimal data loss and exposure should the mobile device be lost, stolen or compromised.

Note that although this specification refers to "mobile devices", the app can be run on a wide range of platforms including but not limited to smartphones, tablet computers, laptops, smart TVs, desktop computers, browsers on any device and so on. The app may run "native" or via a browser or cross-platform tool. If the device on which it runs does not have an integral telephone capability or it needs to use a telephone number other than the one (or more) by which the device identifies itself on the public telephone network, this is achieved via the MAP (2).

Figure 2 shows an exemplary "home" screen from the app -such as would be presented immediately upon accessing or opening the app.

This app is likely to be the most used app on the device and hence very likely to be permanently "pinned" to the bar of most commonly used apps at the bottom of the screen.

The most commonly used controls are therefore also placed at the bottom of the app's home page -within easy reach of the (typically) thumb that just opened the app.

The top bar allows access to user preferences (201) and settings (202) when needed. The body of the page is broken into three regions.

At the bottom, always in the same position and easily reached with the same finger that just selected the app (assuming the app is pinned to the bottom bar as is usually the case), are controls that are used very frequently.

If the user knows all or part of the number they wish to dial, touching the Dialpad icon (203) brings up the numeric telephone dial-pad. It also shows the numbers most recently dialled. As the user dials digits, recently dialled numbers that match and entries in the app's Contact list that match are shown -allowing the user to select one of those rather than continue dialling the full number.
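The dial-pad matching described above might be sketched as follows. This is a minimal illustration, assuming that "match" means the dialled digits are a prefix of the stored number once spaces and punctuation are ignored; the function names, the `limit` parameter and the data shapes are hypothetical.

```python
def digits_only(number: str) -> str:
    """Strip spaces, dashes and other formatting from a phone number."""
    return "".join(ch for ch in number if ch.isdigit())


def dial_suggestions(entered, recent_numbers, contacts, limit=5):
    """Return (label, number) pairs whose number starts with the digits entered.

    recent_numbers: list of recently dialled numbers, most recent first.
    contacts: dict mapping a nickname to a phone number.
    """
    typed = digits_only(entered)
    if not typed:
        return []
    results = []
    # Recently dialled numbers are listed ahead of Contact list entries.
    for number in recent_numbers:
        if digits_only(number).startswith(typed):
            results.append(("recent", number))
    for name, number in contacts.items():
        if digits_only(number).startswith(typed):
            results.append((name, number))
    return results[:limit]
```

Calling `dial_suggestions` again after each key press narrows the list as more digits are entered.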

Increasingly, however, users are identifying who they want to contact by name rather than telephone number. Touching the search box (204) brings up an alphanumeric keypad (and the search box (204) automatically moves up above it to avoid being obscured by it) allowing the user to start entering all or part of the desired counterparty's name.

Note the microphone icon (205) present in the search box (204). Touching this enables speech recognition capabilities allowing the user to speak the name of the counterparty they are looking for.

Note that text entry and speech may be combined - for example typing enough characters to identify a subset of matching counterparties against which a spoken utterance is then matched (e.g. typing "ch" brings up several "Chris"s and you then touch the microphone icon (205) and say "Blair" to pick the one with that surname).
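The two-stage narrowing can be sketched as below. It is assumed, for illustration only, that the speech recogniser returns plain text and that matching is done on whole words of the counterparty's name; both function names and the sample names are hypothetical.

```python
def narrow_by_text(counterparties, typed):
    """Keep names in which any word starts with the typed characters."""
    typed = typed.lower()
    return [name for name in counterparties
            if any(word.lower().startswith(typed) for word in name.split())]


def pick_by_utterance(candidates, recognised_text):
    """Keep candidates whose name contains every recognised word."""
    spoken = recognised_text.lower().split()
    return [name for name in candidates
            if all(word in name.lower().split() for word in spoken)]


names = ["Chris Blair", "Chris Jones", "Dan Brown"]
subset = narrow_by_text(names, "chr")          # typed characters
chosen = pick_by_utterance(subset, "Blair")    # recognised speech (assumed text)
```

Matching the utterance against the already narrowed subset rather than the whole Contact list leaves the recogniser with far fewer candidates to distinguish.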

The bulk of the home page is split into two parts -and the division between them may be varied -for example by dragging divider bar (206) up or down. Each of the two regions is, itself, scrollable, holding more information than is visible in Figure 2.

The lower section (207), which is most easily reached with the same finger that (typically) just touched the bottom of the screen to open the app, shows the user's "Favourite" counterparties (hereafter the "favourites pane"). This contains a number of touchable areas (hereafter "chips") labelled with the nickname or name of a counterparty (individual or group). There is also a "voicemail" chip (209) -preferably in a fixed position. The others represent individuals or groups of people, businesses or apps with whom the user most frequently interacts.

The shape, colour, texture, decoration and/or other visual attributes of each chip may be used to indicate specific attributes of that individual. For example, the colour may indicate the state of the most recent interaction with that party (sent a message, received but not read, read, deleted...) while the shape of the button may indicate whether or not this user initiated the contact; the size of the button may indicate its age; the border colour may indicate colleagues/customers/suppliers; a superimposed number may indicate the number of unread messages or missed calls - and so on.

Preferably, users are gradually introduced to additional attribute mappings and are able to select which display attribute(s) are used to convey the values of which attributes of the interactions.
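A user-selectable mapping of this kind could be held as a simple table from interaction attribute to display attribute. The sketch below is an assumption for illustration: the attribute names, the default mapping and the rule that two attributes may not share one display attribute are not taken from the specification.

```python
# Hypothetical default mapping, following the examples given in the text.
DEFAULT_MAPPING = {
    "last_state": "colour",
    "initiated_by_user": "shape",
    "age": "size",
    "relationship": "border_colour",
    "unread_count": "badge",
}


def remap(mapping, attribute, display_attribute):
    """Return a new mapping with one attribute reassigned.

    Refuses to assign a display attribute that is already in use, since one
    visual property cannot convey two different values at once (assumed rule).
    """
    for other, used in mapping.items():
        if other != attribute and used == display_attribute:
            raise ValueError(f"{display_attribute} already shows {other}")
    updated = dict(mapping)
    updated[attribute] = display_attribute
    return updated


def chip_style(interaction, mapping):
    """Translate an interaction's attribute values into display attributes."""
    return {mapping[name]: value
            for name, value in interaction.items() if name in mapping}
```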

With the exception of the voicemail button (209), each button in this region (207) actually has at least four touch functions.

1. Tap in the centre to bring up that counterparty's contact history (as shown in Figure 3).

2. (Optionally) Tap at the left to initiate a real-time connection to that counterparty - and bring up their contact history.

3. (Optionally) Tap at the right to start composing a message to that counterparty -and bring up their contact history.

4. Long press (or heavy pressure where available) to bring up a dialog allowing you to change the definition of the favourites button. This may include:

a. Removing the button from favourites.

b. Modifying the nickname shown on the button.

c. Changing the preferred means of real-time and/or messaging interactions with that counterparty.

d. Adding that individual to or removing them from group(s) of which they are or should become a member.

e. Block contact from this counterparty.

f. Report counterparty for junk/nuisance calls.

Note the vertical bar (210). In this diagram the user is using English language settings so the chips flow left to right and then flow down the pane. So chips to the left of and above the bar (210) are "before" it while those to the right or below the bar (210) are "after" it.

Chips before the bar (210) are proactively placed there by the user and stay in the position assigned. The user can rearrange the order of these chips by dragging and dropping them within the favourites pane (207) - to bring the ones they need most often into easiest reach or to organise them (for example, bringing colleagues, customers and suppliers into contiguous regions).

Chips after the bar (210) are dynamically generated by the app according to the recent communications history. These represent counterparties that the user has interacted with most in the recent past -but who are not yet "pinned" to the left of the bar (210). The duration of "recent past" is preferably a user preference setting.

These chips are ordered by frequency and/or nature of contact (and may be weighted according to how recently each contact was). For example, those nearest the bar (210) may be counterparties whose most recent incoming calls have been missed. The following chips may then be ordered by "density" of contacts. Note that the upper portion (208) of the screen shows interactions in chronological order - hence the most recent interactions are visible there. These dynamic buttons are therefore preferably assigned alternative ranking criteria.
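One possible ranking for the dynamically generated chips is sketched below. The specification leaves the weighting open, so the choices here are assumptions: each interaction contributes a score that halves every `half_life`, the "recent past" is a fixed `window`, and counterparties whose latest incoming call was missed are placed first. The interaction kinds ("incoming_missed", "outgoing" and so on) are hypothetical labels.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def rank_dynamic_chips(interactions, pinned, now,
                       window=timedelta(days=7),
                       half_life=timedelta(days=1)):
    """Order unpinned counterparties for the region after the bar.

    interactions: iterable of (counterparty, when, kind) tuples.
    pinned: set of counterparties already placed before the bar.
    """
    scores = defaultdict(float)
    last_incoming = {}
    for counterparty, when, kind in interactions:
        if counterparty in pinned or now - when > window:
            continue
        # Recency-weighted "density": newer contacts count for more.
        scores[counterparty] += 0.5 ** ((now - when) / half_life)
        if kind.startswith("incoming"):
            previous = last_incoming.get(counterparty)
            if previous is None or when > previous[0]:
                last_incoming[counterparty] = (when, kind)

    def sort_key(counterparty):
        incoming = last_incoming.get(counterparty)
        missed = incoming is not None and incoming[1] == "incoming_missed"
        # Missed callers first, then highest score, then name for stability.
        return (not missed, -scores[counterparty], counterparty)

    return sorted(scores, key=sort_key)
```

Making `window` a parameter reflects the statement that the duration of "recent past" is preferably a user preference.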

By dragging divider bar (206) up or down, the user expands or contracts this "dynamics" region. Any available space is automatically filled with more counterparties. Dragging it up includes another row or two -showing those contacted less frequently or recently than the ones already showing.

The user may drag a chip currently positioned after the bar (210) to a position before the bar (210). Doing so implicitly indicates to the app that this counterparty should now be treated as a "favourite". A dialog is then shown which encourages the user to select a nickname for this counterparty - suggesting common combinations (initials, first name only, first name + initial of surname and so forth). This not only reduces the space needed for the label on the chip, it also reduces the data leakage from any screenshot or over-the-shoulder sight of the app.
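The suggested combinations can be generated mechanically. The sketch below covers only the three combinations named in the text and assumes the counterparty's name is already split into first name and surname; the function name is hypothetical.

```python
def suggest_nicknames(first_name: str, surname: str) -> list:
    """Return candidate nicknames: initials, first name, first name + initial."""
    first, last = first_name.strip(), surname.strip()
    candidates = [
        (first[:1] + last[:1]).upper(),       # initials
        first,                                # first name only
        f"{first} {last[:1].upper()}",        # first name + initial of surname
    ]
    suggestions = []
    for candidate in candidates:
        candidate = candidate.strip()
        # Skip empty results and duplicates while keeping the order.
        if candidate and candidate not in suggestions:
            suggestions.append(candidate)
    return suggestions
```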

Use of shortened names is further encouraged by restricting the maximum width of these chips -truncating the name if the user has not already provided a shortened nickname that will fit within the available space. If the user wants permanent, instant, one-touch access to this counterparty then they must accept that limited information can be permanently visible on screen.

Preferably, the application will not allow the user to label a favourites chip with the counterparty's full name (surname and first name in either order, with or without spacing/separator character(s)).
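A check of this kind might normalise the proposed label and compare it with both orderings of the name. The sketch below is one possible reading: it treats any run of non-alphanumeric characters as a separator and ignores case. The function names are hypothetical.

```python
import re


def _normalise(text: str) -> str:
    """Lower-case the text and drop spaces, punctuation and underscores."""
    return re.sub(r"[\W_]+", "", text.casefold())


def is_full_name(label: str, first_name: str, surname: str) -> bool:
    """True if the label is the full name in either order, however separated."""
    first, last = _normalise(first_name), _normalise(surname)
    if not first or not last:
        return False
    return _normalise(label) in {first + last, last + first}
```

A label for which `is_full_name` returns true would be rejected by the nickname dialog, while shortened forms such as initials pass.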

The user is also prompted to choose or confirm the default real-time and messaging mechanisms by which they will contact the counterparty. These become the tap actions for the left and right-hand ends respectively of the resultant favourites chip.

The "+" button (211) allows the user to add a favourite from the overall Contacts list even if they are not shown in the dynamics region.

The remainder of the home page (the "interactions" pane) (208) shows a primarily chronological record of this user's interactions with others. As with the favourites area (207), counterparties are labelled by nickname where one has been assigned.

The user can drag a row from the upper portion of the screen (208) into the favourites area to add a chip for that counterparty to their favourites pane.

User preferences (typically set on a slide-in drawer accessed from the top left button (201)) determine:

* Whether the most recent interaction is at the top or bottom of this "Interactions" pane (208).

* Whether any "pinned" counterparty entries are at the top or bottom of the pane (208).

* Whether counterparties appear only once in the list (according to the time of their most recent interaction) or in the chronological list every time an interaction with them occurred.

* Whether interactions from each "channel" (phone, SMS, voicemail, Email...) are shown or not.

* Whether all interactions in a channel are shown or just those in specific states -for example: undelivered, missed.
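The preferences listed above amount to a filter and sort applied to the interaction log before display. The following is a minimal sketch under assumed data shapes: each interaction is a dict with "counterparty", "channel", "state" and a sortable "when" value, and all field and class names are hypothetical. Pinned entries are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PanePreferences:
    newest_first: bool = True
    once_per_counterparty: bool = True
    channels: set = field(
        default_factory=lambda: {"phone", "sms", "voicemail", "email"})
    states: Optional[set] = None  # None means show every state


def visible_entries(interactions, prefs):
    """Apply channel/state filters, de-duplication and ordering."""
    rows = [i for i in interactions
            if i["channel"] in prefs.channels
            and (prefs.states is None or i["state"] in prefs.states)]
    rows.sort(key=lambda i: i["when"], reverse=True)  # newest first
    if prefs.once_per_counterparty:
        seen, unique = set(), []
        for row in rows:
            # Rows are newest first, so the first hit is the latest contact.
            if row["counterparty"] not in seen:
                seen.add(row["counterparty"])
                unique.append(row)
        rows = unique
    if not prefs.newest_first:
        rows.reverse()
    return rows
```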

Each entry in this pane (208) represents the interactions between the user and a specific counterparty (individual or group). Note that it does NOT show the counterparty's phone number or contact details as this (frequently used) screen is often visible to onlookers. It does show:

1. The nickname (where available) of the counterparty, otherwise their full name.

2. Time or relative time of last contact. Deliberately only shows time of day (not today's date) for yesterday's calls. Only if you scroll down to much older entries will you find fully described dates (and even those will not include the year until looking at earlier than "Last Year"). This ensures that any screenshot or photograph of the screen has as little value as possible if leaked.

3. The left-most icon (212) shows the preferred real-time communications channel (most commonly the telephone). The display attributes tell the user about the most recent real-time interaction with this counterparty. For example, a red phone icon may indicate that the counterparty called but the user missed their call; a grey translucent icon may indicate that they have not used this channel yet.

4. If the counterparty has left any messages via the real-time channel (such as a voicemail for example) an icon (213) showing that real-time channel's messaging service is shown. The display attributes tell the user about the most recent message. For example red or not according to whether or not they have already read it; size indicating how recent it is; number of unread messages in a small circle at one corner of the icon.

5. Right-most icon (214) shows the preferred messaging communications channel. The display attributes tell the user about the most recent messaging interaction with this counterparty. For example, a red message bubble may indicate that there is an unread message. Note that the relative size of this icon (214) and the two at the left (212, 213) can indicate which was the most recent.

6. Note the orientation of the phone (212) and message (214) icons. Traditionally, the integral telephone app on both major platforms has used a handset icon oriented North-west to South-east. This is therefore used as the "outgoing" or "I used that app" icon while one reflected so as to be oriented North-east to South-west or (as shown here) rotated 180 degrees is used to indicate a call placed by the counterparty TO me. The same applies to the message icon (214) - shown here with the tail bottom-left (message from me) or top-right (message to me).

7. A graphical "thumbnail" (215) of the recent interactions with the counterparty is shown beneath their name (or, where available, nickname). This represents a chronological timeline of interactions using the icons associated with the various channels used to identify each contact. Their display attributes show, for example, who called whom. Spacing symbols show intervals of time (a dot per hour during today, thereafter a thin vertical bar for a day boundary, a thicker vertical line for a week boundary and so on). To make it clear that the leftmost icon is most recent, the size of the icons decreases for older contacts shown to the right of the most recent one.

8. Optionally, a brief (typically one line truncated with "....") summary of the content of the last message exchanged may be shown (not shown in Figure 2).
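The deliberately sparse time labels of item 2 in the list above could be produced as sketched below. This is one reading of the text, not a definitive rule: it assumes that entries from today show only the time of day, that entries from yesterday show the word "Yesterday" plus the time of day, that older entries from this year or last year show day and month without the year, and that anything earlier includes the year. The function name and the numeric date format are assumptions.

```python
from datetime import datetime


def contact_time_label(when: datetime, now: datetime) -> str:
    """Return the least revealing label that still places the contact in time."""
    days_ago = (now.date() - when.date()).days
    if days_ago == 0:
        return when.strftime("%H:%M")                  # time of day only
    if days_ago == 1:
        return "Yesterday " + when.strftime("%H:%M")   # still no date shown
    if when.year >= now.year - 1:
        return when.strftime("%d/%m")                  # date without the year
    return when.strftime("%d/%m/%Y")                   # earlier than last year
```

A screenshot of recent entries therefore carries no calendar date at all, which is the stated aim of limiting the value of any leaked image.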

Actions available by interacting with this pane (touching or mouse-click for example) include:

1. Tap or click the icon (212) on the left to initiate a real-time interaction with this counterparty via the preferred real-time channel (or most recently used if no preferred channel yet established).

2. Tap or click the real-time channel messaging icon (213) to access the most recent message left and also see it in the context of that counterparty's interaction history (Figure 3).

3. Tap or click the icon on the right (214) to start composing a message to this counterparty via the preferred messaging channel (or most recently used if no preferred channel yet established).

4. Tap or click the voicemail icon (213) to view voicemail messages left by this counterparty and (optionally) automatically start playing the most recent.

5. Tap anywhere else on the entry to drill into that counterparty (see Figure 3).

6. Swipe left to remove the entry from the pane. Used on any interaction that does not require further action.

7. Long press (or heavy pressure where available) to bring up a dialog allowing you to:

a. Pin this counterparty's entry to the interactions pane (208).

b. Add counterparty to favourites pane (207).

c. Block contact from this counterparty.

d. Report counterparty for junk/nuisance calls.

e. Block further incoming communications from this counterparty.

f. Report this counterparty for cold-calling; abusive calls etc.

Note that messaging (214) and real-time channel controls (212) are deliberately kept as far apart as possible to avoid accidentally phoning someone when you only meant to message them.

Figure 3 shows the Contact History screen for an exemplary counterparty -whose nickname (301) is shown in the top bar (302). No unnecessary contact details or personal information are shown. If these are required, the information button (303) pulls up the full Contact details entry showing the usual number, address and preference details and the ability to add, edit and delete these. User preferences relating to this screen are accessed and modified via menu button (304).

Each of the interactions shown as an icon in "narrative" image (215) in the interactions pane (208) of the home screen (Figure 2) would typically be expanded here as a separate item in the chronological list.

Where the counterparty's time-zone and/or work hours are explicitly set or can be inferred (for example, from their telephone number's country code) the time in that region is shown (305). Preferably the display attributes of this time indicator (and/or associated icon (306)) show whether this is in normal business hours, outside business hours or at an anti-social time (such as 3AM). Preferably this timestamp and warning are shown close to the button (307) that, if pressed, will attempt a real-time connection to the counterparty.

Alternatively, a pop-up dialog alerting the user to this anti-social time may be presented and the user given the option to cancel. This latter option should apply to any other buttons on screen that also trigger a real-time connection attempt (308 for example). Alternative options of leaving a voice message, sending a text or email may be offered instead. Public holidays may also be taken into account in this process -ensuring people are not disturbed unnecessarily.
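By way of illustration only, the time-of-day check described above might be sketched as follows. The function name, the three labels, the default working hours and quiet hours, and the use of a fixed UTC offset for the counterparty's region are all assumptions made for this sketch; a practical implementation would more likely use a full time-zone database so that daylight saving changes are handled.

```python
from datetime import datetime, time, timedelta, timezone

def classify_contact_time(now_utc, counterparty_tz,
                          work_start=time(9, 0), work_end=time(17, 30),
                          quiet_start=time(22, 0), quiet_end=time(7, 0),
                          holidays=frozenset()):
    """Return (local_time, label) for the counterparty's region.

    label is "business", "outside-business" or "anti-social".
    All thresholds are illustrative defaults, not values from the text.
    """
    local = now_utc.astimezone(counterparty_tz)
    t = local.time()
    # Late night and early morning are treated as anti-social on any day.
    if t >= quiet_start or t < quiet_end:
        return local, "anti-social"
    # Weekends and public holidays never count as business hours.
    is_working_day = local.weekday() < 5 and local.date() not in holidays
    if is_working_day and work_start <= t < work_end:
        return local, "business"
    return local, "outside-business"
```

A user interface could show the returned local time next to button (307) and present the cancel dialog only when the label is "anti-social".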

Buttons (307, 309) at the bottom of the screen allow new contact with this party to be initiated via any of the channels through which they can be contacted.

Options on the user preferences dialog, accessed by touching menu button (304), allow:

1. The entries to be filtered to just those from specific channels.

2. The entries to be filtered by state. For example, just show unread or missed calls; urgent or "flagged" messages only.

3. Setting how far back to keep contacts.

4. Viewing which groups this individual is part of, adding them to other existing groups, or creating a new group containing them to which others can then be added.

Each real-time interaction or message shown in the body of this pane shows some or all of:

1. Time since interaction and/or date/time of interaction.

2. Direction of the interaction (via text, icon or nominated display attribute).

3. Status of the interaction (via text, icon or nominated display attribute).

4. Duration or size of the interaction (real-time and messaging respectively).

5. Time taken to answer real-time interactions.

6. Summary of content or full content if possible of a messaging interaction. Typically just the Subject line of emails; first line or two of a text. Keywords identified from automatic speech recognition and/or natural language processing.

7. Whether or not a recording exists of the real-time interaction and, if so, play (from start) button and audio waveform allowing replay from any position.

8. If speech recognition has been applied during the call or to a recording of it, any keywords or icons representing them.

Actions available by interacting with an individual contact on this pane (touching or mouse-click for example) include:

1. Tap to expand the content in-situ where possible or to open the content within the appropriate app to read/view/play the message or (if available) recording of the interaction.

2. Icon to initiate a call with, or compose a message of the same type to, this counterparty. Preferably (not shown in Figure 3) icons resulting in real-time interactions being initiated are placed as far as possible from those creating messages. This helps avoid disturbing counterparties by accident when a message was the intended touch action.

3. Swipe left to delete interaction.

4. Long press (or heavy press where available) to bring up a dialog allowing:

a. Pin this message to top or bottom of pane (out of chronological order).

b. Flag this message as important.

c. Send this message for transcription, further processing.

d. Send details of this interaction to other application (such as a CRM application).

e. Forward, Reply or Reply All.

f. Escalate to management -sending contact history and recordings if available.

The above features significantly increase the power -and hence, inevitably, the complexity -of the user interface beyond the simple "dialler" and messaging applications typically present on a smartphone. Preferably, therefore, these advanced features may be initially suppressed and only introduced one at a time as the user becomes familiar with the app. "Tip" notifications may suggest that the user tries adding a further feature every so often.

Privacy concerns -and associated regulations such as the EU's GDPR -dictate that the amount of personally identifiable information present on screen and within the (vulnerable) end user device should be minimized.

The data structures that achieve this goal are shown in Figure 4.

The end user device (1) has a globally unique DeviceID (405) associated with the hardware it is running on and a globally unique application ID (406) representing this instance of the application (for example, a GUID created on first running the application). These are communicated to the MAP (2), allowing the device (1) to store as little as possible -namely:

1. A local party identifier and nickname (401) by which the user is able to recognize each counterparty to which it refers -but which is of little use to a thief without access to the mappings (402) between these local IDs, the device's ID (405), the Application instance ID (406) and the contact's globally unique identifier. The full contact details (407) identified by the global identifier are stored securely, remote from the (vulnerable) end user device. The latter a

2. For each message that it needs to show to the user (a small subset of the total), a local message ID and minimal information such as would be shown when the message is presented in a scrollable list (date/time, channel via which the message or call flowed) and a short summary. As with contact details, this information is of little value without the mapping table (404) that associates it with a specific party and content.

3. Full details of a contact are only provided to the end user device (1) if the user explicitly requests them and are not persisted there.

4. Full details of a message are not provided to the end user device (1) unless the user explicitly views the detail of a message -and are not persisted there.

Note that although the data is shown inside MAP (2), it is typically persisted in a database remote from the MAP and only a subset is held in memory within the MAP at any given time. The important point is that the information present on the end user's device (1) is pseudonymized with no easy route back to the actual content or message detail without access to the separately stored and tightly secured mapping tables (402, 404) and the underlying data (407, 408). For added security, the mappings (402, 404) may be stored and secured in a separate database from the contact and message information (407, 408).
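The split between device-side and MAP-side data in Figure 4 can be illustrated with a minimal sketch. The class names MapStore and DeviceEntry, the field names and the 0 to 9999 range for local IDs are hypothetical choices made for this illustration; only the reference numerals in the comments come from the description.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class DeviceEntry:
    """All the device (1) holds for a counterparty: local ID and nickname (401)."""
    local_id: int
    nickname: str

@dataclass
class MapStore:
    """MAP-side state; in practice persisted in a separately secured database."""
    contacts: dict = field(default_factory=dict)   # global ID -> full details (407)
    party_map: dict = field(default_factory=dict)  # (device, app, local ID) -> global ID (402)

    def assign_local_id(self, device_id, app_id, global_id):
        # IDs only need to be unique within one app instance, so a small
        # random integer is enough (assumed range: 0..9999).
        used = {key[2] for key in self.party_map if key[:2] == (device_id, app_id)}
        while True:
            local_id = secrets.randbelow(10_000)
            if local_id not in used:
                break
        self.party_map[(device_id, app_id, local_id)] = global_id
        return local_id

    def resolve(self, device_id, app_id, local_id):
        # Raises KeyError unless the device ID (405), app ID (406) and
        # local ID all match a stored mapping.
        return self.contacts[self.party_map[(device_id, app_id, local_id)]]
```

In this sketch a stolen DeviceEntry reveals only a nickname and a number; resolving it to real contact details requires the MAP-side mapping, which is keyed on the device and application instance as well.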

The app running on User Device (1) therefore minimizes data leakage and potential loss as follows:

1. Displays nicknames rather than full names where practical.

2. Displays partial timestamps not full date/time combinations where practical.

3. Displays icons for usable communications channels but only shows the counterparty's address on each when explicitly accessed.

4. The MAP assigns a locally unique identifier to each counterparty that this individual interacts with via device (1). These could be GUIDs but, to reduce space and bandwidth requirements, given that each device is uniquely identifiable (normally through its hardware identity (405)) and each instance of the app registers uniquely with the MAP using a GUID (406), it can simply be a sequential (or, preferably random) integer. Only the MAP knows, for example, that this app's counterparty 1745 is actually individual John Doe.

5. Assigns a locally unique identifier by which this, and only this instance of the app refers to a specific interaction with a counterparty. This, in combination with the unique counterparty ID makes it harder for data seized from multiple instances of the app to be cross referenced.

6. Communication with the MAP therefore references the app's local counterparty reference rather than the underlying identity.

7. When showing potential matches to a Contact lookup, those counterparties that have not yet communicated with this instance of the app can be assigned a temporary ID which is released if they are not selected or are no longer included in the search results. Only on actually contacting or attempting to contact the counterparty is a local identifier assigned permanently.

8. The mapping of counterparty attributes to chip/contact history display attributes is not stored in the phone or communicated with it except when a new attribute is being assigned. For example, chips appearing as square buttons may indicate that the counterparty is a customer -but once defined, this is never again explicit in the data passing between device and MAP and is not even stored on the device. Hence loss of the device, even if the data and source code within the app are accessed tells the hacker no more than "counterparty with nickname XY appears in square buttons" -which they could have seen by looking over your shoulder.

9. Local party and message IDs may be reused rapidly. They only need to be unique within the device (1) so if, for example, a set of messages is shown and then deleted, the local IDs used for those can be immediately reused for the next batch of messages. This further obfuscates the true identity of the messages. The same goes for counterparties. For example, 20 possible matches to a search are temporarily assigned local IDs but when that search completes, all the unused ones are destroyed allowing their IDs to be reused on the next search.

10. Local IDs can actually be assigned and persisted in the MAP. An instruction from the app that a user wishes to "pin" a particular counterparty to their favourites pane may signal that a particular ID should now persist rather than potentially be recycled within the current session.

11. Local IDs may be cleared and reset to new, random values on successive uses of the application. For example, the initial handshake between app and MAP may include a refresh of some or all local ID data. Thus any IDs stolen are of very little use as they have a very short lifespan as well as very little information associated with them.

12. The rate at which data can be extracted from the MAP may be throttled to limit what malicious code pretending to be an app could extract (should someone crack the encryption and protocol by which a user would look up one or more contacts, for example). The MAP may respond with a "Proof of Work" puzzle before accepting further requests for information. The server (MAP) sends the client (putative app on device) a puzzle of variable complexity depending on how threatened the server is feeling -for example, it may omit the puzzle on a first request and include it on a second within a set timespan. The client has to solve the puzzle and send the solution with its next message. The puzzle is a trap-door algorithm -easy to set, difficult to solve, and easy to check (without the server having to maintain state). Note that this same mechanism can be used as part of the initial handshake to protect against Denial of Service attacks.

13. During contact searches, when more than one match is presented on partial entry of a name or address/number, the search results are removed from view after a much shorter inactivity time than the normal screen dimming or idle timeout. This is typically five seconds, which is long enough to take the next step in a search but quick enough to hide the results promptly if you abandon the search. On resuming the search, by focusing on the text entry field again, the previous results are shown allowing further refinement of the search and/or scrolling through the results. This reduces the chances of the information being leaked by screengrabs or photographs of the phone. It also reduces the incidence of "pocket dialling" -since the contacts' names and details are not shown on screen for long, there is less chance that they are accidentally touched and dialled as the phone is placed in a pocket or randomly touched.
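The "Proof of Work" exchange in item 12 above could, as one possibility, be realised with a hash-based puzzle in the style of hashcash. The description does not name a specific algorithm, so the construction below is a substitute chosen for illustration: the function names, the puzzle string layout, the use of SHA-256 and the 60 second lifetime are all assumptions. The server signs each puzzle with an HMAC so that it can later check a solution without having stored the puzzle.

```python
import hashlib
import hmac
import os
import time

SERVER_KEY = os.urandom(32)  # secret known only to the MAP (assumed)

def _tag(body):
    return hmac.new(SERVER_KEY, body.encode(), hashlib.sha256).hexdigest()

def _has_leading_zero_bits(digest, bits):
    return int.from_bytes(digest, "big") >> (256 - bits) == 0

def issue_puzzle(difficulty_bits, now=None):
    """Easy to set: one random seed and one HMAC."""
    now = int(time.time()) if now is None else int(now)
    body = f"{difficulty_bits}:{now}:{os.urandom(8).hex()}"
    return f"{body}:{_tag(body)}"

def solve(puzzle):
    """Difficult to solve: about 2**difficulty_bits hashes on average."""
    bits = int(puzzle.split(":")[0])
    counter = 0
    while True:
        digest = hashlib.sha256(f"{puzzle}:{counter}".encode()).digest()
        if _has_leading_zero_bits(digest, bits):
            return counter
        counter += 1

def check(puzzle, counter, max_age=60, now=None):
    """Easy to check: one HMAC and one hash, with no stored state."""
    now = int(time.time()) if now is None else int(now)
    try:
        bits_s, issued_s, seed, tag = puzzle.split(":")
        bits, issued = int(bits_s), int(issued_s)
    except ValueError:
        return False
    if not hmac.compare_digest(tag, _tag(f"{bits_s}:{issued_s}:{seed}")):
        return False  # not a puzzle this server issued
    if not 0 <= now - issued <= max_age:
        return False  # expired
    digest = hashlib.sha256(f"{puzzle}:{counter}".encode()).digest()
    return _has_leading_zero_bits(digest, bits)
```

Raising difficulty_bits by one roughly doubles the client's work, which gives the server a simple way to scale the puzzle with perceived threat. Because nothing is stored, a solved puzzle could be replayed until it expires; a deployment that cares about this would need a short lifetime or a small cache of recently used seeds.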

A key security concern is that it is relatively easy to "spoof" any calling line identification, especially on a PSTN phone. So a call may not actually be from who it appears to be from: your bank, for example. It is also relatively easy to fool people into checking your identity via a telephone number that is itself fake, or intercepted somehow.

Furthermore, this application makes telephony available via mobile phones that can be equipped with this application and hence appear to be calls from a legitimate business number.

It is therefore beneficial that the system:

1. Ensures that calls made via the system are made by the authorized individual.

2. Provides a means of verifying that the counterparty is who they purport to be.

Adding multi-factor authentication -whereby confirmation codes are exchanged via another channel, for example -is cumbersome and not practical for many phone calls. There is therefore a need to enhance the security of these real-time communications channels -particularly the public switched telephone network but also online streaming audio where it is much easier to hide your true identity behind some stolen images and a voice-only channel than it would be if you had to show your face on video.

By routing all voice calls via the MAP (2), this invention allows the easy and secure deployment of speaker authentication algorithms -making them immediately and transparently available for the analysis of either or both sides of any voice call. Such algorithms are already widely used -for example, in telephone banking lines where a few seconds of audio at the start of the (typically unscripted and arbitrary) conversation is sufficient to produce a parameterized model of that speaker's voice suitable for comparison with a previously enrolled sample of a positively confirmed individual.

This has two primary use cases:

1. Ensuring that only the appropriately authorised employee is able to make or take calls on a specific end user device (1) over a particular business owned phone number.

2. Letting an individual receiving a call on a device (1) verify that the call is over a genuine business line owned by the company they are led to believe it is from and that the person they are speaking to is the appropriately authorised person entitled to be calling over that line.

The methods employed in these two cases are detailed below.

For case 1 above, an initial enrolment procedure requires that employees use a secure corporate website form where they provide the personal phone number (if any) that they will be using to place calls appearing to come from their business number.

They are then provided with a unique QR Code (as this is more secure than having any human readable configuration info) and details of where to download the app. The app will preferably NOT be downloaded via public app stores -only by individual invitation through corporate route so as to allow embedding of company specific information and reduce the chance of this being obtained by non-employees. Accessing this download point will preferably require the employee to be signed in to their corporate active directory or Windows domain account.

On first running the app:

1. The employee must provide necessary permissions (microphone, camera, [location]). Specifically, this must include access to the incoming SMS messages received by the phone.

2. The employee is prompted to point the device's camera at their QR Code.

3. The app interacts with the MAP (2) identified in the QR code -to validate the QR code (identify employee, cell number and that it has not been revoked).

4. An activation code is sent to the cell-phone number associated with the employee.

5. If this is, indeed, the cell-phone that it is purported to be, the incoming text message will be received by the app and the embedded security code accepted if received within the (very short) time window allowed. This time window is such that it would be very difficult for someone to transcribe the code from another device. Preferably said code includes invisible characters and/or non-standard characters (such as obscure emojis) to further hamper such efforts.

6. Employee is instructed to speak and repeat several phrases or read a paragraph as required for enrolment with the voice verification system.
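Steps 4 and 5 above amount to a single-use code that is only valid for a short time after it is issued. A minimal sketch follows; the class name ActivationCodes, the 20 second window and the choice to consume a code on the first redemption attempt (successful or not) are assumptions, and the sending of the SMS itself is left out.

```python
import hmac
import secrets
import time

class ActivationCodes:
    """Issues one short-lived, single-use code per phone number."""

    def __init__(self, window_seconds=20.0):
        self.window = window_seconds  # assumed value; the text says "very short"
        self._pending = {}            # phone number -> (code, issue time)

    def issue(self, phone, now=None):
        now = time.monotonic() if now is None else now
        code = secrets.token_urlsafe(12)  # would be embedded in the SMS text
        self._pending[phone] = (code, now)
        return code

    def redeem(self, phone, code, now=None):
        now = time.monotonic() if now is None else now
        # pop() consumes the code whether or not the attempt succeeds.
        entry = self._pending.pop(phone, None)
        if entry is None:
            return False
        expected, issued = entry
        in_time = (now - issued) <= self.window
        matches = hmac.compare_digest(code.encode(), expected.encode())
        return in_time and matches
```

Because the app reads the incoming SMS directly, it can redeem the code within a second or two, whereas someone copying the code by hand from another handset is likely to miss the window.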

Whenever the app is then used by an employee to make or take real-time calls that include a voice element (telephone, video conference, live-streaming) or voice messaging applications such as leaving a voicemail, their speech -which passes via MAP (2) -may be analysed on-the-fly and compared against the sample obtained when they enrolled with the application or provided via their employer.

Where the comparison does not provide a strong match, a check may be made by sending the employee an email asking them to confirm that this was indeed them -and, if not, block their number from making further calls. A strong rejection, on the other hand (very likely not the authorised speaker) leads to the number being blocked immediately until reinstated manually.
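The three outcomes described above (accept, follow up by email, block immediately) can be expressed as two thresholds on the match score. The sketch below assumes the score has been scaled to the range 0 to 1 and uses illustrative threshold values; neither the scale nor the values are specified in the description.

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    CONFIRM_BY_EMAIL = "confirm_by_email"
    BLOCK = "block"

def decide(score, accept=0.8, reject=0.2):
    """Map a voiceprint match score in [0, 1] to an action.

    accept and reject are assumed thresholds that a deployment would tune.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= accept:
        return Action.ALLOW             # strong match
    if score <= reject:
        return Action.BLOCK             # strong rejection: block until reinstated
    return Action.CONFIRM_BY_EMAIL      # inconclusive: ask the employee by email
```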

For the other use case above, where a user of the application receives a call from someone they do not know personally or whose voice they do not recognize for sure, they can also use the service to take a voiceprint of the counterparty. This may require the counterparty to be notified and/or give consent -according to the appropriate local, national or supranational regulations. The application can assist with this -for example, by playing spoken disclaimers and explanations and capturing consent via a recording. Alternatively, and preferentially, the identity of a caller -say, purporting to be John Doe from BigCorp -can be verified by emailing John Doe's BigCorp email address with the explanation and legal terms -and a random security code. If the counterparty is then able to recite said security code, this proves they are able to receive emails sent to that address.

However, for added security, BigCorp can subscribe to this invention's publicly available "Voice Verification as a Service" (VVaaS) offering as described below.

A highly secure exchange of information between the VVaaS provider and each subscribing company results in a trusted, secure communications channel being permanently established between the two.

The subscribing company maintains a live list of current employees -for each of whom is stored:

1. Business E-mail address.

2. Business phone number -of a phone running the app in this invention.

3. A code obtained from the VVaaS uniquely identifying their phone handset (and not visible to anyone in the organization, including the employee being enrolled).

4. (optional) schedule of hours/days this employee will be verified.

5. (optional) geofence of location(s) within which this employee will be verified.

6. Security level: whether voice-print only is sufficient or whether automatic verification via email (or other routes) is required.

7. Who/how to alert on failed verification attempts.

8. Expiry date -before which the entry must be refreshed or will be invalidated.
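The per-employee entry listed above could be held in a record such as the following sketch. All field names are hypothetical, the schedule is simplified to one time range per weekday, and the geofence (item 5) is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import date, datetime, time
from typing import Optional

@dataclass
class EmployeeEntry:
    email: str                       # item 1
    business_number: str             # item 2
    handset_code: str                # item 3, issued by the VVaaS
    expiry: date                     # item 8
    voiceprint_only: bool = True     # item 6: False means email check also required
    alert_contact: str = ""          # item 7
    # Item 4 (optional): weekday (0 = Monday) -> (start, end) in local time.
    schedule: Optional[dict] = None

def may_verify(entry, when):
    """True if the entry is unexpired and 'when' falls inside the schedule."""
    if when.date() >= entry.expiry:
        return False                 # entry must be refreshed before expiry
    if entry.schedule is None:
        return True                  # no schedule means no time restriction
    hours = entry.schedule.get(when.weekday())
    return hours is not None and hours[0] <= when.time() < hours[1]
```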

On adding an employee to this list, the VVaaS provides a one-time access code, with short lifetime, preferably in non-human-readable form (such as a unique QR code).

The employee contacts the VVaaS via the public switched network and scans said QR code into the application.

The combination of unused, unexpired, not revoked QR code being received via the expected business telephone number allows enrolment to begin. The user may be asked to repeat some specific phrases to generate their initial "reference" voiceprint which is stored for future comparison.

In parallel, the VVaaS sends an email to the corresponding Email address. This contains a further, very short-lived QR code that the employee is instructed to scan immediately in order to complete the enrolment process. This ensures that the user has current access to the email account, not just a photograph of the original QR code taken from the real employee's screen.

Calls received by individual users of the app that appear to come from phone numbers thus registered with the VVaaS are flagged as such -for example, by a distinctive ring-tone allowing them to be easily differentiated from cold calls from unknown and untraceable sources.

The VVaaS constructs a verification request meaning "does this voiceprint VVVV match that of the authorized employee for your phone number XXX and are they currently on a call to phone number or other address YYY?".

The above database may be held by the VVaaS and updated by the subscribing company over said secure communications channel. In this case, the verification request is handled within the VVaaS by querying the database for the stored voiceprint associated with the business number and an API call to the subscribing company merely queries whether business number XXX is currently calling YYY.

Alternatively, the data may be held by the subscribing company and the full API call with parameters VVVV, XXX and YYY passed over said secure communications channel to obtain the answer.

Note that comparison of voiceprints results in a confidence level of how likely it is that the two originated from the same speaker. This can be represented on a continuous numerical scale, the ends of which represent "very likely match" and "very unlikely match".

However, the presence or absence of a call between XXX and YYY at the present time is a Boolean result. Combining this with the voiceprint match allows a high degree of confidence that the speaker is who they claim to be even if the voiceprint match is not as strong as would be required to allow a user to access their bank account in the absence of other confirmatory factors.
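The combination of a continuous voiceprint score with the Boolean call-state check can be sketched as below. Cosine similarity over small numeric vectors stands in for a real speaker verification score, and the class name, function names and the 0.7 threshold are assumptions for illustration only.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class VerificationRequest:
    voiceprint: tuple       # VVVV: features extracted from the live audio
    business_number: str    # XXX: number the call appears to come from
    callee: str             # YYY: number or address being called

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify(request, references, active_calls, threshold=0.7):
    """Return a single yes/no answer.

    references: business number -> enrolled reference voiceprint.
    active_calls: set of (business number, callee) pairs currently in progress.
    The threshold can sit below what voice-only authentication would need,
    because the call-state check must also pass.
    """
    reference = references.get(request.business_number)
    on_call = (request.business_number, request.callee) in active_calls
    score = cosine_similarity(request.voiceprint, reference) if reference else -1.0
    return on_call and score >= threshold
```

Returning only a single Boolean means that a negative answer does not reveal which of the three inputs failed to match.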

Fraudsters may attempt to fool such a system by using a recorded fragment of the authorised speaker's voice during the call -especially the initial greeting during which it may be expected that any analysis will be performed.

There is also increasing use of pre-recorded greetings and fragments of speech recorded previously by the call centre agent and played on command to allow them to do other work while the recording is being played. It is important that the system identifies that the speech it hears is not all from one speaker -as would be the case if someone fraudulently logged in to a genuine call centre agent's desktop application. They would be able to place calls appearing to come from said agent and play these recordings -but would still have to use their own voice when interacting with the customer.

Also, it is common to have music-on-hold or other pre-recorded announcements played while the call is placed on hold or is queuing for resources -such as during a transfer. Background noise during pauses could also trigger false negatives if a nearby speaker is detected and determined not to be the purported caller.

These issues can be countered by, for example:

1. Repeating the analysis at (preferably random) intervals throughout the call.

2. Waiting till the second turn of speech before analysing that of the caller. In other words, ignore the first contiguous utterance from the caller, as this could well be a pre-recorded greeting or announcement; wait until both parties have spoken at least once, then analyse the subsequent speech.

3. Using speaker separation algorithms (as widely used in speech recognition applications for tagging who is speaking on auto-generated subtitles for example) to identify changes to the speaker and retesting the new speaker's voice.

4. Identifying exact repetitions of sections of audio that have been heard before and not using those fragments to verify the live individual on the call.

5. Subscribing companies may provide copies of the recorded announcements (or instructions on how to call their call centre in such a way as to hear such an announcement). These are used in advance to generate reference voiceprints for the (typically small number of) announcers whose voices are used in these. These can then be compared against the current speech in a call and, rather than flagging a "strong rejection" (since this speaker is definitely not the individual that is purported to be calling), they actually provide a slightly increased degree of confidence that the call is from the source it purports to be.

6. Music can be detected and excluded by various means. For example, monitoring the confidence levels of continuous speech transcription output will show a significant drop in recognition confidence during music. Alternatively, analysis of the frequencies present will result in voiceprints that lie outside the scope of those that can be generated from human speech alone.

7. Repeated fragments of audio -such as music on hold, announcements and prerecorded fragments of conversation can be detected and excluded from verification matching. This can be done, for example, by summarising the energy (amplitude squared) in a finite time window -generating an "energy envelope" pattern that can be compared against known common patterns using a sliding window to identify a match where there is high correlation. Typically a moving window automatic gain control algorithm is also employed to normalise the level of the reference and sample energy envelopes prior to comparison.

8. A "squelch" level can be applied. Following the first utterances from each side, the amplitude range of each direction of audio can be ascertained. Subsequent utterances may be ignored if these are suddenly significantly quieter than the previous interaction.

9. Assuming there are (short) gaps between words, the minimum amplitude levels can be measured. This gives a signal to noise ratio which can be used to modify the threshold required for a voice verification match. For example, poorer signal to noise ratio means the voiceprint is unlikely to be as good a match as if it were taken from a cleaner signal.

10. The initial reference levels for amplitude and signal-to-noise ratio may be modified over time by calculation of a moving average to allow for gradual changes such as may occur when walking around a building.

11. CTI Integration with the subscriber's telephony system may provide explicit information regarding the call that could influence the analysis of the speech that is being transmitted and/or received. In this case an application observes CTI events on the subscriber's telephony system and alerts the VVaaS via events each time a significant change occurs during the call. These may include but are not limited to: transfer to individual X; call on hold; call muted; announcement (preferably indicating which one) playing; conferenced in individual Z; recording state changed (often leads to tones being injected to indicate recording present or paused).
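The sliding-window comparison of energy envelopes mentioned in item 7 above might look like the following sketch. It uses the Pearson correlation coefficient, which is unaffected by overall level and so stands in here for the separate gain normalisation step; the function names and the 0.95 correlation threshold are assumptions.

```python
import math

def pearson(a, b):
    """Correlation of two equal-length sequences; 0.0 if either is flat."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def find_known_audio(sample, reference, min_corr=0.95):
    """Slide 'reference' along 'sample'; return the best-matching offset
    whose correlation is at least min_corr, or None if there is none."""
    best_offset, best_corr = None, min_corr
    width = len(reference)
    for offset in range(len(sample) - width + 1):
        corr = pearson(sample[offset:offset + width], reference)
        if corr >= best_corr:
            best_offset, best_corr = offset, corr
    return best_offset
```

Audio between a returned offset and offset plus the reference length would then be excluded from voiceprint matching. This brute-force search costs time proportional to the product of the two lengths, which is acceptable for envelopes at one value per packet but would need a faster method for long reference libraries.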

As audio is typically being transmitted and received in packets of 20 or 30ms duration, the preferred mechanism of energy envelope determination is simply to sum the squares of the audio amplitude. This gives a low bandwidth "summary" of the audio levels much as you would see on a typical user interface for an audio system -where the individual words and gaps between them appear as peaks and troughs respectively.
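As a minimal sketch of this calculation, assuming each packet carries 16-bit little-endian mono linear PCM (real telephony streams are often companded or compressed and would need decoding first), the envelope is one number per packet:

```python
import struct

def packet_energy(pcm):
    """Sum of squared sample amplitudes in one packet of 16-bit LE PCM."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[:count * 2])
    return sum(s * s for s in samples)

def energy_envelope(packets):
    """One energy value per packet, in arrival order."""
    return [packet_energy(p) for p in packets]
```

At an assumed 8 kHz sampling rate a 20ms packet holds 160 samples, so the envelope reduces 160 values to one, which is the low bandwidth summary the description refers to.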

Note that an overall determination of "NO" (this is unlikely to be the person who is claimed to be calling) does not expose any significant information about the validity or otherwise of the three independent input parameters to the query -VVVV, XXX or YYY.

The parameters of this request are so difficult to generate unless you're actually on a call to that person that a positive response provides a very high degree of confidence that the call and caller are who they purport to be.

If the recipient of the call wants further reassurance, they can ask the counterparty their name and submit that for verification too. A further level of assurance is available for highly sensitive calls by requesting that a security code number or word(s) be sent via email to the registered user of that business number. This could be typed in by the end user or randomly generated by the VVaaS. On hearing the counterparty read that code or word(s) back before proceeding with the call, the user is assured that the counterparty is very likely who they claim to be.

Claims

1. A system consisting of an application running on a plurality of communications devices, each configured with the addresses of one or more mobile access points via which communication sessions containing at least one audio stream are established with one or more counterparty devices via one or more network connections and wherein a time-bounded sample of at least one of said audio streams is analysed so as to determine a set of characteristics of said audio stream and where said characteristics are compared against a previously measured reference set of characteristics in order to test the hypothesis that the person speaking in said audio stream is the same individual from whose speech said reference set of characteristics were obtained.
2. A system of claim 1 wherein said hypothesis is tested repeatedly throughout said communication session.
3. A system of claim 1 wherein said hypothesis is tested repeatedly at random intervals during said communication session.
4. A system of claim 1 wherein said audio streams are transmitted in discrete network packets and the level of audio transmitted in each direction of a stream in a finite period is measured as the sum of the audio amplitude squared throughout a single packet of audio and the time sequence of said audio levels is stored as a representation of the energy envelope of said audio stream.
5. A system of claim 1 wherein said hypothesis is tested after the energy envelope of each direction of audio exceeds pre-determined level and duration thresholds in each direction of said audio stream.
6. A system of claim 1 wherein said sample is selected from periods whose energy envelope does not correlate with any of those in a predetermined library of energy envelopes.
7. A system of claim 1 wherein said audio stream is also subject to continuous speech recognition analysis resulting in transcription, confidence levels and speaker enumeration outputs.
8. A system of claim 7 in which output indicating a change of speaker triggers a further test of said hypothesis.
9. A system of claim 1 wherein the result of said test falling outside a pre-determined range is used to terminate said communication session in the event of said hypothesis being strongly rejected.
10. A system of claim 1 in which said test results are considered in combination with the current state of the telephony system from which said individual is purported to be calling, said state being inferred from the receipt of computer telephony integration events.
11. A system of claim 10 in which said events include but are not limited to transfer; hold; muted; announcement playing; conferenced; recording state changed.