
HK1232014A1 - Language modeling for speech recognition leveraging knowledge graph - Google Patents

Language modeling for speech recognition leveraging knowledge graph

Info

Publication number
HK1232014A1
Authority
HK
Hong Kong
Prior art keywords
language model
entity
named
knowledge graph
model
Prior art date
Application number
HK17105636.0A
Other languages
Chinese (zh)
Other versions
HK1232014A (en)
Inventor
朱卫武
随明
彭圣才
Original Assignee
Microsoft Technology Licensing, LLC
Filing date
Publication date
Application filed by Microsoft Technology Licensing, LLC
Publication of HK1232014A
Publication of HK1232014A1

Description

Language modeling for speech recognition using knowledge graphs
Technical Field
The present invention relates to language models, and more particularly, to language modeling for speech recognition using knowledge graphs.
Background
A statistical language model is a classical model used in speech recognition to estimate the prior probabilities of word strings. The language models applied in speech recognition systems generally work with very large, but nonetheless limited, training corpora. While modern commercial speech recognition systems use as large a training corpus as possible, updating the training corpus to collect all possible entity information is difficult due to considerations such as training performance, model size, and the cost (financial, resource, etc.) of collecting and maintaining a training corpus containing named entities. However, named entities occur frequently in speech recognition systems, especially spoken dialog systems. Such entities are very likely to be recognized as other common words with the same or similar pronunciation, because the named entities may never appear in a limited training corpus. The present application is directed to this general technical environment.
Disclosure of Invention
Non-limiting examples of the present disclosure describe language model processing of received utterances using a combined language model. A combined language model is applied to evaluate named entity data associated with the received utterance. The combined language model uses a location-based language model and an entity relationship probability model to evaluate named entity data associated with the received utterance. In at least one example, named entity data of a received utterance is evaluated using at least one entity knowledge graph and query click data associated with entities of the knowledge graph to generate a final probability that one or more transcriptions contain the named entity data. A result is output based on the final probabilities calculated by the combined language model, wherein the output result includes one or more candidate transcriptions ranked by the probabilities assigned by the combined language model.
Other non-limiting examples of the present disclosure describe generating a weighted combination language model for evaluating transcription probabilities associated with received inputs. A location-based language model is trained based on named entity data from a knowledge graph and location information from query click data associated with named entities of the knowledge graph. An entity relationship probability model is trained based on entity relationship data from the knowledge graph and entity relationships extracted from the query click log data. A weighted combination language model for evaluating named entity data associated with the received input is generated based on the trained location-based language model and the trained entity relationship probability model.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Additional aspects, features, and/or advantages of the various examples will be set forth in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
Non-limiting and non-exhaustive examples are described with reference to the following figures.
FIG. 1 illustrates an overview of an example system for language processing of received input described herein.
FIG. 2 illustrates an overview of an example flow process for generating a weighted combined language model described herein.
FIG. 3 illustrates an example method described herein for generating and applying a combined language model.
FIG. 4 illustrates an example method described herein for input processing using a combined language model.
FIG. 5 is a block diagram illustrating an example of a computing device that may be used to implement aspects of the present disclosure.
FIGS. 6A and 6B are simplified block diagrams of mobile computing devices that may be used to implement aspects of the present disclosure.
FIG. 7 is a simplified block diagram of a distributed computing system in which aspects of the present disclosure may be implemented.
Detailed Description
Examples of the present disclosure describe utilizing a knowledge graph to enhance statistical language modeling for online speech recognition systems/services so that named entities are identified more correctly. The speech recognition system/service applies at least two models to evaluate the input, namely 1) an acoustic model, and 2) a language model. An input is any data provided to and processed by a processing device. While examples may describe the input as relating to a speech recognition process, one skilled in the art will recognize that the examples described herein are applicable to any type of input, including but not limited to: speech/sound input, text input, gesture input, and handwriting input. The acoustic model is used to evaluate the pronunciation of the utterance. Acoustic model processing is used to model the received utterance. The output of the acoustic model is a lattice for each utterance. The lattice is an intermediate probability model in the form of a directed graph. The language model is used to calculate the probability of the entire transcription/entire sentence, not just the probability of the named entity. Language model processing attempts to use context information to compile phones from candidate lattices into more meaningful word sequences. With the candidate lattice generated by the acoustic model, the language model is applied to generate the N best transcriptions (sentences of words with probabilities), e.g., the 10 sentences with the highest probabilities. The candidate transcriptions are ranked, and the transcription with the highest ranking (e.g., from the N-best transcriptions) is output as the final transcription. The exemplary combined language model of the present disclosure is designed to increase the probability of a transcription containing the correct named entity.
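The N-best ranking step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the candidate transcriptions and probabilities are made-up values standing in for language model output.

```python
# Hypothetical sketch: selecting a final transcription from an N-best list.
# The probabilities here are illustrative stand-ins for language model scores.

def rank_transcriptions(candidates, n_best=10):
    """Return the n_best candidate transcriptions, highest probability first."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ranked[:n_best]

candidates = [
    ("find Weowna park", 0.62),    # transcription containing the correct named entity
    ("find we gonna park", 0.30),  # similar-sounding common words
    ("find we wanna park", 0.08),
]

ranked = rank_transcriptions(candidates, n_best=2)
final_transcription = ranked[0][0]  # highest-ranked transcription becomes the output
```

A combined language model as described in this disclosure would raise the score of the candidate containing the correct named entity before this ranking step runs.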
The output of the exemplary combined language model may determine the transcription with the correct named entity and output the transcription with the highest probability in place of a transcription containing other, incorrect common words with the same or similar pronunciation.
Users speak named entities very frequently to speech recognition systems, in particular spoken dialog systems. Named entities are classified elements of data used for identification. For example, a named entity is a term or phrase that identifies an element from a set of other elements based on a property/attribute. Named entity data is any data associated with a named entity, such as data managed by a knowledge graph and/or query click log, and the like. By way of example, a named entity may be categorized into data including, but not limited to: name, organization, geographic location, age, characteristics of people/places/things, address, phone number, email address, time representation, quantity, monetary value, percentage, and the like. The present disclosure relates to improving language model processing by speech recognition systems/services. As an example, assume that a user speaks a named entity "Weowna park" into the speech recognition system. A system trained on a large but limited corpus lacking this named entity data may recognize it as "We gonna park". Examples of the present disclosure improve the language model processing of speech recognition to correctly recognize a named entity, rather than recognizing the named entity data as some other common word having the same or similar pronunciation.
In examples described herein, the attributes and entity relationships of billions of named entities from knowledge graph data and query click logs (e.g., search engine query click data) are utilized to improve statistical language modeling to identify unknown named entities and to reduce the Word Error Rate (WER) of speech recognition. Examples of the present disclosure provide more benefit than merely adding new entities to a language model. In evaluating an input named entity, merely adding the named entity to a language model's training corpus results in a probability of 0 for the transition n-grams between the entity and the adjacent words surrounding it. Examples of the present disclosure use the extended capabilities of knowledge graphs to extract and evaluate attributes and relationships of named entities to improve language model processing. Accordingly, the present disclosure provides a number of technical effects, including, but not limited to: enhanced language models for input processing, including improved robustness in training corpora for language model processing; improved accuracy in speech recognition processing, including reduced WER and improved named entity detection; improved efficiency and usability of input recognition applications/services through utilization of the knowledge graph data and the query click log data; reduced processing load for the speech recognition system/service; and improved control of user interaction for input recognition processing, among others.
FIG. 1 illustrates an overview of an example system 100 for language processing of received input described herein. The exemplary system 100 presented is a combination of interdependent components that interact to form an integrated whole for generating learned programs based on user example operations. The components of system 100 may be hardware components or software implemented on and/or executed by hardware components of system 100. In examples, system 100 may include any of hardware components (e.g., ASICs, other devices for executing/running an Operating System (OS)) and software components running on hardware (e.g., applications, application programming interfaces, modules, virtual machines, runtime libraries, etc.). In one example, the exemplary system 100 can provide an environment for software components to run, adhere to a set of constraints for operation, and utilize resources or tools of the system 100, where the components can be software (e.g., applications, programs, modules, etc.) running on one or more processing devices. For example, software (e.g., applications, operating instructions, modules, etc.) may be running on a processing device such as a computer, mobile device (e.g., smartphone/phone, tablet), and/or any other electronic device. As an example of a processing device operating environment, reference is made to the operating environments of fig. 5-7. In other examples, the components of the systems disclosed herein may be spread across multiple devices. For example, input may be input on a client device (e.g., a processing device), while information may be processed or accessed by other devices in a network, such as one or more server devices.
As one example, the system 100 includes a location-based language processing component 102, an entity relationship probability processing component 104, a robust language model processing component 106, and a combined language model processing component 108, each having one or more additional components. Those skilled in the art will appreciate that the scale of a system, such as system 100, may vary and may include more or fewer components than those depicted in fig. 1. In some examples, interfacing between components of system 100 may be done remotely, such as where components of system 100 may be dispersed among one or more devices of a distributed network. In various examples, one or more data stores/storages or other memories are associated with system 100. For example, a component of system 100 may have one or more data stores associated therewith. Data associated with the components of system 100 and processing operations/instructions performed by the components of system 100 may be stored on the data store. The system 100 describes a component that utilizes data from a knowledge graph and query click data to improve entity identification. In examples, the knowledge graph and query click data may be persisted in a data store of the system 100.
A knowledge graph is a collection of billions of named entities, their attributes, and a rich set of relationships between them. Named entities and relationships are typically extracted from a number of resources available across web resources, other structured or semi-structured data sources, and Resource Description Framework (RDF) data. RDF is an infrastructure that allows structured metadata to be encoded, exchanged, and reused. RDF is based on the idea of making statements about resources in the form of subject-predicate-object expressions. These expressions are known as triplets in RDF terminology. Typically, a triplet consists of two named entities and a relationship linking them together in a subject-predicate-object expression, as in an entity pair such as (Apple, Steve Jobs). In one example, a triplet is the basic unit in a knowledge graph, which typically contains billions of triplets and manages the triplets in a database. In evaluating knowledge graphs, a unified ontology is applied to the knowledge graph to describe all of the different categories of metadata for named entities around the world in a complete and machine-readable form. For example, a person ontology contains attributes such as name, date of birth, date of death, height, weight, spouse, child, and so on. As an example, a commonly used ontology agreed upon by academia and major research institutes is provided by schema.org. Through unified ontologies and vast numbers of named entities and their relationships, knowledge graphs are able to cover and understand well the named entities worldwide, which motivates the present invention to leverage this ability to improve language models to perform better at entity recognition.
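The triplet structure described above can be sketched minimally as follows. The entities and relation names are illustrative examples (drawn from the Microsoft example used later in this description), not data from an actual knowledge graph.

```python
# Minimal sketch of knowledge-graph triplets in subject-predicate-object form.
# All entities and predicates here are illustrative, not real graph data.

triples = [
    ("Microsoft", "founded_by", "Bill Gates"),
    ("Microsoft", "produces", "Surface"),
    ("Microsoft", "produces", "Cortana"),
    ("Bill Gates", "spouse", "Melinda Gates"),
]

def neighbors(entity):
    """All entities directly connected to `entity` by one triplet."""
    out = set()
    for subj, _pred, obj in triples:
        if subj == entity:
            out.add(obj)
        elif obj == entity:
            out.add(subj)
    return out
```

A real knowledge graph would hold billions of such triplets in a database; the adjacency query shown here is the basic operation the relationship modeling below relies on.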
Query click log data (e.g., search engine query click data) is a collection of data associated with clicked Uniform Resource Locators (URLs) of a search query. A query is data entered into a web search. Commercial search engines record billions of search queries each day. Query click data may be collected and managed to evaluate named entities associated with a query. In one example, queries in the query log have an associated set of URLs that are clicked on after a user enters a search query. In this example, the sentences of the query containing the named entity data (e.g., entity name) can be evaluated to construct a training corpus for the recognition process. The training corpus is a large structured dataset.
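The shape of query click log data described above, and the extraction of query sentences that mention an entity name, might be sketched as follows. The record layout, field names, and URLs are assumptions for illustration only.

```python
# Hypothetical shape of query click log records: each query is paired with
# the URLs clicked after the query was issued. Data is made up.

query_click_log = [
    {"query": "weowna park bellevue hours",
     "clicked_urls": ["https://example.org/weowna"]},
    {"query": "microsoft surface price",
     "clicked_urls": ["https://example.org/surface"]},
]

def queries_mentioning(entity_name):
    """Collect query sentences containing the given entity name."""
    entity = entity_name.lower()
    return [rec["query"] for rec in query_click_log
            if entity in rec["query"].lower()]
```

Collecting such sentences per entity is the basis of the training corpus construction described later for the location-based language model.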
The location-based language processing component 102 is a language model processing component that evaluates location information associated with a processing device and one or more named entities. The location-based language processing component 102 builds a location-based language model that can be used to evaluate named entities. In doing so, the location-based language processing component 102 uses data from the knowledge graph and query click logs as described above. The location information estimated by the location-based language processing component 102 refers to a point or an area or other place on the surface of the earth. As an example, if a particular named entity (e.g., a restaurant) is not known, it may be difficult for a speech recognition system or application to identify the entity. Examples of the present disclosure recognize that an unrecognized named entity may be a local entity that has a strong relationship to the city in which the processing device is located. For example, a restaurant has an address attribute, a live person may be associated with the address, and an event may occur in a particular city, etc. Examples of the present disclosure utilize a knowledge graph to evaluate entity location information associated with processing device location detection to improve named entity identification. Upon detecting location information, the location-based language processing component 102 can interface with one or more additional components of the processing device. The detection of position information of a processing device is known to the person skilled in the art. The location-based language processing component 102 uses machine learning processing to build a location-based statistical language model. Exemplary processing by the location-based processing component 102 is further described in the description of FIG. 2.
The entity relationship probability processing component 104 is a language model processing component that evaluates relationships between named entities of a knowledge graph. The entity relationship probability processing component 104 generates probabilities that named entities of the knowledge graph are associated with named entities associated with the received input (e.g., spoken utterance). For example, if a query of "how valuable is Microsoft" is received, it is likely that the next query will resemble "how much wealth does Bill Gates have". The entity relationship probability processing component 104 can be employed to evaluate relationships between named entities and queries. The modeling performed by the entity relationship probability processing component 104 is used to improve the entity identification process. In examples, the entity relationship probability processing component 104 uses a knowledge graph to evaluate the context of a named entity in one or more utterances to determine the connections/relationships between the utterances, improving the probability of correctly evaluating the relationships between the inputs (e.g., utterances). The entity relationship probability processing component 104 uses a machine learning process to build a probability model for entity relationships. Exemplary processing by the entity relationship probability processing component 104 is further described in the description of FIG. 2.
The robust language model processing component 106 is a language model processing component that generates a statistical language model trained from information of a very large training corpus to improve the input recognition process. The training corpus managed by the robust language model processing component 106 can contain data that includes all of the words of the entities in the knowledge graph of the system 100. In examples, the processing by the robust language model processing component 106 assigns probabilities to word sequences via a probability distribution in order to evaluate the received input. By way of example, the robust language model processing component 106 can be an n-gram model. An n-gram model is a type of probabilistic language model used to predict the next term in a sequence in the form of an (n-1)-order Markov model. The robust language model processing component 106 evaluates the received input using a machine learning process. Exemplary processing by the robust language model processing component 106 is further described in the description of FIG. 2.
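An n-gram model of the kind just described can be sketched with a toy bigram (n = 2) estimator using maximum-likelihood counts. The corpus is a made-up three-sentence example, not the large training corpus the component would actually use.

```python
from collections import Counter

# Toy bigram (n=2) language model with maximum-likelihood estimates,
# illustrating the (n-1)-order Markov assumption. Corpus is illustrative.

corpus = [
    "find weowna park",
    "find the nearest park",
    "find weowna park hours",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words[:-1])            # count each word used as a history
    bigrams.update(zip(words, words[1:]))  # count adjacent word pairs

def p_bigram(word, history):
    """P(word | history) = count(history, word) / count(history)."""
    if unigrams[history] == 0:
        return 0.0
    return bigrams[(history, word)] / unigrams[history]
```

For instance, "weowna" follows "find" in two of the three sentences beginning with "find", so P(weowna | find) = 2/3. A named entity absent from the corpus would receive probability 0 here, which is exactly the limitation the knowledge-graph-based components address.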
The combined language model processing component 108 is a component of the system 100 that combines two or more of the outputs from the location-based language processing component 102, the entity relationship probability processing component 104, and the robust language model processing component 106 to generate a combined language model. The combined language model is used to make a final decision on evaluating the received input. The final decision made by the combined language model processing component 108 is a final probability regarding the ranking of candidate transcriptions for interpreting the received input that takes into account the processing/modeling provided by the location-based language processing component 102, the entity relationship probability processing component 104, and the robust language model processing component 106. In examples, the combined language model processing component 108 can make a final determination based on a weighted evaluation of the processing provided by the location-based language processing component 102, the entity relationship probability processing component 104, and the robust language model processing component 106. As an example, the weighting parameters are applied to results generated by the location-based language processing component 102, the entity relationship probability processing component 104, and/or the robust language model processing component 106. The weighting parameters for the evaluation process by the components of the system 100 may be assigned (or updated) based on statistical modeling (e.g., linear and non-linear assignments), telemetry data, controlled or uncontrolled testing, and/or prior experience and judgment related to combining the different types of models as described above. The results may be output to the received input as a result of the final determination made by the combined language model processing component 108. 
As an example, the output is one or more ranked transcriptions with the highest probability from the N best transcriptions (e.g., for transcribing the received input sentence or word combination). The combined language model examples of the present disclosure increase the probability that a transcription contains the correct named entity (e.g., person, place, thing) intended by the user providing the received input. As an example, the output of the combined language model may rank or re-rank the transcriptions comprising the named entity data, and select one or more transcriptions with the highest probability as the final output. Application of the combined language model may evaluate transcriptions to minimize the chance that a named entity is replaced by other, incorrect common words with similar pronunciations. In some examples, a dialog may be constructed with a user of a processing device in which more than one input is received. The components of the described system 100 can recognize associations between utterances, and the combined language model processing component 108 can make a final determination for the dialog as a whole based on the received inputs. The combined language model processing component 108 uses machine learning processing to make final decisions for the received input. As one example, the final decision may be used by the speech recognition system/service to evaluate the received input and output a response to the received input. As identified above, the output may be a result of processing by the combined language model, the output including one or more ranked transcriptions resulting from processing of the received input. Exemplary processing by the combined language model processing component 108 is further described in the description of FIG. 2.
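The weighted combination performed by the combined language model processing component might be sketched as a linear combination of per-model scores. The component names, scores, and weights below are assumptions for illustration; the disclosure leaves the actual weighting to statistical modeling, telemetry, and testing.

```python
# Hedged sketch of a weighted combination of language model scores for one
# candidate transcription. All numbers are illustrative, not tuned values.

def combined_score(scores, weights):
    """Weighted linear combination of per-model probabilities."""
    assert set(scores) == set(weights), "each model needs a weight"
    return sum(weights[m] * scores[m] for m in scores)

scores = {"location": 0.7, "entity_relation": 0.5, "robust": 0.4}
weights = {"location": 0.3, "entity_relation": 0.3, "robust": 0.4}

final = combined_score(scores, weights)  # 0.3*0.7 + 0.3*0.5 + 0.4*0.4 = 0.52
```

In practice each candidate transcription would receive such a combined score, and the candidates would then be ranked by it to produce the final output.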
FIG. 2 illustrates an overview of an example flow process 200 for generating a weighted combined language model described herein. By way of example, the process flow 200 may be performed by an exemplary system, such as the system 100 of FIG. 1. In various examples, the process flow 200 may be executed on a device including at least one processor configured to store and execute operations, programs, or instructions. However, process flow 200 is not limited to these examples. In other examples, process flow 200 may be performed by an application or service for input recognition processing. In at least one example, the process flow 200 may be performed (e.g., computer-implementable operations) by one or more components of a distributed network (e.g., web services/distributed network services (e.g., cloud services)) for input recognition processing. The process flow 200 describes exemplary processing by the location-based language processing component 102, the entity relationship probability processing component 104, and the robust language model processing component 106 for generating a weighted combined language model 220 (e.g., generated by operations that may be performed by components such as the combined language model processing component 108 of FIG. 1). As shown in process flow 200, any combination of entity relationship probability model processing, location-based language model processing, and robust language model processing results in the generation of a weighted combination language model 220 for evaluating one or more received inputs.
Entity relationship probability model processing refers to the processing described above with respect to entity relationship probability processing component 104 of FIG. 1. The entity relationship probability model process utilizes named entity relationships of entities from the knowledge graph and the query click log data to determine a probability that a named entity associated with the received input is a named entity of the knowledge graph. In doing so, knowledge data 202 is used to evaluate the received input. As an example, spoken utterance input is received and processed (e.g., transcribed) for evaluation using acoustic modeling. This processing (e.g., the sequence of input elements being transcribed) can be evaluated using knowledge data 202. Knowledge data 202 is any information that may be used by entity relationship probabilistic model processing and location-based language model processing to evaluate received input. By way of example, knowledge data 202 includes a knowledge graph and query click log data. The knowledge graph and the query click data are described previously, respectively. Entity relationship probability model processing includes using knowledge data 202 for processing to extract explicitly related named entities (operation 204), processing to extract implicitly related named entities (operation 206), and processing to generate an entity relationship probability model that can be used by a weighted combination language model 220 (operation 208).
Entity relationship probability model processing can collect explicitly related named entities directly from a knowledge graph. An explicitly related named entity is an entity that has a direct connection to the named entity of the knowledge graph. As an example, operation 204 may evaluate all named entities in the knowledge graph that are connected to the source entity (e.g., a starting point for evaluating data of the knowledge graph) within a particular number of paths. A path is a link or connection between named entities of a knowledge graph. In one example, if the named entity Microsoft is identified as the source entity, explicitly related entities may include named entities such as Bill Gates, Steve Ballmer, Surface, and Cortana, based on evaluating paths directly connected to the named entity Microsoft. The explicitly related named entities may be determined (and/or ranked) based on statistical analysis. By way of example, the entity relationship probability model process may rank named entities connected to the source entity using a distance equation similar to:
p_e(E_i | E_j) = 1 / (D(E_i, E_j)^2 + D(E_i, E_j))
where p_e(E_i | E_j) is the probability of a relationship between the two named entities, and D(E_i, E_j) is the distance between them.
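The distance-based ranking just described might be sketched as follows, with the hop count between two entities found by breadth-first search over the graph and turned into a probability that decays with distance. The graph data is illustrative, and the exact probability formula is an assumed reading of the garbled equation in the source (1/(D² + D)).

```python
from collections import deque

# Sketch of explicit-relation probability: shorter knowledge-graph paths
# between two entities yield higher probability. Assumed form: 1/(D^2 + D).
# Graph contents are illustrative, not real knowledge-graph data.

graph = {
    "Microsoft": {"Bill Gates", "Surface", "Cortana"},
    "Bill Gates": {"Microsoft", "Melinda Gates"},
    "Surface": {"Microsoft"},
    "Cortana": {"Microsoft"},
    "Melinda Gates": {"Bill Gates"},
}

def distance(a, b):
    """BFS hop count between entities a and b; None if unconnected."""
    if a == b:
        return 0
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == b:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

def p_explicit(a, b):
    """Relation probability decaying with graph distance."""
    d = distance(a, b)
    return 0.0 if not d else 1.0 / (d * d + d)
```

Directly connected entities (distance 1) score 1/2, two-hop neighbors such as Microsoft and Melinda Gates score 1/6, and unconnected entities score 0.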
Implicitly related named entities are entities that are related in an indirect manner to the source entity of the knowledge graph. The entity relationship probability model process performed in operation 206 requires additional resources beyond the knowledge graph to extract implicitly related named entities, because the connection paths between implicitly related named entities may be too long, or the entities may not be connected at all in the knowledge graph. Because each entity in the knowledge graph generally has a source URL identifying the web pages from which the attributes of the named entity were extracted, the entity relationship probability model process performed in operation 206 uses query click logs to find relevant source URLs to detect implicit entity relationships between named entities. Once a connection between two source URLs is detected, the relationship between the named entities can be identified. In examples, query click data may be evaluated to determine implicit relationships by examining query click logs for data including, but not limited to: URLs that are all clicked in the same search session, URLs that are presented in the same set of search results for a single query, and web pages containing URLs that link to each other, among others. Implicitly related named entities can be determined (and/or ranked) based on statistical analysis. As an example, the implicit entity relationship probability between named entities can be evaluated according to the following equation:
where, when entity E_j is queried, C_total is the count of all returned search results, C_c(E_i, E_j) is the co-click count of the two entities' source URLs within a search session, C_c(E_j) is the source URL click count of entity E_j, C(E_i) is the source URL count of entity E_i, R_e(E_i, E_j) is the average relevance of the source URL web pages as calculated by a page-rank algorithm, and the w_i are weights found by linear regression. Notably, the weights obey the law of total probability:
Σ_i w_i = 1
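One of the signals just defined, the co-click count of two entities' source URLs within a search session, can be sketched as follows. The session data, source URL mapping, and URLs are all hypothetical illustrations.

```python
# Hypothetical sketch of detecting implicit entity relations via query click
# logs: entities whose source URLs are clicked in the same search session
# are treated as implicitly related. All data here is made up.

sessions = [  # each session: the set of URLs clicked in it
    {"https://example.org/microsoft", "https://example.org/bill-gates"},
    {"https://example.org/microsoft", "https://example.org/surface"},
    {"https://example.org/weather"},
]

source_url = {  # assumed mapping from entity to its knowledge-graph source URL
    "Microsoft": "https://example.org/microsoft",
    "Bill Gates": "https://example.org/bill-gates",
    "Seattle": "https://example.org/seattle",
}

def co_click_count(entity_a, entity_b):
    """Number of sessions in which both entities' source URLs were clicked."""
    ua, ub = source_url[entity_a], source_url[entity_b]
    return sum(1 for s in sessions if ua in s and ub in s)
```

A full implementation would combine this count with the other quantities defined above (click counts, URL counts, page relevance) using regression-learned weights.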
at operation 208, the entity relationship probability model process combines the explicit and implicit relationships together to determine entity relationship probabilities. The entity relationship probability may be determined based on statistical analysis. As an example, the entity relationship probability may be calculated according to the following equation:
p(E_i | E_j) = w_1 · p_e(E_i | E_j) + w_2 · p_i(E_i | E_j)
assuming Ej and Ek are respectively one of the N named entities detected from the previous utterance within the dialog and one of the M N-grams in the evaluated sentence Si, the probability of Si generated by the entity relationship probability model Mr is:
the entity relationship probabilities calculated in operation 208 may be provided to a weighted combination language model 220 for application to weighted evaluation of two or more of entity relationship probability model processing, location-based language model processing, and robust language model processing.
Location-based language model processing refers to the processing described above with respect to the location-based language processing component 102 of FIG. 1. Location-based language processing utilizes named entities of the knowledge graph and location information in query click data from web searches to improve the local entity recognition capabilities of a language model. The location-based language model process uses the knowledge data 202 to evaluate entity location information associated with named entities of the knowledge graph against entities identified in the received input. As an example, input processing for a spoken utterance may determine a question such as "find the address for Lake Washington". Based on the input processing, the named entity "Lake Washington" can be identified and evaluated by location-based language model processing. In this example, the input processing may determine that the processing device providing the input is located in Seattle, Washington. This location information can be used to assist in the evaluation of the named entity "Lake Washington" included in the knowledge graph. As an example, the processing device location information is very easy to obtain from the GPS data of a mobile device or from the IP address of a PC. In examples, processing device location information may be sent to an input recognition application/service along with the input (e.g., utterances). This improves the identification of named entities that may be of interest when the weighted combination language model 220 is applied to the received input.
At operation 210, the location-based language model processing includes aggregating named entities in the knowledge graph according to their locations. Aggregation includes grouping one or more named entities into a plurality of groups according to location. In examples, the location of the named entity may be broader than the detailed address associated with the named entity. In this example, named entities may be aggregated according to location, such as city and state. However, those skilled in the art will recognize that named entities may be aggregated in any form of location information.
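The aggregation step of operation 210 can be sketched as a simple grouping by location key. This minimal Python sketch assumes a hypothetical entity record with "name", "city", and "state" fields; the actual knowledge graph schema is not specified in this disclosure.

```python
from collections import defaultdict

def cluster_entities_by_location(entities):
    """Group named entities into clusters keyed by (city, state).

    `entities` is a list of dicts with hypothetical "name", "city",
    and "state" keys standing in for knowledge graph records.
    """
    clusters = defaultdict(list)
    for entity in entities:
        clusters[(entity["city"], entity["state"])].append(entity["name"])
    return dict(clusters)

entities = [
    {"name": "Lake Washington", "city": "Seattle", "state": "WA"},
    {"name": "Pike Place Market", "city": "Seattle", "state": "WA"},
    {"name": "Golden Gate Bridge", "city": "San Francisco", "state": "CA"},
]
clusters = cluster_entities_by_location(entities)
# The Seattle, WA cluster holds two entities; San Francisco, CA holds one.
```

As the text notes, the grouping key could equally be any other granularity of location information (e.g., state only, or a region code).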
The location-based language model processing proceeds to operation 212, where processing for training corpus generation is performed. At operation 212, query click log data (e.g., search query log data) is used to extract query sentences containing entity names to construct a training corpus for each named entity cluster. Generation of a training corpus for a cluster of named entities may be determined based on statistical analysis. In generating a training corpus for the clusters of named entities, each named entity cluster is used to filter the query click log data, generating the training corpus based on an equation similar to:
CORPUS_i = { S_j | E_k ∈ S_j, E_k ∈ C_i }
where S_j is a query sentence in the search engine query log, and E_k and C_i are a named entity and an entity cluster, respectively.
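The corpus construction of operation 212 can be sketched as filtering the query log by cluster membership, directly following the set definition above. This Python sketch uses naive substring matching as a hypothetical stand-in for real entity extraction from query sentences.

```python
def build_cluster_corpus(query_log, cluster_entities):
    """CORPUS_i = { S_j | E_k in S_j, E_k in C_i }: keep every query
    sentence S_j that mentions at least one entity E_k of cluster C_i.

    Case-insensitive substring matching is an illustrative
    simplification of entity extraction.
    """
    return [
        sentence
        for sentence in query_log
        if any(entity.lower() in sentence.lower() for entity in cluster_entities)
    ]

query_log = [
    "find the address for lake washington",
    "weather in miami today",
    "directions to pike place market",
]
corpus = build_cluster_corpus(query_log, ["Lake Washington", "Pike Place Market"])
# Two sentences mention Seattle-cluster entities; the Miami query is excluded.
```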
Location-based language model processing proceeds to operation 214, where a location-based statistical model is built on top of each training corpus. As an example, the location-based statistical model may be an n-gram language model. However, those skilled in the art will recognize that the present disclosure is not limited to this example. The location-based statistical language model generated in operation 214 may be provided to a weighted combination language model 220 for application to a weighted evaluation of two or more of entity relationship probability model processing, location-based language model processing, and robust language model processing.
Robust language model processing refers to the processing described above with respect to the robust language model processing component 106 of FIG. 1. Robust language model processing utilizes a training corpus 216 to construct a language model suitable for use in the weighted combination language model 220 to improve local entity recognition capabilities. The training corpus 216 managed for the robust language model processing component may contain data that includes all of the words of the entities in the knowledge graph of the system 100. Examples of data maintained by the training corpus 216 include, but are not limited to: knowledge graph data, query click data, messages, emails, conversations (e.g., recorded telephone conversations), broadcast news, multiple named entities, vocabularies, dictionaries, and any other information or data provided by resources available internally and externally to the input recognition/understanding system/service. Robust language model processing proceeds to generate (operation 218) a language model constructed from the training corpus 216. By way of example, the language model may be an n-gram model (e.g., a unigram model, a 3-gram model, etc.). However, those skilled in the art will recognize that the language model generated in operation 218 is not limited to this example.
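The n-gram model generation of operation 218 can be illustrated with a minimal count-based sketch. A production model would add smoothing and probability estimation, which are omitted here for brevity.

```python
from collections import Counter

def train_ngram_counts(corpus, n=3):
    """Collect n-gram counts with sentence-boundary padding.

    A minimal stand-in for the n-gram model built from the training
    corpus; no smoothing is applied, which a deployed model would need.
    """
    counts = Counter()
    for sentence in corpus:
        # Pad with n-1 start symbols and one end symbol per sentence.
        tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

counts = train_ngram_counts(["find lake washington", "find lake union"], n=3)
# The trigram ("<s>", "<s>", "find") occurs twice, once per sentence.
```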
The weighted combination language model 220 is applied to the processing of the received input. The weighted combination language model 220 generates a combined language model based on the applied entity relationship probability model processing, location-based language model processing, and robust language model processing. As an example, the weighted combination language model 220 is a weighted combination of the applied processes for evaluating the received input. The operations for applying the weighted combination language model 220 are similar to those described for the combined language model processing component 108 of FIG. 1. As an example, weighting parameters are applied to the results generated by entity relationship probability model processing, location-based language model processing, and robust language model processing. In one example, a Maximum A Posteriori (MAP) estimate may be used to estimate or update the weighting parameters. However, those skilled in the art will recognize that the setting of the weighting parameters of the weighted combination language model 220 may be made by operations including, but not limited to: statistical modeling (e.g., linear and non-linear assignments), telemetry data, controlled or uncontrolled testing, and/or prior experience and judgment in connection with combining different types of process models. The decisions and outputs based on the evaluation of the weighted combination language model 220 are described in conjunction with the description of the combined language model processing component 108 of FIG. 1. The input recognition system/service may use statistical analysis to determine a final probability for interpreting the received input. The final probability determination may be used to make determinations regarding the output for responding to the received input.
The final probability of a sentence generated by the weighted combination language model 220 in one example may be calculated by an equation similar to:
p(S_i|M) = w_1 · p(S_i|M_g) + w_2 · p(S_i|M_l) + w_3 · p(S_i|M_r)
where M_g is a general language model trained on a vast general training corpus, M_l is the location-based language model selected according to the user's location, M_r is the entity relationship probability model generated from the knowledge graph, and w_i are the weighting parameters that combine the three models. The output based on the final probability may be the transcription (sentence) with the highest probability among the N best transcriptions. The combined language model examples of the present disclosure increase the probability that a transcription contains the correct named entity (e.g., person, place, thing) intended by the user providing the received input. As an example, the output of the combined language model includes a ranking or re-ranking of transcriptions containing named entity data, and the one or more transcriptions with the highest probability are selected as the final output. Application of the combined language model may evaluate transcriptions to minimize the chance that a named entity is misrecognized as another, incorrect common word with a similar pronunciation.
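The interpolation above can be sketched as a weighted sum over the three component model scores, followed by re-ranking of an N-best list. The weight values and candidate probabilities below are illustrative assumptions, not values from the disclosure.

```python
def interpolate(p_general, p_location, p_entity, weights=(0.5, 0.3, 0.2)):
    """p(S_i|M) = w_1*p(S_i|M_g) + w_2*p(S_i|M_l) + w_3*p(S_i|M_r).

    The weights (w_1, w_2, w_3) are assumed values for illustration.
    """
    w1, w2, w3 = weights
    return w1 * p_general + w2 * p_location + w3 * p_entity

def rerank(candidates):
    """candidates: list of (transcription, p_g, p_l, p_r) tuples.
    Returns the list sorted by interpolated probability, best first."""
    return sorted(candidates, key=lambda c: interpolate(c[1], c[2], c[3]),
                  reverse=True)

# A hypothetical 2-best list: the location and entity models strongly
# prefer the candidate containing the real named entity.
n_best = [
    ("find the address for lake washington", 0.02, 0.30, 0.25),
    ("find the address for late washing ton", 0.03, 0.01, 0.01),
]
best = rerank(n_best)[0][0]
# The named-entity candidate wins despite a lower general-model score.
```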
FIG. 3 illustrates an example method 300 for generating and applying a combined language model described herein. As an example, method 300 may be performed by an exemplary system, such as system 100 of FIG. 1. In various examples, the method 300 may be performed on a device comprising at least one processor configured to store and execute operations, programs, or instructions. However, the method 300 is not limited to these examples. In other examples, the method 300 may be performed by an application or service for input recognition processing. In at least one example, the method 300 may be performed (e.g., computer-implementable operations) by one or more components of a distributed network (e.g., web services/distributed network services (e.g., cloud services)) for input recognition processing. In one example, the method 300 may be an operation applied by a language model processing component of a speech recognition system.
The method 300 begins at operation 302, where a location-based language model is trained. The training of the location-based language model (operation 302) may include one or more processing operations performed by the location-based language processing component 102 described in FIG. 1 or the process flow 200 including the location-based language processing described in the description of FIG. 2. Operation 302 includes generating a location-based language model trained on named entity data from the knowledge graph and location information in query click log data associated with named entities of the knowledge graph. In one example, the training of the location-based model (operation 302) includes aggregating named entities of the knowledge graph based on location, and generating the location-based language model based on the aggregated named entities and query click log data associated with the locations of the aggregated named entities. In this example, named entities belonging to the same location are clustered together, and a location-based statistical model is built over the clustered named entities and the query click log data for the entities' location (e.g., city/state). The trained location-based model may be used to generate (operation 308) a combined language model.
An entity relationship probability model is trained in operation 304. The training of the entity relationship probability model (operation 304) may include one or more processing operations performed by the entity relationship probability processing component 104 described in FIG. 1 or the process flow 200, including the entity relationship probability model processing, described in the description of FIG. 2. As an example, the entity relationship probability model is trained based on evaluating entity relationship data from the knowledge graph and query click log data associated with named entities. In one example, explicit relationships of named entities are determined based on direct connections identified in the knowledge graph, while implicit relationships of named entities are determined based on evaluating query click log data associated with the named entities in the knowledge graph. The trained entity relationship probability model may be used to generate (operation 308) a combined language model.
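One hedged way to illustrate blending explicit and implicit relationship evidence is a linear combination of a knowledge-graph edge indicator and normalized query co-click counts. The scoring scheme and the blend weight `alpha` are assumptions for illustration, not the formulation claimed in the disclosure.

```python
def relationship_probability(entity_a, entity_b, graph_edges, coclick_counts,
                             alpha=0.7):
    """Blend an explicit signal (a direct knowledge-graph edge) with an
    implicit one (normalized query co-click counts) into one score.

    `graph_edges` is a set of frozensets of entity-name pairs;
    `coclick_counts` maps entity pairs (frozensets) to click counts.
    `alpha` is an assumed blend weight, chosen for illustration.
    """
    pair = frozenset((entity_a, entity_b))
    explicit = 1.0 if pair in graph_edges else 0.0
    total = sum(coclick_counts.values()) or 1
    implicit = coclick_counts.get(pair, 0) / total
    return alpha * explicit + (1 - alpha) * implicit

edges = {frozenset(("Lake Washington", "Seattle"))}
coclicks = {
    frozenset(("Lake Washington", "Seattle")): 8,
    frozenset(("Lake Washington", "Bellevue")): 2,
}
p = relationship_probability("Lake Washington", "Seattle", edges, coclicks)
# 0.7 * 1.0 (explicit edge) + 0.3 * 0.8 (co-click share) = 0.94
```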
A robust language model is trained in operation 306. The training of the robust language model (operation 306) may include one or more processing operations performed by the robust language model processing component 106 described in FIG. 1 or the process flow 200 including the robust language model processing described in the description of FIG. 2. The robust language model is trained based on a training corpus that includes a massive vocabulary containing at least all of the words of the named entities maintained in the knowledge graph. The trained robust language model may be used to generate (operation 308) a combined language model.
Flow proceeds to operation 308 where a combined language model is generated. The combined language model is used to evaluate named entity data associated with one or more received inputs, such as received utterances. In one example, the combined language model is generated from at least two of a trained location-based language model, a trained entity-relationship probability model, and a trained robust language model. In examples, the combined language model applies two or more of the trained models in a weighted combination. When applying the combined language model, the language model generated in each of the operations 302-306 may be assigned a weighting parameter. The weighting and weighting parameters have been discussed previously in the description of fig. 1 and 2.
In operation 310, the generated combined language model is applied to the received input. By way of example, application of the combined language model includes evaluating named entity data associated with the received input using the trained language models generated in operations 302-306. Operation 310 includes evaluating one or more named entities associated with the received input using the location-based language model. Operation 310 further comprises determining, using the trained entity relationship probability model, an entity relationship probability that a named entity associated with the received utterance is a named entity of the knowledge graph based on evaluating explicit associations between named entities of the knowledge graph and implicit associations between named entities of the knowledge graph. Operation 310 further comprises evaluating the received input using the trained robust language model. Operation 310 further comprises determining a final probability for ranking one or more transcriptions containing named entity data based on weights assigned to at least the trained location-based language model and the trained entity relationship probability model in the weighted combination language model. As an example, candidate lattices processed based on the acoustic model may be re-ranked based on application of the combined language model.
Based on the application of the combined language model (operation 310), flow proceeds to operation 312 where the output is communicated. In one example, operation 312 includes outputting a result of the received utterance based on the final probability calculated by the weighted combination language model. As an example, the output may be a re-ranking of the lattice generated by the acoustic model based on application of the combined language model. The output may be a signal used by the speech recognition system/service in making a final decision regarding the recognition of the input to the received input. In examples, the output may include one or more transcriptions ranking probabilities of the candidate transcriptions based on the combined speech model.
FIG. 4 illustrates an example method 400 described herein for input processing using a combined language model. As an example, method 400 may be performed by an exemplary system, such as system 100 of FIG. 1. In various examples, the method 400 may be performed on a device comprising at least one processor configured to store and execute operations, programs, or instructions. However, the method 400 is not limited to these examples. In other examples, the method 400 may be performed by an application or service for input recognition processing. In at least one example, the method 400 may be performed (e.g., computer-implementable operations) by one or more components of a distributed network (e.g., web services/distributed network services (e.g., cloud services)) for input recognition processing. In one example, the method 400 may be an operation applied by a language model processing component of a speech recognition system/service.
The method 400 begins at operation 402, where the speech recognition system/service receives a first input. As an example, the input may be an utterance received and processed by a spoken language understanding system/service or a speech recognition system/service. Those skilled in the art will recognize that an exemplary speech recognition system/service has the capability (e.g., hardware/software) to receive and process input. In one example, the received input is a signal processed by a speech recognition system/service.
Flow proceeds to operation 404 where the location of the processing device that sent the input is detected. The detection of position information of a processing device is known to the person skilled in the art. As an example, the location of the processing device may be obtained from GPS data or networking data (e.g., an IP address) of the processing device. In one example, a speech recognition system/service may be programmed to receive location information of a processing device along with a received utterance.
The flow proceeds to process (operation 406) the received utterance by applying the combined language model to evaluate the first utterance. The combined language model applied is a combined language model or a weighted combined language model as previously described in the description of fig. 1-3 above. In examples, the combined language model application includes two or more different language models including a location-based language model, an entity-relationship probability model, and a robust language model. Operation 406 may evaluate the detected location information to improve evaluation (e.g., re-ranking) of the candidate lattices generated by the acoustic model processing. The location-based model applied by the combined language model may use the detected location information to improve the accuracy of detecting the named entity associated with the received input. In addition to the location-based model processing, operation 406 includes evaluating an entity relationship associated with the named entity identified in the received utterance by applying entity relationship processing (e.g., an entity relationship model). The combined language model applies an entity relationship probability model to evaluate a context associated with a named entity of the received input. This takes advantage of potentially valuable links in context between one or more utterances within the dialog to further improve the language model. The combined language model may use an entity relationship probability model to enhance language model processing and improve ranking/re-ranking of candidate lattices for output. Further, in examples, the combined language model may also apply robust language model processing to enhance language model processing, such as ranking/re-ranking of candidate lattices for entity recognition. Operation 406 further comprises outputting the determination based on the application of the combined language model. 
As an example, the output may be a re-ranking of the lattice generated by the acoustic model based on application of the combined language model. The output may be a signal used by the speech recognition system/service in making a final decision regarding the recognition of the input to the received input.
Flow may proceed to decision operation 408, where a determination is made whether another utterance was received. Typically, a conversation or session includes multiple exchanges between a processing device and a speech recognition system/service. If operation 408 determines that another utterance was not received, flow branches NO and processing of method 400 ends. If operation 408 determines that another utterance was received, flow branches YES and proceeds to operation 410.
At operation 410, entity relationship probabilities are recalculated based on previous processing (e.g., earlier utterances in a dialog with the speech recognition system/service). In the case where more than one utterance is received, the speech recognition system/service may build on the previous utterances, utilizing data from the knowledge graph to enhance the identification of named entities and output a best-ranked transcription. In other words, operation 410 utilizes entity relationships in the knowledge graph to enhance recognition probabilities for named entities related to named entities identified in previous utterances, to facilitate subsequent utterances in the overall conversational dialog. As an example, the entity relationship probabilities that a named entity associated with the second utterance is a named entity of the knowledge graph are recalculated using entity extraction data based on applying the combined language model to the previously received utterance.
Flow proceeds to operation 412, where a final probability is calculated for the second utterance. Operation 412 calculates a final probability for the second utterance based on application of the combined language model described above. In examples, application of the combined language model includes consideration of the recalculated entity relationship probabilities. For example, the combined language model may apply weights to the entity relationship probability model, the location-based language model, and/or the robust language model. The weighting parameters applied to each model in the combined language model may be adjusted over the course of a dialog between the processing device and the speech recognition system/service. For example, a higher weight may be assigned to the entity relationship probability model based on the recalculated entity relationship probabilities.
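The dialog-driven weight adjustment can be sketched as boosting the entity relationship model's interpolation weight and renormalizing so the weights still sum to one. The boost amount and the renormalization scheme are illustrative assumptions.

```python
def boost_entity_weight(weights, boost=0.1):
    """Shift interpolation mass toward the entity relationship model
    (index 2) once an earlier utterance has established entity context.

    `weights` is (w_general, w_location, w_entity); the boost value is
    an assumed constant, chosen for illustration.
    """
    w = list(weights)
    w[2] += boost
    total = sum(w)
    # Renormalize so the adjusted weights remain a valid interpolation.
    return tuple(x / total for x in w)

weights = boost_entity_weight((0.5, 0.3, 0.2), boost=0.1)
# The entity-model weight rises from 0.2 to 0.3 / 1.1 ≈ 0.27.
```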
Based on the final probability calculated for the second utterance, flow proceeds to operation 414, where the results are output according to the processing of the combined language model. In examples, the output may include one or more transcriptions ranked by the candidate transcription probabilities from the combined language model. The output may be a signal used by the speech recognition system/service in making a final decision regarding the recognition of the received input.
Process flow loops back to decision operation 408, where a determination is made whether more utterances are received. If operation 408 determines that another utterance was not received, flow branches NO and processing of method 400 ends. If operation 408 determines that another utterance was received, flow branches YES and proceeds to operation 410 to evaluate the additional utterances.
FIGS. 5-7 and the associated descriptions provide a discussion of various operating environments in which examples of the invention may be implemented. However, the devices and systems shown and discussed with respect to FIGS. 5-7 are for purposes of example and illustration and are not limiting of the vast number of computing device configurations that may be used to implement the examples of the invention described herein.
FIG. 5 is a block diagram illustrating physical components of a computing device 502 (e.g., components of a system that may be used to implement examples of the present disclosure). The computing device components described below may be applicable to the computing devices described above. In a basic configuration, computing device 502 may include at least one processing unit 504 and system memory 506. Depending on the configuration and type of computing device, system memory 506 may include, but is not limited to, volatile storage (e.g., random access memory), non-volatile storage (e.g., read-only memory), flash memory, or any combination of these. System memory 506 may include an operating system 507 and one or more program modules 508 suitable for running software applications 520, such as applications 528, IO manager 524, and other tools 526. As an example, system memory 506 may store instructions for execution. By way of example, other examples of system memory 506 may be a persistent data store/storage. Operating system 507 may be suitable for controlling the operation of computing device 502, for example. Moreover, examples of the invention may be practiced in conjunction with a graphics library, other operating systems, or any other application program and are not limited to any particular application or system. This basic configuration is illustrated in FIG. 5 by those components within dashed line 522. Computing device 502 may have additional features or functionality. For example, computing device 502 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 5 by removable storage 509 and non-removable storage 510.
As mentioned above, a number of program modules and data files may be stored in system memory 506. While executing on processing unit 504, program modules 508 (e.g., input/output (I/O) manager 524, other tools 526, and applications 528) can perform processes including, but not limited to, one or more of the operational stages of the method 400 shown in FIG. 4. Other program modules that may be used in accordance with examples of the invention may include email and contacts applications, word processing applications, spreadsheet applications, database applications, slide presentation applications, input recognition applications, drawing or computer-aided application programs, and the like.
Furthermore, examples of the invention may be practiced in an electronic circuit comprising discrete electronic elements, a packaged or integrated electronic chip containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors. Examples of the invention may be practiced, for example, by a system on a chip (SOC) in which each or many of the components shown in FIG. 5 may be integrated onto a single integrated circuit. Such SOC devices may include one or more processing units, graphics units, communication units, system virtualization units, and various application functions, all integrated (or "burned") onto a chip substrate as a single integrated circuit. When operating with an SOC, the functionality described herein may operate through application-specific logic integrated with other components of the computing device 502 on a single integrated circuit (chip). Examples of the invention may also be practiced using other technologies capable of performing logical operations such as, for example, AND, OR, and NOT, including but not limited to mechanical, optical, fluidic, and quantum technologies. Additionally, examples of the invention may be practiced in a general purpose computer or any other circuits or systems.
Computing device 502 may also have one or more input devices 512, such as a keyboard, mouse, pen, voice input device, device for voice input/recognition, touch input device, etc. Output device(s) 514, such as a display, speakers, printer, etc., may also be included. The above devices are examples, and other devices may be used. Computing device 502 may include one or more communication connections 516 that allow communication with other computing devices 518. Examples of suitable communication connections 516 include, but are not limited to: RF transmitter, receiver, and/or transceiver circuitry; Universal Serial Bus (USB), parallel, and/or serial ports.
The term computer readable media as used herein may include computer storage media. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, or program modules. System memory 506, removable storage 509, and non-removable storage 510 are all examples of computer storage media (i.e., memory storage). Computer storage media may include RAM, ROM, electrically erasable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other article of manufacture that can be used to store information and that can be accessed by computing device 502. Any such computer storage media may be part of computing device 502. Computer storage media does not include a carrier wave or other propagated or modulated data signal.
Communication media may be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term "modulated data signal" may describe a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, Radio Frequency (RF), infrared and other wireless media.
FIGS. 6A and 6B illustrate a mobile computing device 600, such as a mobile phone, a smart phone, a personal data assistant, a tablet personal computer, a laptop computer, and the like, which can be used to implement examples of the present invention. For example, the mobile computing device 600 may be implemented as the system 100, and the components of the system 100 may be configured to perform the processing methods described in FIG. 4, and so on. Referring to FIG. 6A, one example of a mobile computing device 600 for implementing the examples is shown. In a basic configuration, the mobile computing device 600 is a handheld computer having both input elements and output elements. The mobile computing device 600 generally includes a display 605 and one or more input buttons 610 that allow a user to enter information into the mobile computing device 600. The display 605 of the mobile computing device 600 may also serve as an input device (e.g., a touch screen display). Optional side input element 615, if included, allows further user input. The side input element 615 may be a rotary switch, a button, or any other type of manual input element. In alternative examples, the mobile computing device 600 may incorporate more or fewer input elements. For example, in some examples, the display 605 may not be a touch screen. In yet another alternative example, the mobile computing device 600 is a portable telephone system, such as a cellular telephone. The mobile computing device 600 may also include an optional keypad 635. The optional keypad 635 may be a physical keypad or a "soft" keypad generated on the touch screen display. In examples, the output elements include the display 605 to show a Graphical User Interface (GUI), a visual indicator 620 (e.g., a light emitting diode), and/or an audio transducer 625 (e.g., a speaker). In some examples, the mobile computing device 600 incorporates a vibration transducer to provide tactile feedback to the user.
In yet another example, the mobile computing device 600 incorporates input and/or output ports such as an audio input (e.g., a microphone jack), an audio output (e.g., a headphone jack), and a video output (e.g., an HDMI port) for sending signals to or receiving signals from an external device.
FIG. 6B is a block diagram illustrating the architecture of one example of a mobile computing device. That is, the mobile computing device 600 may incorporate a system (i.e., architecture) 602 to implement certain examples. In one example, the system 602 is implemented as a "smart phone" capable of running one or more applications (such as browsers, e-mail, input handling, calendars, contact managers, messaging clients, games, and media clients/players). In some examples, the system 602 is integrated as a computing device, such as an integrated Personal Digital Assistant (PDA) and wireless phone.
One or more application programs 666 may be loaded into memory 662 and run on or in association with the operating system 664. Examples of application programs include telephone dialer programs, email programs, Personal Information Management (PIM) programs, word processing programs, spreadsheet programs, internet browser programs, messaging programs, and so forth. The system 602 also includes a non-volatile storage area 668 within the memory 662. The non-volatile storage area 668 may be used to store persistent information that is not lost when the system 602 is powered down. The application programs 666 can use and store information in the non-volatile storage area 668, such as e-mail or other messages used by an e-mail application. A synchronization application (not shown) also resides on the system 602 and is programmed to interact with a corresponding synchronization application resident on a host computer to keep the information stored in the non-volatile storage area 668 synchronized with corresponding information stored at the host computer. It should be understood that other applications may also be loaded into memory 662 and run on the mobile computing device 600, including the IO manager 524, other tools 526, and applications 528 described herein.
The system 602 has a power supply 670 that may be implemented as one or more batteries. The power supply 670 may also include an external power source, such as an AC adapter or a powered docking cradle that supplements or recharges the batteries.
The system 602 may include a peripheral device port 678 that performs functions facilitating connectivity between the system 602 and one or more peripheral devices. Transmissions to and from the peripheral device port 678 are conducted under the control of the operating system 664. In other words, communications received by the peripheral device port 678 can be disseminated to the application programs 666 via the operating system 664, and vice versa.
The system 602 may also include a radio 672 that performs the function of transmitting and receiving radio frequency communications. The radio 672 facilitates wireless connectivity between the system 602 and the "outside world" through a communications carrier or service provider. Transmissions to and from the radio 672 are made under the control of the operating system 664. In other words, communications received by the radio 672 may be disseminated to the application programs 666 via the operating system 664, and vice versa.
Visual indicators 620 may be used to provide visual notifications and/or audio interface 674 may be used to produce audible notifications via audio transducer 625. In the example shown, the visual indicator 620 is a Light Emitting Diode (LED) and the audio transducer 625 is a speaker. These devices may be directly coupled to the power supply 670 so that when activated, they remain on for a duration dictated by the notification mechanism even though the processor 660 and other components might shut down in order to conserve battery power. The LED may be programmed to remain on indefinitely until the user takes action to indicate the powered-on status of the device. The audio interface 674 is used to provide audible signals to and receive audible signals from the user. For example, in addition to being coupled to audio transducer 625, audio interface 674 may also be coupled to a microphone to receive audible input, such as to facilitate a telephone conversation. In accordance with examples of the invention, the microphone may also act as an audio sensor to facilitate control of notifications, as will be described below. The system 602 may further include a video interface 676 that allows operation of the on-board camera 630 to record still images, video streams, and the like.
The mobile computing device 600 implementing the system 602 may have additional features or functionality. For example, the mobile computing device 600 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in FIG. 6B by non-volatile storage 668.
The data/information generated or captured by the mobile computing device 600 and stored via the system 602 may be stored locally on the mobile computing device 600 as described above, or the data may be stored on any number of storage media accessible by the device via the radio 672 or via a wired connection between the mobile computing device 600 and a separate computing device associated with the mobile computing device 600, such as a server computer in a distributed computing network, e.g., the internet. As should be appreciated, such data/information may be accessed via the mobile computing device 600, via the radio 672, or via a distributed computing network. Similarly, such data/information may be readily transferred between computing devices for storage and use in accordance with known data/information transfer and storage means, including email and collaborative data/information sharing systems.
FIG. 7 illustrates one example of a system architecture for providing one or more client devices with an application that reliably accesses target data on a storage system and handles communication failures to the one or more client devices, as described above. Target data accessed, interacted with, or edited in association with IO manager 524, other tools 526, applications 528, and storage may be stored in different communication channels or other storage types. For example, various documents may be stored using directory services 722, web portals 724, mailbox services 726, instant messaging stores 728, or social networking sites 730. Applications 528, IO manager 524, other tools 526, and the storage system may use any of these types of systems or the like to enable data usage as described herein. The server 720 may provide the storage system for use by clients operating on the general purpose computing device 502 and the mobile device 600 over the network 715. By way of example, the network 715 may include the internet or any other type of local or wide area network, and client nodes may be implemented as computing devices 502 embodied in personal computers, tablet computing devices, and/or mobile computing devices 600 (e.g., smart phones). Any of these examples of client computing device 502 or 600 may obtain content from storage 716.
Reference throughout this specification to "one example" or "an example" means that a particular described feature, structure, or characteristic is included in at least one embodiment. Thus, use of these phrases may refer to more than just one example. Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more examples.
One skilled in the relevant art will recognize, however, that the examples can be practiced without one or more of the specific details, or with other methods, resources, materials, and so forth. In other instances, well-known structures, resources, or operations have not been shown or described in detail to avoid obscuring aspects of the embodiments.
While examples and applications have been shown and described, it is to be understood that the present embodiments are not limited to the precise configuration and resources described above. Various modifications, changes, and variations apparent to those skilled in the art may be made in the arrangement, operation, and details of the methods and systems disclosed herein without departing from the scope of the present examples as claimed.

Claims (20)

1. A computer-implemented method, comprising:
training a location-based language model based on named entity data and location information utilized from a knowledge graph and query click data associated with named entities of the knowledge graph;
training an entity relationship probability model based on entity relationship data from the knowledge graph and the query click data associated with the named entity; and
generating a weighted combination language model, suitable for evaluating named entity data associated with received input, based on the trained location-based language model and the trained entity relationship probability model.
2. The computer-implemented method of claim 1, further comprising receiving an input and applying the weighted combination language model to evaluate named entity data associated with the received input.
3. The computer-implemented method of claim 2, wherein applying the weighted combination language model comprises using the entity relationship probability model to determine entity relationship probabilities that named entities associated with the received input are named entities of the knowledge graph based on evaluating explicit associations between named entities of the knowledge graph and implicit associations between named entities of the knowledge graph, and wherein the computer-implemented method further comprises calculating a final probability that one or more transcriptions contain named entity data based on weights assigned to the trained location-based language model and the trained entity relationship probability model in the weighted combination language model.
4. The computer-implemented method of claim 3, further comprising outputting a result of the received input based on the final probability calculated by the weighted combination language model, wherein the output result includes one or more transcriptions ranked based on probabilities of candidate transcriptions calculated by the weighted combination language model.
5. The computer-implemented method in accordance with claim 1, wherein the weighted combination language model is generated based on a trained location-based language model, a trained entity-relationship probability model, and a robust language model.
6. The computer-implemented method of claim 5, further comprising receiving an input and applying the weighted combination language model to evaluate named entity data associated with the received input.
7. The computer-implemented method in accordance with claim 1, wherein training of the location-based language model comprises aggregating named entities of the knowledge graph based on location, and generating the location-based language model based on the aggregated named entities and query click data associated with the locations of the aggregated named entities.
8. The computer-implemented method of claim 2, further comprising receiving a second input and recalculating an entity relationship probability that a named entity associated with the second input is a named entity of the knowledge graph by using entity extraction data based on applying the weighted combination language model to the first received input.
9. The computer-implemented method of claim 8, further comprising generating a final probability that one or more transcriptions contain named entity data based on applying the weighted combination language model including the recalculated entity relationship probabilities to the second input, and outputting a result of the second input based on the final probability calculated by the weighted combination language model.
10. A system, comprising:
a memory; and
at least one processor operatively connected with the memory, the at least one processor performing operations comprising:
training a location-based language model based on named entity data and location information utilized from a knowledge graph and query click data associated with named entities of the knowledge graph,
training an entity relationship probability model based on entity relationship data from the knowledge graph and the query click data associated with the named entity, and
generating a weighted combination language model, suitable for evaluating named entity data associated with a received utterance, based on the trained location-based language model and the trained entity relationship probability model.
11. The system of claim 10, wherein the operations performed by the processor further comprise receiving an utterance, and applying the weighted combination language model to evaluate named entity data associated with the received utterance.
12. The system of claim 11, wherein applying the weighted combination language model comprises determining entity relationship probabilities that named entities associated with the received utterance are named entities of the knowledge graph based on evaluating explicit associations between the named entities of the knowledge graph and implicit associations between the named entities of the knowledge graph using the entity relationship probability model, and operations performed by the processor further comprise calculating a final probability that one or more transcriptions contain named entity data based on weights assigned to a trained location-based language model and a trained entity relationship probability model in the weighted combination language model.
13. The system of claim 12, wherein the operations performed by the processor further comprise outputting a result of the received utterance based on a final probability calculated by the weighted combination language model, wherein the output result comprises one or more transcriptions ranked based on probabilities of candidate transcriptions calculated by the weighted combination language model.
14. The system of claim 10, wherein the weighted combination language model is generated based on a trained location-based language model, a trained entity-relationship probability model, and a robust language model.
15. The system of claim 14, wherein the operations performed by the processor further comprise receiving an utterance, and applying the weighted combination language model to evaluate named entity data associated with the received utterance.
16. The system of claim 10, wherein training the location-based language model comprises aggregating named entities of the knowledge graph based on location, and generating the location-based language model based on the aggregated named entities and query click data associated with the locations of the aggregated named entities.
17. The system of claim 11, wherein the operations performed by the processor further comprise receiving a second utterance and recalculating an entity relationship probability that a named entity associated with the second utterance is a named entity of the knowledge graph by using entity extraction data based on applying the weighted combination language model to the first received utterance.
18. The system of claim 17, wherein the operations performed by the processor further comprise generating a final probability that one or more transcriptions contain named entity data based on applying the weighted combination language model comprising the recalculated entity relationship probabilities to the second utterance, and outputting a result of the second utterance based on the final probability calculated by the weighted combination language model.
19. A computer-readable storage device comprising executable instructions that, when executed on at least one processor, cause the processor to perform a process comprising:
receiving an utterance;
applying a combined language model adapted to evaluate transcription probabilities associated with the received utterance, wherein the combined language model evaluates named entity data associated with the received utterance using a location-based language model and an entity relationship probability model, and wherein the combined language model evaluates the named entity data of the received utterance using at least one entity knowledge graph and query click log data associated with named entities of the knowledge graph to generate a final probability that one or more transcriptions contain the named entity data; and
outputting a result of the received utterance based on the final probabilities calculated by the combined language model, wherein the output result includes one or more transcriptions ranked based on probabilities of candidate transcriptions calculated by the combined language model.
20. The computer-readable storage device of claim 19, wherein the process further comprises:
receiving a second utterance,
recalculating entity relationship probabilities that a named entity associated with the second utterance is a named entity of the knowledge graph by using entity extraction data based on applying the combined language model to previously received utterances,
calculating a final probability of the second utterance based on application of the combined language model comprising the recalculated entity relationship probabilities, and
outputting a result of the second utterance based on the calculated final probability.
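The weighted combination recited in claims 1 and 5 can be pictured as a linear interpolation of component model scores for each candidate transcription. The sketch below is a hypothetical illustration, not an implementation from the patent: the component probabilities, interpolation weights, candidate strings, and function names are invented placeholders, and a production recognizer would typically score in log space over a full decoding lattice rather than over a small candidate dictionary.

```python
# Illustrative sketch only: a final transcription probability formed by
# linearly interpolating a location-based language model score, an entity
# relationship probability model score, and a robust (general-purpose)
# language model score. All numbers below are invented placeholders.

def combine_scores(component_probs, weights):
    """Linearly interpolate per-model probabilities for one candidate transcription."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "interpolation weights should sum to 1"
    return sum(weights[name] * p for name, p in component_probs.items())

def rank_candidates(candidates, weights):
    """Return candidate transcriptions sorted by combined probability, best first."""
    scored = {text: combine_scores(probs, weights) for text, probs in candidates.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Two candidate transcriptions for one utterance: a named entity string that a
# limited training corpus might miss, versus an acoustically similar common
# word string. The entity-related models boost the named-entity candidate.
candidates = {
    "kirkland cafe": {"location": 0.30, "entity_relation": 0.40, "robust": 0.05},
    "kirk land cafe": {"location": 0.05, "entity_relation": 0.02, "robust": 0.10},
}
weights = {"location": 0.4, "entity_relation": 0.4, "robust": 0.2}

ranked = rank_candidates(candidates, weights)
```

Here the named-entity candidate scores 0.4·0.30 + 0.4·0.40 + 0.2·0.05 = 0.29 and is ranked first, even though the robust model alone would have preferred the common-word string, which is the behavior the claims attribute to weighting in the knowledge-graph-derived models.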
HK17105636.0A 2017-06-07 Language modeling for speech recognition leveraging knowledge graph HK1232014A1 (en)

Publications (2)

Publication Number Publication Date
HK1232014A HK1232014A (en) 2017-12-29
HK1232014A1 true HK1232014A1 (en) 2017-12-29


Similar Documents

Publication Publication Date Title
CN107210035B (en) Generation of language understanding systems and methods
CN107924679B (en) Computer-implemented method, input understanding system, and computer-readable storage device
US11055355B1 (en) Query paraphrasing
US11386268B2 (en) Discriminating ambiguous expressions to enhance user experience
WO2016196320A1 (en) Language modeling for speech recognition leveraging knowledge graph
CN106575293B (en) Isolated language detection system and method
US10540965B2 (en) Semantic re-ranking of NLU results in conversational dialogue applications
US10007660B2 (en) Contextual language understanding for multi-turn language tasks
US11593613B2 (en) Conversational relevance modeling using convolutional neural network
CN112507715A (en) Method, device, equipment and storage medium for determining incidence relation between entities
US20140074470A1 (en) Phonetic pronunciation
CN114840671A (en) Dialogue generation method, model training method, device, equipment and medium
US9460081B1 (en) Transcription correction using multi-token structures
US11170765B2 (en) Contextual multi-channel speech to text
CN112262382A (en) Annotation and retrieval of contextual deep bookmarks
CN114676227B (en) Sample generation method, model training method and retrieval method
CN114970733A (en) Corpus generation method, apparatus, system, storage medium and electronic device
US20250315719A1 (en) Performance evaluation of generative question-answering systems