WO2017179335A1 - Information Processing Apparatus, Information Processing Method, and Program - Google Patents
Information processing apparatus, information processing method, and program
- Publication number
- WO2017179335A1 (application PCT/JP2017/008644)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- voice
- processing apparatus
- dictionary
- information processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
- G10L2015/225—Feedback of the input speech
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/227—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology
- G10L2015/228—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
Description
- This disclosure relates to an information processing apparatus, an information processing method, and a program.
- Voice input technology performs a voice recognition process that recognizes speech by analyzing voice information generated from a voice uttered by a user.
- In such voice recognition processing, character information is generated by analyzing the speech information, and the speech is recognized by determining whether the generated character information matches or is similar to character information included in dictionary information. For this reason, the performance of speech recognition varies with the amount of character information included in the dictionary information. In general, the greater the amount of character information, the higher the possibility that speech will be recognized, but also the greater the risk of misrecognition; conversely, the smaller the amount of character information, the lower the possibility that speech will be recognized, but the lower the risk of misrecognition.
- In general, improved recognition performance and reduced processing time are in a trade-off relationship. For example, when the selected dictionary information is a large vocabulary dictionary, the dictionary information is more likely to include character information corresponding to the user's voice than when the selected dictionary information is a small vocabulary dictionary. On the other hand, processing becomes slow because of the large amount of character information; that is, it takes time until the result of voice recognition is obtained, and responsiveness to the user may deteriorate. Furthermore, simply increasing the amount of character information increases the risk of misrecognition, as described above.
- Accordingly, the present disclosure proposes a mechanism capable of achieving both improved recognition performance and reduced processing time in speech recognition processing.
- According to the present disclosure, there is provided an information processing apparatus including: an acquisition unit that obtains voice information obtained by voice input; and a control unit that controls a change of at least a part of a set of correspondences, used in a voice recognition process, between the voice information and processing based on the voice information, based on object information of an operation using the voice input or subject information of the operation.
- According to the present disclosure, there is also provided an information processing method including: obtaining, using a processor, voice information obtained by voice input; and controlling a change of at least a part of a set of correspondences, used in a voice recognition process, between the voice information and processing based on the voice information, based on the object information of the operation using the voice input or the subject information of the operation.
- According to the present disclosure, there is also provided a program causing a computer to realize: an acquisition function of obtaining voice information obtained by voice input; and a control function of controlling a change of at least a part of a set of correspondences, used in a voice recognition process, between the voice information and processing based on the voice information, based on the object information of the operation using the voice input or the subject information of the operation.
- FIG. 1 is a block diagram schematically illustrating an example of a functional configuration of an information processing system according to a first embodiment of the present disclosure.
- FIG. 2 is a diagram for explaining the replacement of correspondences in the information processing apparatus according to the embodiment. FIG. 3 is a diagram for explaining the change of the use dictionary in the information processing apparatus according to the embodiment.
- FIG. 4 is a flowchart conceptually showing an example of the overall processing of the information processing system according to the embodiment. FIG. 5 is a flowchart conceptually showing an example of the dictionary change processing of the information processing system according to the embodiment. FIG. 6 is a flowchart conceptually showing an example of the overall processing of the information processing system according to a modification of the embodiment.
- FIG. 11 is an explanatory diagram illustrating a hardware configuration of an information processing apparatus according to an embodiment of the present disclosure.
- Note that, to distinguish between embodiments, the information processing apparatus 100 is given a suffix corresponding to the embodiment, as in information processing apparatus 100-1 and information processing apparatus 100-2.
- 1. First Embodiment (change of the voice recognition dictionary based on object information of a voice input operation). First, the first embodiment of the present disclosure will be described.
- In the first embodiment, the information processing system controls the change of the voice recognition dictionary based on the object information of a voice input operation.
- FIG. 1 is a block diagram schematically illustrating an example of a functional configuration of the information processing system according to the first embodiment of the present disclosure.
- As shown in FIG. 1, the information processing system includes an information processing apparatus 100-1 and a server 200.
- The information processing apparatus 100-1 includes a voice input unit 102, a voice recognition unit 104, a storage unit 106, a control unit 108, and a communication unit 110.
- The voice input unit 102 acquires voice information, serving as an acquisition unit. Specifically, when a voice is uttered by a user near the information processing apparatus 100-1, the voice input unit 102 generates voice signal information related to the signal obtained for the uttered voice. Note that, instead of generating voice signal information itself, the voice input unit 102 may acquire voice signal information generated by an external voice input device via communication.
- The voice recognition unit 104 performs voice recognition processing based on the voice information. Specifically, the voice recognition unit 104 decides subsequent processing based on correspondences between voice information and processing based on the voice information (hereinafter also referred to as subsequent processing), and on the voice information provided from the voice input unit 102. For example, when voice signal information is provided from the voice input unit 102, the voice recognition unit 104 generates character information from the voice signal information. Then, the voice recognition unit 104 determines whether a set of correspondences between character information and subsequent processing (hereinafter also referred to as a dictionary) contains character information that matches or resembles the generated character information (hereinafter also referred to as a match). If it is determined that there is character information matching the generated character information, the voice recognition unit 104 notifies the control unit 108 of the subsequent processing corresponding to the matched character information.
- Note that the dictionary is stored in the storage unit 106, and the dictionary used for the voice recognition process (hereinafter also referred to as the use dictionary) is either designated by the control unit 108 or fixed. Although the example in which the dictionary is a set of correspondences between character information and subsequent processing has been described, the dictionary may instead be a set of correspondences between voice signal information and subsequent processing.
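- As a rough illustration of the matching step just described, the following Python sketch looks up recognized text in a use dictionary mapping character information to subsequent processing. All names (Correspondence, UseDictionary, match) and sample phrases are hypothetical, not taken from the patent.

```python
# Minimal sketch of the matching step: recognized text is looked up in a
# "use dictionary" that maps character information to subsequent processing.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Correspondence:
    phrase: str                  # character information
    action: Callable[[], None]   # subsequent processing

class UseDictionary:
    def __init__(self, correspondences):
        self._by_phrase = {c.phrase: c for c in correspondences}

    def match(self, text: str) -> Optional[Correspondence]:
        # Exact match; a real system would also score similar phrases.
        return self._by_phrase.get(text)

use_dictionary = UseDictionary([
    Correspondence("pause", lambda: print("pausing playback")),
    Correspondence("next song", lambda: print("skipping to next song")),
])

recognized_text = "next song"    # output of speech-to-text on the voice signal
hit = use_dictionary.match(recognized_text)
if hit:
    hit.action()                 # notify/execute the subsequent processing
```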
- The storage unit 106 stores information used in the voice recognition process. Specifically, the storage unit 106 stores dictionaries. For example, the storage unit 106 stores a plurality of dictionaries and provides them to the voice recognition unit 104. Note that the storage unit 106 may also store individual correspondences separately from dictionary units.
- The control unit 108 controls the overall operation of the information processing apparatus 100-1. Specifically, the control unit 108 controls the voice recognition processing; more specifically, it controls the dictionary used in the voice recognition process.
- First, the control unit 108 controls the contents of the use dictionary. Specifically, the control unit 108 controls the change of at least a part of the use dictionary based on the object information of the operation using the voice input. For example, in the use dictionary, the control unit 108 replaces correspondences determined based on usage information about the correspondences in the voice recognition process for the voice input operation, the usage information being estimated from the object information of the voice input operation. The replacement of correspondences will be described in detail with reference to FIG. 2.
- FIG. 2 is a diagram for explaining the replacement of correspondences in the information processing apparatus 100-1 according to the present embodiment.
- First, the control unit 108 determines the correspondences to be replaced based on the usage frequency in the voice recognition process estimated from the object information of the voice input operation. Specifically, the control unit 108 determines, based on the usage frequency, the correspondences to be extracted from among those in the use dictionary, and the correspondences to be added to the use dictionary. More specifically, from among the correspondences in the use dictionary, the control unit 108 determines those whose usage frequency, estimated based on the object information of the voice input operation, is relatively low.
- Further, from among the correspondences stored in the storage unit 106, the control unit 108 determines those whose usage frequency, estimated based on the object information of the voice input operation, is higher than that of the correspondences to be extracted. For example, the control unit 108 determines correspondence 2 and correspondence 3 in the use dictionary illustrated in FIG. 2 as extraction targets, and determines correspondence 5 and correspondence 6 shown in FIG. 2 as addition targets.
- Note that the correspondences to be added may be stored in an external device; in this case, they are acquired via communication.
- Next, the control unit 108 performs the determined replacement. For example, as illustrated in FIG. 2, the control unit 108 replaces correspondence 2 and correspondence 3, determined as extraction targets, with correspondence 5 and correspondence 6, determined as addition targets, in the use dictionary. Note that the number of correspondences to be extracted may differ from the number of correspondences to be added.
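- The replacement of FIG. 2 can be sketched roughly as follows, assuming a per-phrase usage-frequency estimate; the frequencies and the pooling strategy are illustrative assumptions, not details from the patent.

```python
# Sketch of frequency-based replacement: low-frequency correspondences are
# extracted from the use dictionary and higher-frequency ones from storage
# are added, while the use dictionary's size is maintained.
def replace_by_frequency(use_dict: dict, storage: dict, est_freq, capacity: int) -> dict:
    """use_dict/storage map phrase -> action; est_freq(phrase) -> float."""
    pooled = {**storage, **use_dict}
    # Keep the `capacity` most frequent entries; extracted and added counts
    # may differ if the capacity changes.
    keep = sorted(pooled, key=est_freq, reverse=True)[:capacity]
    return {phrase: pooled[phrase] for phrase in keep}

freq = {"corr1": 0.9, "corr2": 0.1, "corr3": 0.2, "corr5": 0.8, "corr6": 0.7}.get
use_dict = {"corr1": "a1", "corr2": "a2", "corr3": "a3"}
storage = {"corr5": "a5", "corr6": "a6"}
print(replace_by_frequency(use_dict, storage, lambda p: freq(p, 0.0), 3))
# -> corr1, corr5, corr6: corr2 and corr3 were replaced, as in FIG. 2
```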
- The object information of the voice input operation is information estimated based on information acquired about the object of the voice input operation.
- For example, the object information of the voice input operation includes information specifying the operation target.
- The operation target includes, for example, a running application.
- In that case, the control unit 108 acquires information identifying an active application, and adds to the use dictionary correspondences related to words that are relatively frequently used in operating the application identified from the acquired information (hereinafter also referred to as operation words). Specifically, for an application that distributes news, correspondences related to words such as “bookmark” or “tell me in detail” are added to the use dictionary. Similarly, correspondences related to words such as “pause” or “next song” (for music playback), “receive” or “reject” (for calls), and “start recording” or “stop recording” (for recording) are added to the use dictionary.
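- The per-application operation words above could be organized as a simple table merged into the use dictionary when applications become active; the following sketch assumes such a table, which the patent does not specify.

```python
# Illustrative per-application operation-word tables; the contents echo the
# examples above, and the structure itself is an assumption.
APP_OPERATION_WORDS = {
    "news":     ["bookmark", "tell me in detail"],
    "music":    ["pause", "next song"],
    "call":     ["receive", "reject"],
    "recorder": ["start recording", "stop recording"],
}

def words_for_active_apps(active_apps):
    # Union of operation words for all currently active applications.
    words = set()
    for app in active_apps:
        words.update(APP_OPERATION_WORDS.get(app, []))
    return words

print(words_for_active_apps(["music", "call"]))
```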
- Likewise, the control unit 108 acquires information identifying a connected external device 10, and adds to the use dictionary correspondences related to operation words that are relatively frequently used to operate the external device 10 identified from the acquired information. Specifically, correspondences related to words such as “change the program” for a television installed in a house, “23 degrees” for an air conditioner, or “stop the music” for audio equipment are added to the use dictionary. Similarly, correspondences related to words such as “tell me directions” for a car navigation system installed in a vehicle, or “open the windows” for a device that controls the vehicle's windows, are added to the use dictionary.
- The attribute of the operation target includes the type of the application or external device 10, the installation location, the owner, or any other grouping.
- For example, for operation targets sharing such an attribute, the control unit 108 adds correspondences related to words such as “start playback” or “end playback” to the use dictionary. Note that a union of the correspondences related to words for a plurality of operation targets having the same attribute may be added to the use dictionary.
- Note that the correspondences added to the use dictionary may be only a part of the correspondences determined based on the object information of the voice input operation.
- Also, the correspondences to be added, or an index for selecting them, may be set by the user.
- In that case, the control unit 108 selects the set correspondences as the correspondences to be added, or narrows down the correspondences to be added based on the set index.
- A list of the correspondences determined based on the object information of the voice input operation may be presented to the user for this purpose.
- In the above, correspondences related to operation words have been described. Additionally or alternatively, correspondences related to words for activating an application or a device (hereinafter also referred to as activation words) may be added. Note that separate use dictionaries may be provided for activation words and for operation words.
- The use dictionary may also be changed based on a plurality of pieces of object information. For example, when a plurality of applications are active, the control unit 108 may change the use dictionary for the plurality of applications. Likewise, when an application is active and a device is connected to the information processing apparatus 100-1, the control unit 108 may change the use dictionary for both the application and the device. Alternatively, the control unit 108 may change the use dictionary for only some of the object information, for example only for object information having a higher priority than the rest.
- The control unit 108 may also determine the correspondences to be replaced based on the availability (permission of use) in the voice recognition process estimated from the object information of the voice input operation. Specifically, the control unit 108 determines, based on the availability, the correspondences to be extracted from the use dictionary and the correspondences to be added to it. More specifically, based on the object information of the voice input operation, the control unit 108 determines whether the use dictionary includes correspondences whose use is not permitted in the voice recognition process for the voice input operation. When it is determined that such correspondences are included, the control unit 108 replaces them with correspondences whose use is permitted in the voice recognition process for the voice input operation.
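- A minimal sketch of this availability-based replacement, assuming a permission predicate supplied by the object information, might look like the following; the names and the refill strategy are assumptions.

```python
# Sketch of availability-based replacement: not-permitted correspondences are
# removed and permitted candidates are substituted, preserving the size.
def enforce_availability(use_dict: dict, candidates: dict, is_permitted) -> dict:
    kept = {p: a for p, a in use_dict.items() if is_permitted(p)}
    # Refill from permitted candidates up to the original size.
    for phrase, action in candidates.items():
        if len(kept) >= len(use_dict):
            break
        if phrase not in kept and is_permitted(phrase):
            kept[phrase] = action
    return kept

banned = {"format storage"}   # e.g. a phrase not permitted for this target
use_dict = {"format storage": "a0", "pause": "a1"}
candidates = {"next song": "a2"}
print(enforce_availability(use_dict, candidates, lambda p: p not in banned))
# -> "pause" kept, "format storage" replaced by "next song"
```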
- Next, the control unit 108 controls the use dictionary in units of dictionaries. Specifically, the control unit 108 controls the change of the use dictionary based on the object information of the operation using the voice input. For example, the control unit 108 changes the use dictionary to a dictionary corresponding to the object information of the voice input operation. The change of the use dictionary will be described in detail with reference to FIG. 3.
- FIG. 3 is a diagram for explaining the change of the use dictionary in the information processing apparatus 100-1 according to the present embodiment.
- First, the control unit 108 selects a dictionary corresponding to the object information of the voice input operation, for example from among the plurality of dictionaries stored in the storage unit 106. Note that the size of the selected dictionary, that is, the number of correspondences it contains, may differ from that of the current use dictionary: as shown in FIG. 3, the size of the use dictionary may be n while the size of the selected dictionary is m.
- Next, the control unit 108 determines the selected dictionary as the use dictionary, for example by designating it as such. Alternatively, the contents of the use dictionary may be rewritten with the contents of the selected dictionary.
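- Control in units of dictionaries can be sketched as a wholesale swap of the use dictionary, as below; the dictionary table and names are hypothetical, and the size may change with the swap (n versus m in FIG. 3).

```python
# Sketch of dictionary-unit control: the use dictionary is swapped for a
# whole dictionary selected from the object information.
DICTIONARIES = {
    "music_app":  {"pause": "a1", "next song": "a2"},                 # size 2
    "navigation": {"navigate to": "b1", "tell me directions": "b2",
                   "open the windows": "b3"},                          # size 3
}

class Recognizer:
    def __init__(self, use_dictionary: dict):
        self.use_dictionary = use_dictionary

    def switch_for(self, object_info: str):
        # Designate the selected dictionary as the use dictionary.
        self.use_dictionary = DICTIONARIES[object_info]

recognizer = Recognizer(DICTIONARIES["music_app"])
recognizer.switch_for("navigation")
print(len(recognizer.use_dictionary))  # the set size may change (here 2 -> 3)
```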
- Note that the control unit 108 may also control the execution subject of the voice recognition process. Specifically, the control unit 108 causes at least one of the information processing apparatus 100-1 and the server 200 to perform the voice recognition process. For example, when voice information is provided from the voice input unit 102 to the voice recognition unit 104, the control unit 108 may cause both the voice recognition unit 104 and the server 200 to perform voice recognition processing.
- The control unit 108 may determine the execution subject of the voice recognition process based on whether the information processing apparatus 100-1 can communicate. For example, when communication with the server 200 is difficult, the control unit 108 causes the voice recognition unit 104 to execute the voice recognition process; when communication with the server 200 is possible, it causes both the voice recognition unit 104 and the server 200, or only the server 200, to execute the voice recognition process.
- When the control unit 108 causes both the voice recognition unit 104 and the server 200 to execute voice recognition processing, it arbitrates between the two results. Specifically, the control unit 108 adopts one of the voice recognition results based on an evaluation of each of the results of the voice recognition unit 104 and the server 200. For example, when the recognition accuracy of the voice recognition unit 104 is below a threshold, the control unit 108 waits until the voice recognition result of the server 200 is received; when the recognition accuracy of the voice recognition unit 104 is at or above the threshold, the control unit 108 performs subsequent processing using the result of the voice recognition unit 104 without waiting for the result of the server 200.
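- The arbitration just described can be sketched with two concurrent recognizers and a confidence threshold, as follows; the async structure, threshold value, and sample results are illustrative assumptions.

```python
# Sketch of arbitration: use the local result when its accuracy clears a
# threshold, otherwise wait for the server result.
import asyncio

THRESHOLD = 0.8

async def recognize_locally(audio):
    await asyncio.sleep(0.05)          # fast on-device recognition
    return ("pause", 0.6)              # (text, confidence) - sample values

async def recognize_on_server(audio):
    await asyncio.sleep(0.5)           # slower but larger dictionary
    return ("pause workout", 0.95)

async def arbitrate(audio):
    server_task = asyncio.create_task(recognize_on_server(audio))
    text, confidence = await recognize_locally(audio)
    if confidence >= THRESHOLD:
        server_task.cancel()           # no need to wait for the server
        return text
    return (await server_task)[0]      # wait for the server result

print(asyncio.run(arbitrate(b"...")))  # -> "pause workout" in this sample
```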
- The communication unit 110 communicates with the server 200 and the external device 10. Specifically, the communication unit 110 transmits dictionary provision requests, voice recognition requests, and voice information to the server 200, and receives dictionaries and voice recognition results from the server 200. Further, the communication unit 110 transmits operation requests and dictionary provision requests to the external device 10 and receives dictionaries from the external device 10. For example, the communication unit 110 broadcasts a dictionary provision request to each operable external device 10 and receives a dictionary from each external device 10 that permits operation. When a dictionary for the external device 10 is already stored in the storage unit 106 of the information processing apparatus 100-1, no dictionary provision request is transmitted to that external device 10. When a dictionary for the external device 10 is stored in the server 200, a dictionary provision request is transmitted to the server 200, or the server 200 is caused to execute the voice recognition process.
- The server 200 includes a communication unit 202, a control unit 204, a voice recognition unit 206, and a storage unit 208.
- The communication unit 202 communicates with the information processing apparatus 100-1. Specifically, the communication unit 202 receives dictionary provision requests, voice recognition requests, and voice information from the information processing apparatus 100-1, and transmits dictionaries and voice recognition results to the information processing apparatus 100-1.
- The control unit 204 controls the operation of the server 200 as a whole. Specifically, the control unit 204 controls voice recognition processing in response to voice recognition requests. For example, when a voice recognition request is received from the information processing apparatus 100-1, the control unit 204 causes the voice recognition unit 206 to execute voice recognition processing based on the voice information received together with, or separately from, the request, and then causes the communication unit 202 to transmit the result to the information processing apparatus 100-1.
- The control unit 204 also performs dictionary provision processing in response to dictionary provision requests. Specifically, when a dictionary provision request is received from the information processing apparatus 100-1, the control unit 204 acquires from the storage unit 208 the dictionary (or correspondences) specified by the request, and causes the communication unit 202 to transmit it to the information processing apparatus 100-1.
- The voice recognition unit 206 performs voice recognition processing based on the voice information. Since this processing is substantially the same as that of the voice recognition unit 104 of the information processing apparatus 100-1, its description is omitted.
- The storage unit 208 stores information used for the voice recognition processing; specifically, it stores dictionaries and correspondences. For example, the dictionaries stored in the storage unit 208 may be larger than those of the information processing apparatus 100-1, and the number of stored dictionaries may be larger.
- FIG. 4 is a flowchart conceptually showing an example of the entire processing of the information processing system according to this embodiment.
- First, the information processing apparatus 100-1 acquires the object information of the voice input operation (step S302). Specifically, the control unit 108 acquires information about the active application or about the external device 10 connected to the information processing apparatus 100-1.
- Next, the information processing apparatus 100-1 determines whether a change has occurred in the object information (step S304). Specifically, the control unit 108 determines whether there has been a change such as an application being newly activated, an active application being terminated, an external device 10 being newly connected, or the connection with a connected external device 10 being disconnected.
- If a change is determined to have occurred, the information processing apparatus 100-1 changes the use dictionary based on the object information (step S306). Specifically, the control unit 108 changes the use dictionary for the application or external device 10 related to the change. Details will be described later.
- Next, the information processing apparatus 100-1 determines whether voice is input (step S308). Specifically, the voice recognition unit 104 determines whether voice information is provided by the voice input unit 102.
- If voice is determined to be input, the information processing apparatus 100-1 executes voice recognition processing based on the dictionary (step S310). Specifically, the voice recognition unit 104 performs voice recognition processing on the provided voice information based on the use dictionary designated by the control unit 108.
- Then, the information processing apparatus 100-1 performs subsequent processing according to the voice recognition result (step S312). Specifically, the control unit 108 executes the subsequent processing specified by the voice recognition processing of the voice recognition unit 104.
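- The overall flow of FIG. 4 (steps S302 to S312) can be condensed into the following conceptual loop; all helper callbacks are hypothetical stand-ins for the units described above.

```python
# Conceptual sketch of the FIG. 4 flow: watch object information, change the
# use dictionary on a change, then recognize voice and dispatch.
def run(events, change_use_dictionary, recognize, dispatch):
    previous = None
    for object_info, voice in events:              # one iteration per tick
        if object_info != previous:                # S304: change occurred?
            change_use_dictionary(object_info)     # S306
            previous = object_info
        if voice is not None:                      # S308: voice input?
            action = recognize(voice)              # S310: dictionary-based
            if action is not None:
                dispatch(action)                   # S312: subsequent process

run([("music_app", None), ("music_app", "pause")],
    change_use_dictionary=lambda o: print("dictionary ->", o),
    recognize=lambda v: {"pause": "pause_playback"}.get(v),
    dispatch=print)
```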
- FIG. 5 is a flowchart conceptually showing an example of dictionary change processing of the information processing system according to the present embodiment.
- First, the information processing apparatus 100-1 determines whether the object related to the change is an application (step S322). Specifically, the control unit 108 determines whether there is an application that has been newly started or determined to have terminated.
- If the object is an application, the information processing apparatus 100-1 acquires the correspondences corresponding to the application (step S324). Specifically, the control unit 108 acquires the correspondences corresponding to the newly started application from the storage unit 106 or the server 200. When an application has terminated, correspondences that correspond to the remaining active applications and are not yet in the use dictionary are acquired.
- Next, the information processing apparatus 100-1 determines whether the object related to the change is a device (step S326). Specifically, the control unit 108 determines whether there is an external device 10 that is newly connected or determined to have been disconnected.
- If the object is a device, the correspondences corresponding to the device are acquired (step S328). Specifically, the control unit 108 acquires the correspondences corresponding to the newly connected external device 10 from the storage unit 106, the external device 10 itself, or the server 200. When the connection of an external device 10 is disconnected, correspondences that correspond to the still-connected external devices 10 and are not yet in the use dictionary are acquired.
- Next, the information processing apparatus 100-1 changes the use dictionary (step S330). Specifically, the control unit 108 selects the correspondences to be extracted from the use dictionary and replaces them with the acquired correspondences.
- Next, the information processing apparatus 100-1 determines whether communication is available (step S332). Specifically, the control unit 108 determines whether communication with the server 200 is possible.
- If communication is determined to be available, the information processing apparatus 100-1 adds an external dictionary as a use dictionary via communication (step S334). Specifically, when it is determined that communication with the server 200 is possible, the control unit 108 sets both the voice recognition unit 104 of the information processing apparatus 100-1 and the voice recognition unit 206 of the server 200 as execution subjects of the voice recognition processing. Thereby, the use dictionary can be substantially changed.
- As described above, according to the first embodiment of the present disclosure, the information processing apparatus 100-1 controls, based on the object information of the operation using voice input, the change of at least a part of the set of correspondences, used in the voice recognition process, between the voice information obtained by voice input and the processing based on the voice information.
- For this reason, the contents of the use dictionary can be replaced appropriately, making it possible to prevent malfunctions caused by voice recognition of daily conversation without requiring an activation word.
- Moreover, the recognition rate can be improved without increasing the size of the use dictionary, which suppresses both an increase in misrecognition and prolonged processing time. Accordingly, both improved recognition performance and reduced processing time can be achieved in the voice recognition processing.
- Furthermore, the recognition rate can be improved without executing a plurality of voice recognition processes, which suppresses increases in manufacturing cost and processing load.
- Further, the correspondences related to the change include correspondences determined based on usage information about the correspondences in the voice recognition process for the operation, estimated from the object information of the operation. For this reason, the correspondences in the use dictionary can be optimized in advance for the voice input operation, improving both recognition performance and processing time while maintaining the size of the use dictionary.
- Further, the usage information includes information specifying the usage frequency. For this reason, correspondences that are relatively unlikely to be used in the voice recognition process can be replaced in the use dictionary with correspondences that are relatively likely to be used, so that the recognition rate can be improved while the size of the use dictionary is maintained. Therefore, improvement of the recognition rate, suppression of misrecognition, and reduction of processing time can all be achieved.
- Further, the usage information includes information specifying whether use is permitted. For this reason, correspondences whose use is not permitted in the voice recognition processing can be removed from the use dictionary; for example, correspondences that may induce misrecognition, estimated from the application or the external device 10, can be removed in advance, while correspondences to be actively recognized can be added in advance. Accordingly, the recognition performance can be improved more effectively.
- Further, the information processing apparatus 100-1 controls the change of the correspondence set based on the object information of the operation. For this reason, the correspondences used for the voice recognition processing can be changed in units of dictionaries, allowing the contents of the use dictionary, that is, the correspondences, to be changed quickly.
- Note that the use dictionary may be changed by switching between voice recognition processes that use different dictionaries.
- Further, the change of the correspondence set includes a change to a correspondence set having a different size. For this reason, by changing both the contents and the size of the use dictionary, a use dictionary better suited to the voice recognition processing for the voice input estimated from the object information can be prepared.
- Further, the correspondences are changed via communication. This makes it possible to add to the use dictionary correspondences that the information processing apparatus 100-1 does not itself hold, so the recognition performance can be improved compared to the case where the information processing apparatus 100-1 operates alone.
- Further, the object information of the operation includes information specifying the operation target or an attribute of the operation target. For this reason, the contents of the use dictionary can be optimized for the target of the voice input operation, so the input voice is more easily recognized correctly and the recognition performance is effectively improved.
- Further, the operation target includes an application or a device. This makes it possible to add to the use dictionary correspondences suited to voice input operation of the active application or of the external device 10 connected to the information processing apparatus 100-1. The voice is thus more easily recognized as the user intends, facilitating operation of the application or external device 10 by voice input.
- Further, the information processing apparatus 100-1 controls the change of the correspondences based on whether it can communicate. Correspondences not stored in the information processing apparatus 100-1 can thus be collected, increasing the variety of use dictionaries and further improving recognition performance. Moreover, when communication with an external apparatus capable of executing the voice recognition process, such as the server 200, is possible, the server 200 can be caused to execute the process; in this case, the processing load can be reduced by not executing the voice recognition process in the information processing apparatus 100-1. Alternatively, by also executing the voice recognition process in the information processing apparatus 100-1, the voice recognition result with the higher evaluation among the plurality of results can be used.
- Further, the object information of the operation includes information estimated based on information acquired about the object of the operation. For this reason, the use dictionary can be changed to an appropriate dictionary before the user performs the voice input operation, so the user can perform smooth voice input from the beginning.
- Further, the voice information related to the correspondences includes voice information indicating the start of the operation (activation words) or voice information indicating the content of the operation (operation words).
- In general, the recognition performance for activation words and operation words affects the feeling of operation. For example, compared to the case where the voice is accurately recognized from a single utterance, the user may find the operation cumbersome when the voice is only accurately recognized after several utterances.
- According to the present embodiment, the recognition performance for activation words and operation words is improved, reducing the likelihood that the user finds the operation bothersome.
- In addition, since prolonged processing time is prevented, responsiveness to utterances improves and the feeling of operation can be improved further.
- Since the number of correspondences prepared for activation words is generally smaller than that for operation words, it is particularly important to determine which correspondences are included in the use dictionary; the information processing apparatus 100-1 according to the present embodiment is therefore all the more meaningful.
- As a modification of the present embodiment, the information processing apparatus 100-1 may change the use dictionary using the voice recognition result itself.
- Specifically, the object information of the operation may be information obtained by the voice recognition processing, and the control unit 108 controls the change of the use dictionary based on the recognized information.
- For example, the recognized object information of the operation includes information specifying the operation content, and the control unit 108 controls the change of the use dictionary according to the operation content. For instance, when the voice “Navigate to”, requesting route presentation to some destination in a navigation application, is recognized, the control unit 108 adds correspondences that can recognize destinations to the use dictionary, or switches the use dictionary to a dictionary that can recognize destinations.
- The recognized object information of the operation may also include information specifying the operation target.
- The operation target includes the above-described application or external device 10.
- When an application name or type is recognized, the control unit 108 adds to the use dictionary correspondences that are relatively frequently used for operating the application of that name or type, or switches the use dictionary to a dictionary containing those correspondences.
- Likewise, when the name or type of an external device 10 is recognized, the control unit 108 adds to the use dictionary correspondences that are relatively frequently used for operating that external device 10, or switches the use dictionary to a dictionary containing those correspondences.
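- This modification can be sketched as the recognized text itself selecting the next use dictionary; the trigger table below is an illustrative assumption, not part of the patent.

```python
# Sketch of recognition-driven switching: the recognized text (e.g.
# "Navigate to") selects the next use dictionary.
TRIGGERS = {
    "navigate to": "destinations",   # follow-up dictionary for destinations
    "music":       "music_app",      # application name -> its dictionary
}

def next_dictionary(recognized_text: str, current: str) -> str:
    for trigger, dictionary in TRIGGERS.items():
        if trigger in recognized_text.lower():
            return dictionary        # switch based on recognized object info
    return current                   # no object info recognized: keep as-is

print(next_dictionary("Navigate to", "default"))   # -> "destinations"
print(next_dictionary("hello", "default"))         # -> "default"
```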
- The operation target may also be a voice recognition agent.
- When a voice recognition agent is recognized, the control unit 108 switches to that voice recognition agent.
- Further, as a notification control unit, the control unit 108 may control notification to the subject of the voice input operation regarding the change of the use dictionary. Specifically, the control unit 108 causes the information processing apparatus 100-1, or an external apparatus connected to it, to notify the user of information indicating that the use dictionary has been changed.
- The notification may be visual, auditory, or tactile, or any combination of these.
- For example, the control unit 108 causes a speaker to output a sound corresponding to the change of the use dictionary.
- Specifically, when the voice “Navigate to” is recognized and the use dictionary is changed, an operation sound prompting voice input of the subsequent destination is output from the speaker.
- Note that the recognized voice may be output back after the voice is recognized and before the operation sound is output; that is, the voice “Navigate to” may be echoed. In this case, the user can confirm whether the voice was recognized as intended.
- Also, the control unit 108 may display a display object corresponding to the change of the use dictionary on a display. For example, when the use dictionary is changed for an application, the character information displayed for the application is changed.
- Also, the control unit 108 may cause the external device 10 to perform an operation according to the change of the use dictionary. For example, when the use dictionary is changed for the external device 10, the light emitting unit of the external device 10 is caused to emit light, or the external device 10 is vibrated. Further, the control unit 108 may cause the speaker to output a sound specific to the external device 10. Note that the speaker may be provided in the information processing apparatus 100-1 or in an external apparatus connected to it, such as the external device 10.
- Also, when the voice recognition agent is changed, the control unit 108 causes a response corresponding to the new voice recognition agent. For example, when the voice recognition agent is switched, wording corresponding to that voice recognition agent is output; moreover, the control unit 108 may switch the output voice according to the voice recognition agent.
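- The notification control can be sketched as below, with the modality callbacks standing in for device-specific APIs that the patent does not name.

```python
# Sketch of notification control: on a dictionary change, notify the user
# visually, audibly, and/or tactilely via pluggable callbacks.
def notify_dictionary_change(target: str, show=None, play_sound=None, vibrate=None):
    if show:                       # visual: display object / changed text
        show(f"Voice commands updated for {target}")
    if play_sound:                 # auditory: operation sound or echo-back
        play_sound("dictionary_changed.wav")
    if vibrate:                    # tactile: e.g. vibrate the external device
        vibrate(duration_ms=100)

notify_dictionary_change(
    "navigation",
    show=print,
    play_sound=lambda f: print("playing", f),
    vibrate=lambda duration_ms: print(f"vibrating {duration_ms} ms"),
)
```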
- FIG. 6 is a flowchart conceptually showing an example of the entire processing of the information processing system according to the modification of the present embodiment.
- First, the information processing apparatus 100-1 determines whether voice is input (step S402). If it is determined that voice is input, the information processing apparatus 100-1 executes voice recognition processing based on the use dictionary (step S404).
- Next, the information processing apparatus 100-1 determines whether object information has been recognized (step S406). If it is determined that object information has been recognized, the information processing apparatus 100-1 changes the use dictionary based on the object information (step S408). Specifically, the control unit 108 determines whether the character information generated by the voice recognition unit 104 includes character information indicating object information; if so, the control unit 108 changes the use dictionary based on that object information. Details will be described later.
- Next, the information processing apparatus 100-1 notifies the user of the change of the use dictionary (step S410). Specifically, the control unit 108 notifies the user visually, auditorily, or tactilely that the use dictionary has been changed.
- Next, the information processing apparatus 100-1 determines whether voice is input (step S412). If it is determined that voice is input, the information processing apparatus 100-1 executes voice recognition processing based on the changed use dictionary (step S414). Then, the information processing apparatus 100-1 performs subsequent processing according to the recognition result (step S416).
- FIG. 7 is a flowchart conceptually showing an example of dictionary change processing of the information processing system according to the modification of the present embodiment.
- First, the information processing apparatus 100-1 determines whether an application has been recognized (step S422). Specifically, the control unit 108 determines whether the character information generated by the voice recognition unit 104 includes character information indicating the name or type of an application.
- If an application has been recognized, the information processing apparatus 100-1 acquires the usage information of the correspondences for the application (step S424). Specifically, the control unit 108 acquires the usage frequency and use permission of the correspondences for the recognized application from the storage unit 106 or the like.
- Next, the information processing apparatus 100-1 determines whether an external device 10 has been recognized (step S426). Specifically, the control unit 108 determines whether the character information generated by the voice recognition unit 104 includes character information indicating the name or type of an external device 10.
- If an external device 10 has been recognized, the information processing apparatus 100-1 acquires the usage information of the correspondences for the external device 10 (step S428). Specifically, the control unit 108 acquires, from the storage unit 106 or the like, information indicating the usage frequency and use permission of the correspondences for the recognized external device 10.
- Next, the information processing apparatus 100-1 determines whether a correspondence having a relatively low usage frequency exists in the use dictionary (step S430). Specifically, the control unit 108 determines whether the use dictionary contains a correspondence whose usage frequency is relatively lower than that of a correspondence, among those for the recognized application or external device 10, that is not in the use dictionary.
- Next, the information processing apparatus 100-1 determines whether a correspondence whose use is not permitted exists in the use dictionary (step S432). Specifically, the control unit 108 determines whether, among the correspondences for the recognized application or external device 10, any correspondence whose use is not permitted exists in the use dictionary.
- If either determination is affirmative, the information processing apparatus 100-1 changes the use dictionary (step S434). Specifically, the control unit 108 replaces correspondences with relatively low usage frequency, or correspondences whose use is not permitted, with correspondences with relatively high usage frequency or correspondences whose use is permitted.
- Next, the information processing apparatus 100-1 determines whether the operation content has been recognized (step S436). Specifically, the control unit 108 determines whether the character information generated by the voice recognition unit 104 includes character information indicating the operation content.
- If the operation content has been recognized, the information processing apparatus 100-1 changes the use dictionary to a dictionary corresponding to the operation content (step S438). Specifically, the control unit 108 determines, as the execution subject of the voice recognition process, a voice recognition unit whose use dictionary is the dictionary corresponding to the recognized operation content.
- Next, the information processing apparatus 100-1 determines whether a voice recognition agent has been recognized (step S440). Specifically, the control unit 108 determines whether the character information generated by the voice recognition unit 104 includes character information indicating a voice recognition agent.
- If a voice recognition agent has been recognized, the information processing apparatus 100-1 changes the voice recognition agent (step S442). Specifically, the control unit 108 changes the voice recognition agent in use to the recognized voice recognition agent.
- As described above, according to the modification, the object information of the operation includes information obtained by the voice recognition processing. For this reason, the use dictionary can be changed based on the voice input by the user, so the use dictionary can be changed more reliably to a dictionary suited to the operation the user intends.
- Further, the information processing apparatus 100-1 controls notification to the subject of the voice input operation regarding the change of the correspondences. By being notified of the change of the use dictionary, the user can know that voice input is ready, which avoids voice recognition failures caused by the user speaking before the use dictionary is changed and suppresses user dissatisfaction and stress.
- 2. Second Embodiment (change of the voice recognition dictionary based on subject information of a voice input operation)
- In the above, the first embodiment of the present disclosure and its modifications have been described.
- Next, a second embodiment of the present disclosure will be described.
- In the second embodiment, the information processing system controls the change of the voice recognition dictionary based on the subject information of the voice input operation.
- FIG. 8 is a block diagram schematically illustrating an example of a functional configuration of an information processing system according to the second embodiment of the present disclosure. Note that description of substantially the same function as that of the first embodiment is omitted.
- The information processing apparatus 100-2 includes a subject recognition unit 120 and an observation unit 122 in addition to the voice input unit 102, the voice recognition unit 104, the storage unit 106, the control unit 108, and the communication unit 110.
- In this embodiment, the control unit 108 controls the change of at least a part of the use dictionary based on the subject information of the operation using the voice input. Specifically, in the use dictionary, the control unit 108 replaces correspondences determined based on usage information about the correspondences in the voice recognition process for the voice input operation, the usage information being estimated from the subject information of the voice input operation. For example, the control unit 108 determines the correspondences to be replaced based on the usage frequency or use permission in the voice recognition process estimated from the subject information of the voice input operation, and then performs the replacement.
- The subject information of the voice input operation is information estimated based on information acquired about the subject of the voice input operation.
- Examples of the subject information of the voice input operation include information identifying the mode of the subject of the operation, such as the user's behavior, posture, or position.
- For example, the control unit 108 acquires information identifying the user's behavior generated by the subject recognition unit 120, and adds to the use dictionary correspondences related to operation words that are relatively frequently used in operations estimated to be performed during the behavior identified from the acquired information. Specifically, when the recognized behavior is running, correspondences related to words such as “pause workout” or “resume workout” are added to the use dictionary.
- Similarly, the control unit 108 acquires information identifying the user's posture generated by the subject recognition unit 120, and adds to the use dictionary correspondences related to operation words that are relatively frequently used in operations estimated to be performed in the identified posture. Specifically, when the recognized posture is supine, correspondences related to words such as “stop the alarm” or “turn off the lights” are added to the use dictionary.
- Likewise, the control unit 108 acquires information identifying the user's position generated by the subject recognition unit 120, and adds to the use dictionary correspondences related to operation words that are relatively frequently used in operations estimated to be performed at the identified position. Specifically, when the recognized position is in a train, correspondences related to words such as “how many stations until the transfer?” or “set to manner mode” are added to the use dictionary.
- Note that the position of the subject of the operation may be, in addition to geographical information, information indicating a landmark such as a building name, facility name, or place name, or information indicating topography.
- The control unit 108 may also add to the use dictionary correspondences related to words in a language that is relatively frequently used at the user's position. Specifically, if the recognized position is within the United States, correspondences for English words are added to the use dictionary; if the recognized position is Osaka, correspondences related to Kansai-dialect words are added to the use dictionary.
- The subject information of the voice input operation may also be information specifying the surrounding environment of the subject of the voice input operation, for example the surrounding noise.
- In that case, the control unit 108 acquires information estimating the noise around the user generated by the subject recognition unit 120, and changes the correspondences used in the voice recognition process according to the degree of noise around the user estimated from the acquired information. Specifically, when the recognized noise level is equal to or greater than a threshold, correspondences related to words that are easily misheard under noise, such as onomatopoeia, are extracted from the use dictionary as correspondences whose use is not permitted.
- Note that the correspondences added to the use dictionary may be only a part of the correspondences determined based on the subject information of the voice input operation.
- The use dictionary may also be changed based on a plurality of pieces of subject information; for example, when the user is reading an electronic book on a train, the control unit 108 may change the use dictionary for both the position and the behavior of the user. Alternatively, the control unit 108 may change the use dictionary for only some of the subject information. Further, as in the first embodiment, the control unit 108 may control the change of the use dictionary in units of dictionaries based on the subject information of the operation using the voice input.
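- The subject-information-driven adjustment can be sketched as a rule table keyed by behavior, posture, and position, with a noise gate removing not-permitted correspondences; the table below paraphrases the examples above and is otherwise an assumption.

```python
# Sketch of the second embodiment's rule: behavior, posture, and position
# each contribute correspondences, and heavy noise removes sensitive ones.
CONTEXT_WORDS = {
    ("behavior", "running"): {"pause workout", "resume workout"},
    ("posture", "supine"):   {"stop the alarm", "turn off the lights"},
    ("position", "train"):   {"how many stations until the transfer",
                              "set to manner mode"},
}
NOISE_SENSITIVE = {"onomatopoeia-like words"}   # removed when it is noisy

def build_words(context: dict, noise_level: float, threshold: float = 0.5):
    words = set()
    for key, value in context.items():
        words |= CONTEXT_WORDS.get((key, value), set())
    if noise_level >= threshold:
        words -= NOISE_SENSITIVE    # use not permitted under heavy noise
    return words

print(build_words({"behavior": "running", "position": "train"}, noise_level=0.7))
```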
- The subject recognition unit 120 performs recognition processing for the subject of the voice input operation. Specifically, the subject recognition unit 120 recognizes the user's behavior, posture, or position based on information obtained from the observation unit 122, for example inertia information such as acceleration or angular velocity, GPS (Global Positioning System) information, or image information. In addition to information obtained from the observation unit 122, information obtained from an external apparatus via the communication unit 110, such as the user's schedule information held by the external apparatus, may be used.
- The observation unit 122 observes the subject of the voice input operation. Specifically, the observation unit 122 observes the user's movement, posture, or position, generating inertia information, position information, or image information about the user using, for example, an inertial sensor such as an acceleration or angular velocity sensor, a GPS sensor, or an imaging sensor.
- FIG. 9 is a flowchart conceptually showing an example of the overall processing of the information processing system according to the present embodiment.
- the information processing apparatus 100-2 acquires the subject information of the voice input operation (step S502). Specifically, the subject recognition unit 120 performs a recognition process on the user's behavior, posture, position, or surrounding environment based on inertia information, position information, or image information obtained from the observation unit 122. Then, the control unit 108 acquires information related to the user's action, posture, position, or surrounding environment recognized by the subject recognition unit 120.
- the information processing apparatus 100-2 determines whether the subject information has changed (step S504). Specifically, the control unit 108 determines whether the user's behavior, posture, position, or surrounding environment has changed based on information obtained from the subject recognition unit 120.
- the information processing apparatus 100-2 changes the usage dictionary based on the subject information (step S506). Specifically, the control unit 108 changes the use dictionary for the behavior, posture, position, or surrounding environment related to the change. Details will be described later.
- the information processing apparatus 100-2 determines whether or not voice is input (step S508), and when it is determined that voice is input, performs voice recognition processing based on the usage dictionary (step S510). Then, the information processing apparatus 100-2 executes subsequent processing according to the voice recognition result (step S512).
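The overall flow of FIG. 9 (steps S502 to S512) can be summarized in a sketch like the one below, where the observer, recognizer, dictionary changer, and executor are injected as placeholders rather than patent-defined APIs.

```python
# A condensed sketch of one pass through the FIG. 9 loop.
def process_once(state, observe, recognize_subject, change_dictionary,
                 get_voice, recognize_speech, execute):
    subject_info = recognize_subject(observe())              # S502
    if subject_info != state.get("subject_info"):            # S504
        state["use_dict"] = change_dictionary(state["use_dict"],
                                              subject_info)  # S506
        state["subject_info"] = subject_info
    voice = get_voice()                                      # S508
    if voice is not None:
        result = recognize_speech(voice, state["use_dict"])  # S510
        execute(result)                                      # S512

# Tiny demonstration with stand-in callables.
state = {"use_dict": {"hello": "greet"}, "subject_info": None}
process_once(
    state,
    observe=lambda: {"speed": 0.0},
    recognize_subject=lambda obs: "still",
    change_dictionary=lambda d, s: d,
    get_voice=lambda: "hello",
    recognize_speech=lambda v, d: d.get(v),
    execute=print,
)
```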
- FIG. 10 is a flowchart conceptually showing an example of dictionary change processing of the information processing system according to the present embodiment.
- the information processing apparatus 100-2 determines whether the changed aspect is the user's behavior (step S522). When it is determined that the user's behavior has changed, usage information of the correspondence relationships for the changed behavior is acquired (step S524). Specifically, when the control unit 108 determines that the user behavior recognized by the subject recognition unit 120 has changed from the previously recognized behavior, the control unit 108 acquires the usage frequency and availability of the correspondence relationships for the behavior after the change from the storage unit 106 or the like.
- the information processing apparatus 100-2 determines whether the changed aspect is the user's posture (step S526). When it is determined that the user's posture has changed, usage information of the correspondence relationships for the changed posture is acquired (step S528). Specifically, when the control unit 108 determines that the posture of the user recognized by the subject recognition unit 120 has changed from the previously recognized posture, the control unit 108 acquires the usage frequency and availability of the correspondence relationships for the posture after the change from the storage unit 106 or the like.
- the information processing apparatus 100-2 determines whether the changed aspect is the user's position (step S530). When it is determined that the user's position has changed, usage information of the correspondence relationships for the changed position is acquired (step S532). Specifically, when the control unit 108 determines that the position of the user recognized by the subject recognition unit 120 has changed from the previously recognized position, the control unit 108 acquires the usage frequency and availability of the correspondence relationships for the position after the change from the storage unit 106 or the like.
- the information processing apparatus 100-2 determines whether the changed aspect is the user's surrounding environment (step S534). When it is determined that the user's surrounding environment has changed, usage information of the correspondence relationships for the changed environment is acquired (step S536). Specifically, when the control unit 108 determines that the surrounding environment of the user recognized by the subject recognition unit 120 has changed from the previously recognized environment, the control unit 108 acquires the usage frequency and availability of the correspondence relationships for the environment after the change from the storage unit 106 or the like.
- the information processing apparatus 100-2 determines whether a correspondence relationship with a relatively low usage frequency exists in the use dictionary (step S538), and further determines whether a correspondence relationship whose use is not permitted exists in the use dictionary (step S540). If it is determined that a correspondence relationship with a relatively low usage frequency or a correspondence relationship whose use is not permitted exists in the use dictionary, the information processing apparatus 100-2 changes the use dictionary (step S542).
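One way to picture the dictionary change of FIG. 10 is the sketch below: usage information for the changed aspect is consulted, and low-frequency or non-permitted correspondences are dropped. The data shapes and the frequency cutoff are assumptions.

```python
# Replace correspondences that are rarely used or not permitted.
def change_use_dictionary(use_dict, usage_info, min_frequency=0.1):
    changed = {}
    for phrase, action in use_dict.items():
        info = usage_info.get(phrase, {"frequency": 1.0, "permitted": True})
        if info["frequency"] >= min_frequency and info["permitted"]:  # S538/S540
            changed[phrase] = action
    return changed                                                    # S542

use_dict = {"next page": "page_fwd", "pop": "open"}
usage_info = {"pop": {"frequency": 0.01, "permitted": False}}
print(change_use_dictionary(use_dict, usage_info))  # {'next page': 'page_fwd'}
```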
- as described above, according to the second embodiment of the present disclosure, the information processing apparatus 100-2 controls a change of at least a part of the correspondence relationships, among a set of correspondence relationships between voice information obtained by voice input and processing based on the voice information used in voice recognition processing, based on the subject information of the operation using the voice input. For this reason, the contents of the use dictionary can be replaced appropriately. In particular, in a voice input operation, the user who utters the voice to be input to the voice recognition processing has a great influence on that processing.
- the subject information of the operation includes information specifying the aspect of the subject. For this reason, the contents of the use dictionary can be optimized based on the aspect of the user who performs the voice input operation. Therefore, the input voice can easily be recognized correctly, and the recognition performance can be effectively improved.
- the aspect of the subject of the operation includes the action, posture or position of the subject of the operation. For this reason, it is possible to prepare a use dictionary having a correspondence relationship related to a voice that is desired to be voice-recognized in the recognized user action, posture, or position. Therefore, the voice is easily recognized as intended by the user, and the voice input operation can be facilitated.
- the subject information of the operation includes information for specifying the surrounding environment of the subject of the operation. For this reason, it is possible to prepare a use dictionary having a correspondence relation related to a voice that is desired to be recognized in the surrounding environment of the recognized user. Therefore, the voice is easily recognized as intended by the user, and the voice input operation can be facilitated.
- the subject information of the voice input operation may be information specifying the subject of the voice input operation.
- the subject recognition unit 120 identifies the subject of the voice input operation based on information provided from the voice input unit 102.
- the control unit 108 changes the use dictionary to a dictionary corresponding to the subject specified by the subject recognition unit 120.
- the subject recognition unit 120 identifies a voice speaker related to the voice information based on the voice information provided from the voice input unit 102.
- voiceprint analysis technology or the like may be used to specify the speaker.
- the control unit 108 acquires a dictionary or a correspondence set corresponding to the speaker identified by the subject recognition unit 120 from the storage unit 106 or the like.
- the control unit 108 changes the use dictionary to the acquired dictionary, or replaces a part of the use dictionary with the acquired correspondence set.
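A compact sketch of this speaker-dependent switching follows. The per-speaker dictionary store and the `identify_speaker` callable, standing in for the voiceprint analysis mentioned below, are illustrative stand-ins, not patent APIs.

```python
# Select a use dictionary according to the identified speaker.
SPEAKER_DICTS = {
    "father": {"start the car": "car_start"},
    "mother": {"preheat the oven": "oven_on"},
}

def select_use_dictionary(voice_info, identify_speaker, default=None):
    speaker = identify_speaker(voice_info)  # e.g., voiceprint analysis
    return SPEAKER_DICTS.get(speaker, default or {})

print(select_use_dictionary(b"...", identify_speaker=lambda v: "mother"))
```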
- for example, when the identified speaker is the father, the use dictionary is changed to a dictionary in which the father's voice is easily recognized, and when the identified speaker is the mother, the use dictionary is changed to a dictionary in which the mother's voice is easily recognized.
- the use dictionary may be changed so that the correspondence set for the father is not included in the mother's dictionary.
- the usage dictionary may be changed for the operation target possessed by the subject of the voice input operation.
- the control unit 108 may change the use dictionary for the external device 10 or application for which the specified speaker, that is, the user is the owner.
- a dictionary or correspondence set corresponding to the speaker may be set in advance.
- a dictionary or a correspondence set may be set in advance by the user.
- the setting of the dictionary or the correspondence set may be changed afterwards.
- the setting of the dictionary or the correspondence group may be automatically performed. For example, by performing machine learning on the use dictionary and the voice recognition result for each user, a setting of a dictionary or correspondence set that is frequently used for each user may be generated.
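As a toy stand-in for such machine learning, the following sketch keeps only the correspondences a user actually triggers most often; a real implementation would presumably use a proper learning method rather than this simple frequency count.

```python
# Derive a per-user dictionary from recognition history by frequency.
from collections import Counter

def learn_user_dictionary(history, full_dict, top_n=2):
    """Keep the correspondences a user triggers most often."""
    counts = Counter(phrase for user, phrase in history)
    frequent = [phrase for phrase, _ in counts.most_common(top_n)]
    return {p: full_dict[p] for p in frequent if p in full_dict}

history = [("u1", "next page"), ("u1", "next page"), ("u1", "bookmark")]
full_dict = {"next page": "page_fwd", "bookmark": "mark", "zoom": "zoom_in"}
print(learn_user_dictionary(history, full_dict))
```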
- the subject information of the voice input operation may be information specifying the subject attribute of the voice input operation.
- the control unit 108 changes the use dictionary to a dictionary corresponding to the identified attribute of the subject.
- the attributes of the subject include age, gender, build, race, address, or birthplace.
- the usage dictionary is changed to a dictionary that includes a correspondence relationship related to words according to a common way of speaking in the corresponding age group.
- the usage dictionary is changed to a dictionary including a correspondence relationship related to a word corresponding to the dialect of the corresponding region.
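A sketch of attribute-keyed dictionary selection might look as follows; the attribute keys and example entries are invented for illustration.

```python
# Select a dictionary based on attributes such as age group or birthplace.
ATTRIBUTE_DICTS = {
    ("child", None): {"birdie": "show_bird_video"},
    ("adult", "Kansai"): {"honma": "confirm"},
}

def dict_for_attributes(age_group, birthplace=None):
    return ATTRIBUTE_DICTS.get((age_group, birthplace), {})

print(dict_for_attributes("adult", "Kansai"))  # Kansai-dialect entries
```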
- FIG. 11 is a flowchart conceptually showing an example of the overall processing of the information processing system according to the modification of the present embodiment.
- the information processing apparatus 100-2 determines whether voice is input (step S602). If it is determined that the voice is input, the information processing apparatus 100-2 acquires subject information based on the input voice (step S604). Specifically, when voice information is provided by the voice input unit 102, the subject recognition unit 120 determines the subject or the attribute of the subject based on the voice information.
- the information processing apparatus 100-2 determines whether the subject information has changed (step S606). Specifically, when information identifying the subject or the subject's attribute is provided as a result of the determination by the subject recognition unit 120, the control unit 108 determines whether the subject or the subject's attribute has changed based on that information.
- the information processing apparatus 100-2 changes the usage dictionary based on the subject information (step S608). Specifically, when it is determined that the subject or the subject attribute has changed, the control unit 108 changes the usage dictionary for the subject or the subject attribute after the change. Details will be described later.
- the information processing apparatus 100-2 notifies the subject of the operation of the change of the use dictionary (step S610). When a voice is input (step S612), the voice recognition processing is executed based on the changed use dictionary (step S614). Then, the information processing apparatus 100-2 executes subsequent processing according to the recognition result (step S616).
- FIG. 12 is a flowchart conceptually showing an example of dictionary change processing of the information processing system according to the modification of the present embodiment.
- the information processing apparatus 100-2 determines whether the user attribute has changed (step S622), and if it is determined that the user attribute has changed, obtains a dictionary corresponding to the changed user attribute (step S624). Specifically, when the control unit 108 determines that the user attribute recognized by the subject recognition unit 120 has changed from the previously recognized attribute, the control unit 108 acquires a dictionary corresponding to the changed user attribute from the storage unit 106 or the like.
- the information processing apparatus 100-2 determines whether the user has changed (step S626), and if it is determined that the user has changed, obtains a dictionary corresponding to the changed user (step S628). Specifically, when the control unit 108 determines that the user recognized by the subject recognition unit 120 has changed from the previously recognized user, the control unit 108 acquires a dictionary corresponding to the changed user from the storage unit 106 or the like.
- the information processing apparatus 100-2 changes the usage dictionary (step S630). Specifically, the control unit 108 changes the use dictionary to the acquired dictionary.
- the subject information of the voice input operation includes information specifying the subject of the voice input operation or the attribute of the subject. For this reason, a use dictionary suitable for the subject of the voice input operation can be prepared. Therefore, the input voice can be easily recognized correctly, and the recognition performance can be effectively improved. Furthermore, when a usage dictionary corresponding to the individual user is prepared, it is possible to improve the user's usability or operational feeling.
- in the above description, the subject or the subject's attribute is identified based on audio information; however, the subject or the subject's attribute may instead be identified based on image information. For example, individual users or user attributes may be identified using face recognition technology or the like.
- FIG. 13 is an explanatory diagram illustrating a hardware configuration of the information processing apparatus 100 according to an embodiment of the present disclosure.
- the information processing apparatus 100 includes a processor 132, a memory 134, a bridge 136, a bus 138, an interface 140, an input device 142, an output device 144, a measurement device 146, a drive 148, a connection port 150, and a communication device 152.
- the processor 132 functions as an arithmetic processing unit, and realizes the functions of the voice recognition unit 104, the control unit 108, and the subject recognition unit 120 in the information processing apparatus 100 in cooperation with various programs.
- the processor 132 operates various logical functions of the information processing apparatus 100 by executing a program stored in the memory 134 or another storage medium using the control circuit.
- the processor 132 may be a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), or a system-on-a-chip (SoC).
- the memory 134 stores a program used by the processor 132 or an operation parameter.
- the memory 134 includes a RAM (Random Access Memory), and temporarily stores a program used in the execution of the processor 132 or a parameter that changes as appropriate in the execution.
- the memory 134 includes a ROM (Read Only Memory), and the function of the storage unit is realized by the RAM and the ROM. Note that an external storage device may be used as part of the memory 134 via the connection port 150 or the communication device 152.
- the processor 132 and the memory 134 are connected to each other by an internal bus including a CPU bus or the like.
- the bridge 136 connects the buses. Specifically, the bridge 136 connects an internal bus to which the processor 132 and the memory 134 are connected and a bus 138 to be connected to the interface 140.
- the input device 142 is used by a user to operate the information processing apparatus 100 or input information to the information processing apparatus 100, and realizes the function of the voice input unit 102.
- the input device 142 includes input means for a user to input information and an input control circuit that generates an input signal based on the user's input and outputs it to the processor 132.
- the input means may be a mouse, keyboard, touch panel, switch, lever, microphone, or the like.
- a user of the information processing apparatus 100 can input various data or instruct a processing operation to the information processing apparatus 100 by operating the input device 142.
- the output device 144 is used to notify the user of information, and realizes the function of the input/output unit.
- the output device 144 may be a display device or a sound output device.
- the output device 144 may be a liquid crystal display (LCD) device, an OLED (Organic Light Emitting Diode) device, a projector, a speaker, a headphone, or the like, or a module that performs output to the device.
- the input device 142 or the output device 144 may include an input/output device.
- the input/output device may be a touch screen.
- the measurement device 146 measures phenomena occurring in the information processing apparatus 100 and its vicinity, and realizes the operation of the observation unit 122 of the information processing apparatus 100.
- the measurement device 146 may be an inertial sensor such as an acceleration sensor or an angular velocity sensor, a GPS sensor, or an imaging sensor.
- the measuring device 146 may include an environmental sensor that measures air temperature, humidity, atmospheric pressure, or the like, or a biological sensor that measures body temperature, pulse, sweat, or the like, or may include a plurality of types of sensors.
- the drive 148 is a storage medium reader / writer, and is built in or externally attached to the information processing apparatus 100.
- the drive 148 reads information stored in a mounted removable storage medium such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, and outputs the information to the memory 134.
- the drive 148 can also write information on a removable storage medium.
- connection port 150 is a port for directly connecting a device to the information processing apparatus 100.
- the connection port 150 may be a USB (Universal Serial Bus) port, an IEEE 1394 port, a SCSI (Small Computer System Interface) port, or the like.
- the connection port 150 may be an RS-232C port, an optical audio terminal, an HDMI (registered trademark) (High-Definition Multimedia Interface) port, or the like. Data may be exchanged between the information processing apparatus 100 and the external device by connecting the external device to the connection port 150.
- the communication device 152 mediates communication between the information processing device 100 and the external device, and realizes the function of the communication unit 110. Specifically, the communication device 152 executes communication according to a wireless communication method or a wired communication method. For example, the communication device 152 performs wireless communication according to a cellular communication method such as WCDMA (registered trademark) (Wideband Code Division Multiple Access), WiMAX (registered trademark), LTE (Long Term Evolution), or LTE-A.
- the communication device 152 may execute wireless communication according to an arbitrary wireless communication method, such as a short-range wireless communication method including Bluetooth (registered trademark), NFC (Near Field Communication), wireless USB, or TransferJet (registered trademark), or a wireless LAN (Local Area Network) method such as Wi-Fi (registered trademark).
- the communication device 152 may execute wired communication such as signal line communication or wired LAN communication.
- the information processing apparatus 100 may not have a part of the configuration described with reference to FIG. 13 or may have any additional configuration.
- a one-chip information processing module in which all or part of the configuration described with reference to FIG. 13 is integrated may be provided.
- the contents of a use dictionary can be changed appropriately. Therefore, it is possible to prevent malfunction due to voice recognition in daily conversation without providing an activation word.
- the recognition rate can be improved without increasing the size of the dictionary used. Thereby, an increase in misrecognition and a prolonged processing time can be suppressed. Accordingly, it is possible to achieve both improvement in recognition performance and reduction in processing time in the voice recognition processing.
- the recognition rate can be improved without executing a plurality of voice recognition processes. Thereby, an increase in manufacturing cost and processing load can be suppressed.
- the contents of the use dictionary can be appropriately replaced.
- a user who utters the voice to be input to the voice recognition processing has a great influence on that processing. Accordingly, by changing the contents of the use dictionary based on such user information, prevention of malfunction of voice recognition, improvement of the recognition rate, and suppression of misrecognition and prolonged processing time can be realized effectively. That is, it is possible to achieve both improved recognition performance and reduced processing time in the voice recognition processing.
- in the above description, the information processing system, that is, the information processing apparatus 100 and the server 200, performs the processing; however, the information processing apparatus 100 may perform the processing alone.
- the information processing apparatus 100 may include a plurality of memories having different access speeds and storage capacities, and processing equivalent to that of the information processing apparatus 100 and the server 200 described above may be realized using the plurality of memories.
- the information processing apparatus 100 includes a first memory and a second memory.
- the first memory has a faster access speed than the second memory but a smaller storage capacity than the second memory.
- the second memory has an access speed slower than that of the first memory but a storage capacity larger than that of the first memory.
- the information processing apparatus 100 uses the dictionary stored in the first memory as the use dictionary, and when voice recognition with the dictionary stored in the first memory fails, the dictionary stored in the second memory is used as the use dictionary.
- the dictionaries stored in the first memory and the second memory are optimized by the dictionary changing process as described above. As described above, by using a plurality of memories having different access speeds and storage capacities, it is possible to achieve both speeding up of the response to processing for voice input and maintaining or improving the success rate of voice recognition. In particular, the configuration as described above is significant when the information processing apparatus 100 performs processing alone.
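The two-tier lookup described above can be pictured with the following sketch, where plain Python dicts stand in for the first (fast, small) and second (slow, large) memories.

```python
# Try the small first-memory dictionary, fall back to the larger one.
def recognize_with_fallback(phrase, first_memory_dict, second_memory_dict):
    action = first_memory_dict.get(phrase)       # fast, small dictionary
    if action is None:
        action = second_memory_dict.get(phrase)  # slower, larger dictionary
    return action

fast = {"next page": "page_fwd"}
large = {"next page": "page_fwd", "open settings": "settings"}
print(recognize_with_fallback("open settings", fast, large))
```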
- the use dictionary may be changed for the combination of the object information and the subject information.
- time information may be used to change the usage dictionary.
- the time information includes hour and minute, date, day of the week, day and night, or season.
- the information processing apparatus 100 may change the use dictionary to a dictionary corresponding to a combination of an active application and a time zone or a dictionary including a correspondence relationship corresponding to the combination.
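For example, a dictionary keyed by the combination of active application and time zone could be selected as in this sketch; the keys and entries are illustrative assumptions.

```python
# Select a dictionary by (application, time-of-day) combination.
import datetime

def time_slot(now=None):
    hour = (now or datetime.datetime.now()).hour
    return "night" if hour >= 22 or hour < 6 else "day"

COMBINED_DICTS = {
    ("ebook_reader", "night"): {"dim the screen": "brightness_down"},
}

def dict_for_context(app, now=None):
    return COMBINED_DICTS.get((app, time_slot(now)), {})

print(dict_for_context("ebook_reader", datetime.datetime(2017, 4, 11, 23)))
```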
- the use dictionary is changed according to a more detailed situation, so that the voice can be more easily recognized correctly. Therefore, the recognition performance can be further improved.
- the correspondence relationship may be simply extracted from the usage dictionary.
- the information processing apparatus 100 extracts a correspondence relationship having a relatively low use frequency from the use dictionary.
- the number of correspondence relationships in the use dictionary is reduced, so that the possibility of erroneous recognition can be reduced.
- the processing time, that is, the response time, can be shortened.
- the use dictionary may not be changed.
- the information processing apparatus 100 may stop changing the usage dictionary.
- (1) An information processing apparatus including: an acquisition unit that obtains voice information obtained by voice input; and a control unit that controls a change of at least a part of the correspondence relationships, among a set of correspondence relationships between the voice information and processing based on the voice information used in voice recognition processing, on the basis of object information of an operation using the voice input or subject information of the operation.
- (2)
- the correspondence relationships related to the change include correspondence relationships determined based on usage information about the correspondence relationships in the voice recognition processing for the operation, the usage information being estimated from the object information of the operation or the subject information of the operation. The information processing apparatus according to (1).
- the usage information includes information for specifying a usage frequency.
- the usage information includes information for specifying whether or not the usage can be performed.
- the control unit further controls change of the correspondence set based on object information of the operation or subject information of the operation.
- the change of the correspondence set includes a change to the correspondence set having a different set size.
- the correspondence relationship is changed through communication.
- the object information of the operation includes information for specifying an operation target or an attribute of the operation target.
- the operation target includes an application or a device, The information processing apparatus according to (8).
- the control unit further controls the change in the correspondence relationship based on whether the information processing apparatus can communicate.
- the operation subject information includes information that identifies an aspect of the operation subject.
- the aspect of the subject of the operation includes the action, posture, or position of the subject of the operation.
- the operation subject information includes information identifying the surrounding environment of the operation subject.
- the operation subject information includes information for specifying an operation subject or an attribute of the operation subject.
- the object information of the operation or the subject information of the operation includes information estimated based on information acquired about the object or subject of the operation.
- the object information of the operation or the subject information of the operation includes information obtained by the voice recognition process.
- a notification control unit that controls notification of the change of the correspondence to the subject of the operation;
- the audio information related to the correspondence includes audio information indicating the start of the operation or audio information indicating the content of the operation.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- User Interface Of Digital Computer (AREA)
- Navigation (AREA)
Abstract
Description
1. First embodiment (change of the speech recognition dictionary based on object information of a voice input operation)
1-1. System configuration
1-2. System processing
1-3. Summary of the first embodiment
1-4. Modifications
2. Second embodiment (change of the speech recognition dictionary based on subject information of a voice input operation)
2-1. System configuration
2-2. System processing
2-3. Summary of the second embodiment
2-4. Modifications
3. Hardware configuration of the information processing apparatus according to an embodiment of the present disclosure
4. Conclusion
First, the first embodiment of the present disclosure will be described. In the first embodiment, the information processing system controls the change of the speech recognition dictionary based on object information of a voice input operation.
The functional configuration of the information processing system according to the present embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram schematically showing an example of the functional configuration of the information processing system according to the first embodiment of the present disclosure.
The information processing apparatus 100-1 includes a voice input unit 102, a voice recognition unit 104, a storage unit 106, a control unit 108, and a communication unit 110.
The voice input unit 102, as an acquisition unit, acquires voice information. Specifically, when a voice is uttered by a user present in the vicinity of the information processing apparatus 100-1, the voice input unit 102 generates voice signal information relating to the signal obtained for the uttered voice. Note that, instead of generating the voice signal information itself, the voice input unit 102 may acquire, via communication, voice signal information generated by an external voice input device.
The voice recognition unit 104 performs voice recognition processing based on the voice information. Specifically, the voice recognition unit 104 determines subsequent processing based on the correspondence relationships between voice information and processing based on that voice information (hereinafter also referred to as subsequent processing) and on the voice information provided from the voice input unit 102. For example, when voice signal information is provided from the voice input unit 102, the voice recognition unit 104 generates character information from the voice signal information. The voice recognition unit 104 then determines whether character information that matches or is similar to the generated character information (hereinafter also referred to as matching) exists in a set of correspondence relationships between character information and subsequent processing (hereinafter also referred to as a dictionary). When it is determined that character information matching the generated character information exists, the voice recognition unit 104 notifies the control unit 108 of the subsequent processing corresponding to the matched character information.
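As a concrete illustration of the matching just described, the following is a minimal sketch in Python, assuming exact lookup with a similarity-based fallback; `difflib` merely stands in for whatever matcher an actual implementation would use.

```python
# Match generated character information against the dictionary.
import difflib

def match_subsequent_process(text, dictionary, cutoff=0.8):
    """Return the subsequent processing for matching character information."""
    if text in dictionary:                        # exact match
        return dictionary[text]
    close = difflib.get_close_matches(text, list(dictionary), n=1,
                                      cutoff=cutoff)  # similar match
    return dictionary[close[0]] if close else None

dictionary = {"turn on the light": "light_on"}
print(match_subsequent_process("turn on the lights", dictionary))
```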
The storage unit 106 stores information used in the voice recognition processing. Specifically, the storage unit 106 stores dictionaries. For example, the storage unit 106 stores a plurality of dictionaries and provides a dictionary to the voice recognition unit 104. Note that the storage unit 106 may store individual correspondence relationships separately from the dictionary units.
The control unit 108 controls the overall operation of the information processing apparatus 100-1. Specifically, the control unit 108 controls the voice recognition processing. More specifically, the control unit 108 controls the dictionary used in the voice recognition processing.
The communication unit 110 communicates with the server 200 and the external devices 10. Specifically, the communication unit 110 transmits a dictionary provision request, a voice recognition request, and voice information to the server 200, and receives a dictionary and a voice recognition result from the server 200. The communication unit 110 also transmits an operation request and a dictionary provision request to an external device 10 and receives a dictionary from the external device 10. For example, the communication unit 110 broadcasts a dictionary provision request to each external device 10 that can be an operation target, and receives a dictionary from each external device 10 that permits operation. Note that when a dictionary for an external device 10 is already stored in the storage unit 106 of the information processing apparatus 100-1, no dictionary provision request is transmitted to that external device 10. When a dictionary for an external device 10 is stored in the server 200, a dictionary provision request is transmitted to the server 200, or the server 200 is caused to execute the voice recognition processing.
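The following is a rough sketch of that provision flow, assuming simple in-memory device records; the field names and the `collect_device_dictionaries` helper are invented for illustration, not part of the patent.

```python
# Request dictionaries only from devices not already cached locally.
def collect_device_dictionaries(devices, local_cache):
    merged = {}
    for device in devices:                       # broadcast-style request
        name = device["name"]
        if name in local_cache:                  # already stored locally
            merged.update(local_cache[name])
        elif device.get("permits_operation"):
            merged.update(device["dictionary"])  # device provides its dictionary
    return merged

devices = [{"name": "tv", "permits_operation": True,
            "dictionary": {"volume up": "tv_vol_up"}}]
print(collect_device_dictionaries(devices, local_cache={}))
```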
The server 200 includes a communication unit 202, a control unit 204, a voice recognition unit 206, and a storage unit 208.
The communication unit 202 communicates with the information processing apparatus 100-1. Specifically, the communication unit 202 receives a dictionary provision request, a voice recognition request, and voice information from the information processing apparatus 100-1, and transmits a dictionary and a voice recognition result to the information processing apparatus 100-1.
The control unit 204 controls the overall operation of the server 200. Specifically, the control unit 204 controls voice recognition processing in response to a voice recognition request. For example, when a voice recognition request is received from the information processing apparatus 100-1, the control unit 204 causes the voice recognition unit 206 to execute voice recognition processing based on the voice information received together with, or separately from, the voice recognition request. The control unit 204 then causes the communication unit 202 to transmit the voice recognition result of the voice recognition unit 206 to the information processing apparatus 100-1.
The voice recognition unit 206 performs voice recognition processing based on voice information. Since the voice recognition processing of the voice recognition unit 206 is substantially the same as that of the voice recognition unit 104 of the information processing apparatus 100-1, a description thereof is omitted.
The storage unit 208 stores information used in the voice recognition processing. Specifically, the storage unit 208 stores dictionaries and correspondence relationships. For example, the dictionaries stored in the storage unit 208 may be larger in size and greater in number than those stored in the information processing apparatus 100-1.
Next, the processing of the information processing system according to the present embodiment will be described.
First, the overall processing of the information processing system according to the present embodiment will be described with reference to FIG. 4. FIG. 4 is a flowchart conceptually showing an example of the overall processing of the information processing system according to the present embodiment.
Next, the dictionary change processing of the information processing system according to the present embodiment will be described with reference to FIG. 5. FIG. 5 is a flowchart conceptually showing an example of the dictionary change processing of the information processing system according to the present embodiment.
As described above, according to the first embodiment of the present disclosure, the information processing apparatus 100-1 controls a change of at least a part of the correspondence relationships, among a set of correspondence relationships between voice information obtained by voice input and processing based on the voice information used in voice recognition processing, based on object information of an operation using the voice input.
The first embodiment of the present disclosure has been described above. Note that the present embodiment is not limited to the above examples. Modifications of the present embodiment will be described below.
First, the overall processing of the information processing system according to a modification of the present embodiment will be described with reference to FIG. 6. FIG. 6 is a flowchart conceptually showing an example of the overall processing of the information processing system according to the modification of the present embodiment.
Next, the dictionary change processing of the information processing system according to the modification of the present embodiment will be described with reference to FIG. 7. FIG. 7 is a flowchart conceptually showing an example of the dictionary change processing of the information processing system according to the modification of the present embodiment.
The first embodiment of the present disclosure and its modifications have been described above. Next, the second embodiment of the present disclosure will be described. In the second embodiment, the information processing system controls the change of the speech recognition dictionary based on subject information of a voice input operation.
The functional configuration of the information processing system according to the present embodiment will be described with reference to FIG. 8. FIG. 8 is a block diagram schematically showing an example of the functional configuration of the information processing system according to the second embodiment of the present disclosure. Note that descriptions of functions substantially the same as those of the first embodiment are omitted.
The information processing apparatus 100-2 includes a subject recognition unit 120 and an observation unit 122 in addition to the voice input unit 102, the voice recognition unit 104, the storage unit 106, the control unit 108, and the communication unit 110.
The control unit 108 controls a change of at least a part of the use dictionary based on subject information of an operation using voice input. Specifically, the control unit 108 replaces, in the use dictionary, correspondence relationships determined based on usage information about the correspondence relationships in the voice recognition processing for the voice input operation, the usage information being estimated from the subject information of the voice input operation. For example, the control unit 108 determines the correspondence relationships to be replaced based on the usage frequency or availability in the voice recognition processing estimated from the subject information of the voice input operation, and then replaces the determined correspondence relationships.
The subject recognition unit 120 performs recognition processing for the subject of the voice input operation. Specifically, the subject recognition unit 120 recognizes the user's action, posture, or position based on information obtained from the observation unit 122. For example, the subject recognition unit 120 recognizes the user's action, posture, or position based on inertia information such as acceleration or angular velocity, GPS (Global Positioning System) information, or image information obtained from the observation unit 122. In addition to information obtained from the observation unit 122, information obtained from an external device via the communication unit 110 may be used; for example, the user's schedule information held by an external device may be used.
The observation unit 122 observes the subject of the voice input operation. Specifically, the observation unit 122 observes the user's movement, posture, or position. For example, the observation unit 122 generates inertia information, position information, or image information about the user using an inertial sensor such as an acceleration sensor or an angular velocity sensor, a GPS sensor, or an imaging sensor.
Next, the processing of the information processing system according to the present embodiment will be described. Note that descriptions of processing substantially the same as in the first embodiment are omitted.
First, the overall processing of the information processing system according to the present embodiment will be described with reference to FIG. 9. FIG. 9 is a flowchart conceptually showing an example of the overall processing of the information processing system according to the present embodiment.
Next, the dictionary change processing of the information processing system according to the present embodiment will be described with reference to FIG. 10. FIG. 10 is a flowchart conceptually showing an example of the dictionary change processing of the information processing system according to the present embodiment.
As described above, according to the second embodiment of the present disclosure, the information processing apparatus 100-2 controls a change of at least a part of the correspondence relationships, among a set of correspondence relationships between voice information obtained by voice input and processing based on the voice information used in voice recognition processing, based on subject information of an operation using the voice input. For this reason, as described above, the contents of the use dictionary can be replaced appropriately. In particular, in a voice input operation, the user who utters the voice to be input to the voice recognition processing has a great influence on that processing. Accordingly, by changing the contents of the use dictionary based on such user information, prevention of malfunction of voice recognition, improvement of the recognition rate, and suppression of misrecognition and prolonged processing time can be realized effectively. That is, it is possible to achieve both improved recognition performance and reduced processing time in the voice recognition processing.
The second embodiment of the present disclosure has been described above. Note that the present embodiment is not limited to the above examples. Modifications of the present embodiment will be described below.
First, the overall processing of the information processing system according to a modification of the present embodiment will be described with reference to FIG. 11. FIG. 11 is a flowchart conceptually showing an example of the overall processing of the information processing system according to the modification of the present embodiment.
Next, the dictionary change processing of the information processing system according to the modification of the present embodiment will be described with reference to FIG. 12. FIG. 12 is a flowchart conceptually showing an example of the dictionary change processing of the information processing system according to the modification of the present embodiment.
The information processing apparatus 100 according to each embodiment of the present disclosure has been described above. The processing of the information processing apparatus 100 described above is realized by cooperation between software and the hardware of the information processing apparatus 100 described below.
The processor 132 functions as an arithmetic processing unit and, in cooperation with various programs, realizes the functions of the voice recognition unit 104, the control unit 108, and the subject recognition unit 120 in the information processing apparatus 100. The processor 132 operates various logical functions of the information processing apparatus 100 by executing programs stored in the memory 134 or another storage medium using a control circuit. For example, the processor 132 may be a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), or an SoC (System-on-a-Chip).
The memory 134 stores programs, operation parameters, and the like used by the processor 132. For example, the memory 134 includes a RAM (Random Access Memory) and temporarily stores programs used in the execution of the processor 132 or parameters that change as appropriate during execution. The memory 134 also includes a ROM (Read Only Memory), and the function of the storage unit is realized by the RAM and the ROM. Note that an external storage device may be used as part of the memory 134 via the connection port 150, the communication device 152, or the like.
The bridge 136 connects buses. Specifically, the bridge 136 connects the internal bus to which the processor 132 and the memory 134 are connected and the bus 138 connected to the interface 140.
The input device 142 is used by a user to operate the information processing apparatus 100 or to input information to the information processing apparatus 100, and realizes the function of the voice input unit 102. For example, the input device 142 includes input means for the user to input information and an input control circuit that generates an input signal based on the user's input and outputs it to the processor 132. The input means may be a mouse, a keyboard, a touch panel, a switch, a lever, a microphone, or the like. By operating the input device 142, the user of the information processing apparatus 100 can input various data to the information processing apparatus 100 and instruct processing operations.
The output device 144 is used to notify the user of information and realizes the function of the input/output unit. The output device 144 may be a display device or a sound output device. For example, the output device 144 may be a liquid crystal display (LCD) device, an OLED (Organic Light Emitting Diode) device, a projector, a speaker, or headphones, or a module that outputs to such a device.
The measurement device 146 measures phenomena occurring in the information processing apparatus 100 and its vicinity, and realizes the operation of the observation unit 122 of the information processing apparatus 100. For example, the measurement device 146 may be an inertial sensor such as an acceleration sensor or an angular velocity sensor, a GPS sensor, or an imaging sensor. Note that the measurement device 146 may include an environmental sensor that measures air temperature, humidity, atmospheric pressure, or the like, or a biological sensor that measures body temperature, pulse, perspiration, or the like, and may include a plurality of types of sensors.
The drive 148 is a reader/writer for storage media and is built into or externally attached to the information processing apparatus 100. The drive 148 reads information stored on a mounted removable storage medium such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, and outputs the information to the memory 134. The drive 148 can also write information to a removable storage medium.
The connection port 150 is a port for directly connecting a device to the information processing apparatus 100. For example, the connection port 150 may be a USB (Universal Serial Bus) port, an IEEE 1394 port, a SCSI (Small Computer System Interface) port, or the like. The connection port 150 may also be an RS-232C port, an optical audio terminal, an HDMI (registered trademark) (High-Definition Multimedia Interface) port, or the like. By connecting an external device to the connection port 150, data may be exchanged between the information processing apparatus 100 and the external device.
The communication device 152 mediates communication between the information processing apparatus 100 and an external device, and realizes the function of the communication unit 110. Specifically, the communication device 152 executes communication according to a wireless or wired communication method. For example, the communication device 152 executes wireless communication according to a cellular communication method such as WCDMA (registered trademark) (Wideband Code Division Multiple Access), WiMAX (registered trademark), LTE (Long Term Evolution), or LTE-A. Note that the communication device 152 may execute wireless communication according to an arbitrary wireless communication method, such as a short-range wireless communication method including Bluetooth (registered trademark), NFC (Near Field Communication), wireless USB, or TransferJet (registered trademark), or a wireless LAN (Local Area Network) method such as Wi-Fi (registered trademark). The communication device 152 may also execute wired communication such as signal line communication or wired LAN communication.
As described above, according to the first embodiment of the present disclosure, the contents of the use dictionary can be replaced appropriately. Therefore, malfunction due to recognition of everyday conversation can be prevented without providing an activation word. In addition, the recognition rate can be improved without increasing the size of the use dictionary, which also suppresses an increase in misrecognition and prolonged processing time. It is therefore possible to achieve both improved recognition performance and reduced processing time in the voice recognition processing. Furthermore, the recognition rate can be improved without executing a plurality of voice recognition processes, which suppresses increases in manufacturing cost and processing load.
(1)
An information processing apparatus including:
an acquisition unit that obtains voice information obtained by voice input; and
a control unit that controls a change of at least a part of the correspondence relationships, among a set of correspondence relationships between the voice information and processing based on the voice information used in voice recognition processing, on the basis of object information of an operation using the voice input or subject information of the operation.
(2)
The correspondence relationships related to the change include correspondence relationships determined based on usage information about the correspondence relationships in the voice recognition processing for the operation, the usage information being estimated from the object information of the operation or the subject information of the operation.
The information processing apparatus according to (1).
(3)
The usage information includes information specifying a usage frequency.
The information processing apparatus according to (2).
(4)
The usage information includes information specifying whether use is permitted.
The information processing apparatus according to (2) or (3).
(5)
The control unit further controls a change of the set of correspondence relationships based on the object information of the operation or the subject information of the operation.
The information processing apparatus according to any one of (1) to (4).
(6)
The change of the set of correspondence relationships includes a change to a set of correspondence relationships having a different set size.
The information processing apparatus according to (5).
(7)
The correspondence relationships are changed via communication.
The information processing apparatus according to any one of (1) to (6).
(8)
The object information of the operation includes information specifying an operation target or an attribute of the operation target.
The information processing apparatus according to any one of (1) to (7).
(9)
The operation target includes an application or a device.
The information processing apparatus according to (8).
(10)
The control unit further controls the change of the correspondence relationships based on whether the information processing apparatus is able to communicate.
The information processing apparatus according to any one of (1) to (9).
(11)
The subject information of the operation includes information specifying an aspect of the subject of the operation.
The information processing apparatus according to any one of (1) to (10).
(12)
The aspect of the subject of the operation includes the action, posture, or position of the subject of the operation.
The information processing apparatus according to (11).
(13)
The subject information of the operation includes information specifying the surrounding environment of the subject of the operation.
The information processing apparatus according to any one of (1) to (12).
(14)
The subject information of the operation includes information specifying the subject of the operation or an attribute of the subject of the operation.
The information processing apparatus according to any one of (1) to (13).
(15)
The object information of the operation or the subject information of the operation includes information estimated based on information acquired about the object or the subject of the operation.
The information processing apparatus according to any one of (1) to (14).
(16)
The object information of the operation or the subject information of the operation includes information obtained by the voice recognition processing.
The information processing apparatus according to any one of (1) to (15).
(17)
Further including a notification control unit that controls notification to the subject of the operation about the change of the correspondence relationships.
The information processing apparatus according to any one of (1) to (16).
(18)
The voice information related to the correspondence relationships includes voice information indicating the start of the operation or voice information indicating the content of the operation.
The information processing apparatus according to any one of (1) to (17).
(19)
An information processing method including, using a processor:
obtaining voice information obtained by voice input; and
controlling a change of at least a part of the correspondence relationships, among a set of correspondence relationships between the voice information and processing based on the voice information used in voice recognition processing, on the basis of object information of an operation using the voice input or subject information of the operation.
(20)
A program for causing a computer to realize:
an acquisition function of obtaining voice information obtained by voice input; and
a control function of controlling a change of at least a part of the correspondence relationships, among a set of correspondence relationships between the voice information and processing based on the voice information used in voice recognition processing, on the basis of object information of an operation using the voice input or subject information of the operation.
102 Voice input unit
104 Voice recognition unit
106 Storage unit
108 Control unit
110 Communication unit
120 Subject recognition unit
122 Observation unit
200 Server
Claims (20)
- An information processing apparatus including: an acquisition unit that obtains voice information obtained by voice input; and a control unit that controls a change of at least a part of the correspondence relationships, among a set of correspondence relationships between the voice information and processing based on the voice information used in voice recognition processing, on the basis of object information of an operation using the voice input or subject information of the operation.
- The information processing apparatus according to claim 1, in which the correspondence relationships related to the change include correspondence relationships determined based on usage information about the correspondence relationships in the voice recognition processing for the operation, the usage information being estimated from the object information of the operation or the subject information of the operation.
- The information processing apparatus according to claim 2, in which the usage information includes information specifying a usage frequency.
- The information processing apparatus according to claim 2, in which the usage information includes information specifying whether use is permitted.
- The information processing apparatus according to claim 1, in which the control unit further controls a change of the set of correspondence relationships based on the object information of the operation or the subject information of the operation.
- The information processing apparatus according to claim 5, in which the change of the set of correspondence relationships includes a change to a set of correspondence relationships having a different set size.
- The information processing apparatus according to claim 1, in which the correspondence relationships are changed via communication.
- The information processing apparatus according to claim 1, in which the object information of the operation includes information specifying an operation target or an attribute of the operation target.
- The information processing apparatus according to claim 8, in which the operation target includes an application or a device.
- The information processing apparatus according to claim 1, in which the control unit further controls the change of the correspondence relationships based on whether the information processing apparatus is able to communicate.
- The information processing apparatus according to claim 1, in which the subject information of the operation includes information specifying an aspect of the subject of the operation.
- The information processing apparatus according to claim 11, in which the aspect of the subject of the operation includes the action, posture, or position of the subject of the operation.
- The information processing apparatus according to claim 1, in which the subject information of the operation includes information specifying the surrounding environment of the subject of the operation.
- The information processing apparatus according to claim 1, in which the subject information of the operation includes information specifying the subject of the operation or an attribute of the subject of the operation.
- The information processing apparatus according to claim 1, in which the object information of the operation or the subject information of the operation includes information estimated based on information acquired about the object or the subject of the operation.
- The information processing apparatus according to claim 1, in which the object information of the operation or the subject information of the operation includes information obtained by the voice recognition processing.
- The information processing apparatus according to claim 1, further including a notification control unit that controls notification to the subject of the operation about the change of the correspondence relationships.
- The information processing apparatus according to claim 1, in which the voice information related to the correspondence relationships includes voice information indicating the start of the operation or voice information indicating the content of the operation.
- An information processing method including, using a processor: obtaining voice information obtained by voice input; and controlling a change of at least a part of the correspondence relationships, among a set of correspondence relationships between the voice information and processing based on the voice information used in voice recognition processing, on the basis of object information of an operation using the voice input or subject information of the operation.
- A program for causing a computer to realize: an acquisition function of obtaining voice information obtained by voice input; and a control function of controlling a change of at least a part of the correspondence relationships, among a set of correspondence relationships between the voice information and processing based on the voice information used in voice recognition processing, on the basis of object information of an operation using the voice input or subject information of the operation.
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020187026954A KR20180134337A (ko) | 2016-04-11 | 2017-03-06 | 정보 처리 장치, 정보 처리 방법 및 프로그램 |
| EP17782153.5A EP3444808A4 (en) | 2016-04-11 | 2017-03-06 | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING PROCESS AND PROGRAM |
| US16/076,223 US20210193133A1 (en) | 2016-04-11 | 2017-03-06 | Information processing device, information processing method, and program |
| JP2018511925A JP6930531B2 (ja) | 2016-04-11 | 2017-03-06 | 情報処理装置、情報処理方法およびプログラム |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2016079005 | 2016-04-11 | ||
| JP2016-079005 | 2016-04-11 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2017179335A1 true WO2017179335A1 (ja) | 2017-10-19 |
Family
ID=60041683
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2017/008644 Ceased WO2017179335A1 (ja) | 2016-04-11 | 2017-03-06 | 情報処理装置、情報処理方法およびプログラム |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US20210193133A1 (ja) |
| EP (1) | EP3444808A4 (ja) |
| JP (1) | JP6930531B2 (ja) |
| KR (1) | KR20180134337A (ja) |
| WO (1) | WO2017179335A1 (ja) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2021516361A (ja) * | 2018-04-04 | 2021-07-01 | アイフライテック カンパニー,リミテッド | 音声ウェイクアップ方法及び装置 |
| JP2021182068A (ja) * | 2020-05-19 | 2021-11-25 | Necパーソナルコンピュータ株式会社 | 映像表示装置 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2004086150A (ja) * | 2002-06-28 | 2004-03-18 | Denso Corp | 音声制御装置 |
| WO2007066433A1 (ja) * | 2005-12-07 | 2007-06-14 | Mitsubishi Electric Corporation | 音声認識装置 |
| JP2008026464A (ja) * | 2006-07-19 | 2008-02-07 | Denso Corp | 車両用音声認識装置 |
| JP2015526753A (ja) * | 2012-06-15 | 2015-09-10 | 本田技研工業株式会社 | 深度に基づく場面認識 |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20040260438A1 (en) * | 2003-06-17 | 2004-12-23 | Chernetsky Victor V. | Synchronous voice user interface/graphical user interface |
| JP4791699B2 (ja) * | 2004-03-29 | 2011-10-12 | 中国電力株式会社 | 業務支援システム及び方法 |
| US20090204392A1 (en) * | 2006-07-13 | 2009-08-13 | Nec Corporation | Communication terminal having speech recognition function, update support device for speech recognition dictionary thereof, and update method |
| EP2946383B1 (en) * | 2013-03-12 | 2020-02-26 | Nuance Communications, Inc. | Methods and apparatus for detecting a voice command |
| US9582246B2 (en) * | 2014-03-04 | 2017-02-28 | Microsoft Technology Licensing, Llc | Voice-command suggestions based on computer context |
-
2017
- 2017-03-06 WO PCT/JP2017/008644 patent/WO2017179335A1/ja not_active Ceased
- 2017-03-06 EP EP17782153.5A patent/EP3444808A4/en not_active Withdrawn
- 2017-03-06 JP JP2018511925A patent/JP6930531B2/ja active Active
- 2017-03-06 KR KR1020187026954A patent/KR20180134337A/ko not_active Withdrawn
- 2017-03-06 US US16/076,223 patent/US20210193133A1/en not_active Abandoned
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2004086150A (ja) * | 2002-06-28 | 2004-03-18 | Denso Corp | 音声制御装置 |
| WO2007066433A1 (ja) * | 2005-12-07 | 2007-06-14 | Mitsubishi Electric Corporation | 音声認識装置 |
| JP2008026464A (ja) * | 2006-07-19 | 2008-02-07 | Denso Corp | 車両用音声認識装置 |
| JP2015526753A (ja) * | 2012-06-15 | 2015-09-10 | 本田技研工業株式会社 | 深度に基づく場面認識 |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3444808A4 * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2021516361A (ja) * | 2018-04-04 | 2021-07-01 | アイフライテック カンパニー,リミテッド | 音声ウェイクアップ方法及び装置 |
| JP7114721B2 (ja) | 2018-04-04 | 2022-08-08 | アイフライテック カンパニー,リミテッド | 音声ウェイクアップ方法及び装置 |
| JP2021182068A (ja) * | 2020-05-19 | 2021-11-25 | Necパーソナルコンピュータ株式会社 | 映像表示装置 |
| JP7132974B2 (ja) | 2020-05-19 | 2022-09-07 | Necパーソナルコンピュータ株式会社 | 映像表示装置 |
Also Published As
| Publication number | Publication date |
|---|---|
| JP6930531B2 (ja) | 2021-09-01 |
| KR20180134337A (ko) | 2018-12-18 |
| JPWO2017179335A1 (ja) | 2019-02-14 |
| EP3444808A1 (en) | 2019-02-20 |
| EP3444808A4 (en) | 2019-05-01 |
| US20210193133A1 (en) | 2021-06-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP6348831B2 (ja) | 音声入力補助装置、音声入力補助システムおよび音声入力方法 | |
| US20210134278A1 (en) | Information processing device and information processing method | |
| US10310808B2 (en) | Systems and methods for simultaneously receiving voice instructions on onboard and offboard devices | |
| JP6052610B2 (ja) | 情報通信端末、およびその対話方法 | |
| JP2007017731A (ja) | 音声認識装置、音声認識装置を備えたナビゲーション装置及び音声認識装置の音声認識方法 | |
| TW200847004A (en) | Speech-centric multimodal user interface design in mobile technology | |
| KR20210040856A (ko) | 스마트 백미러의 인터랙션 방법, 장치, 전자기기와 저장매체 | |
| US20170017497A1 (en) | User interface system, user interface control device, user interface control method, and user interface control program | |
| US9715878B2 (en) | Systems and methods for result arbitration in spoken dialog systems | |
| JP2020162003A (ja) | エージェント装置、エージェント装置の制御方法、およびプログラム | |
| JP7239366B2 (ja) | エージェント装置、エージェント装置の制御方法、およびプログラム | |
| JPWO2016174955A1 (ja) | 情報処理装置、及び、情報処理方法 | |
| JP6930531B2 (ja) | 情報処理装置、情報処理方法およびプログラム | |
| JP2020160135A (ja) | エージェント装置、エージェント装置の制御方法、およびプログラム | |
| US20160109944A1 (en) | Information acquisition method, information acquisition system, and non-transitory recording medium for user of motor vehicle | |
| US10475470B2 (en) | Processing result error detection device, processing result error detection program, processing result error detection method, and moving entity | |
| JP2025142027A (ja) | サーバ装置及び情報処理方法並びにサーバプログラム | |
| JP2020144264A (ja) | エージェント装置、エージェント装置の制御方法、およびプログラム | |
| JP2020160133A (ja) | エージェントシステム、エージェントシステムの制御方法、およびプログラム | |
| JP2020144260A (ja) | 車載エージェントシステム、車載エージェントシステムの制御方法、およびプログラム | |
| US11608076B2 (en) | Agent device, and method for controlling agent device | |
| JP7175221B2 (ja) | エージェント装置、エージェント装置の制御方法、およびプログラム | |
| JPWO2010073406A1 (ja) | 情報提供装置、通信端末、情報提供システム、情報提供方法、情報出力方法、情報提供プログラム、情報出力プログラムおよび記録媒体 | |
| JP2021022046A (ja) | 制御装置、制御方法、及びプログラム | |
| US11797261B2 (en) | On-vehicle device, method of controlling on-vehicle device, and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| WWE | Wipo information: entry into national phase |
Ref document number: 2018511925 Country of ref document: JP |
|
| ENP | Entry into the national phase |
Ref document number: 20187026954 Country of ref document: KR Kind code of ref document: A |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2017782153 Country of ref document: EP |
|
| ENP | Entry into the national phase |
Ref document number: 2017782153 Country of ref document: EP Effective date: 20181112 |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17782153 Country of ref document: EP Kind code of ref document: A1 |