CN115243062B - Scene display method and device, screen display device, electronic device and storage medium
- Publication number: CN115243062B
- Application number: CN202210682504.0A
- Authority: CN (China)
- Legal status: Active
Classifications
- H04N21/2187—Live feed
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- H04N21/4722—End-user interface for requesting additional data associated with the content
- G10L2015/223—Execution procedure of a spoken command
Abstract
The application provides a scene display method and device, a screen display device, an electronic device, and a storage medium. The method includes: detecting an image display instruction from a target voice, the image display instruction being used to trigger display of the image matched with a specific display object; if an image display instruction is detected from the target voice, determining, according to the detected instruction and a preset feature mapping library, the target image corresponding to the instruction from the feature mapping library, the feature mapping library storing images and the matching relations between the images and display objects; and displaying the target image. The scheme realizes automatic, voice-driven switching of display images and improves the convenience of image switching. Applied to product sales based on live video, it can switch product posters automatically, without manual switching by staff, which improves the convenience of poster switching.
Description
Technical Field
The present application relates to the field of scene switching technologies, and in particular to a scene display method and device, a screen display device, an electronic device, and a storage medium.
Background
When introducing a topic to an audience, presenters generally explain it with the aid of images, and when moving on to another topic they need to switch to the images corresponding to that topic. For example, when selling products through live video, the poster of the current product is displayed on the live display interface so that the product can be introduced more clearly, and when another product is introduced the poster is switched accordingly. At present, however, image switching generally has to be performed manually; the steps are cumbersome and the operation is inconvenient.
Disclosure of Invention
In view of the above, the application provides a scene display method and device, a screen display device, an electronic device, and a storage medium.
The technical scheme provided by the application is as follows:
in one aspect, the present application provides a scene display method, including:
detecting an image display instruction from target voice; the image display instruction is used for triggering display of an image matched with a specific display object;
if an image display instruction is detected from the target voice, determining, according to the detected image display instruction and a preset feature mapping library, a target image corresponding to the image display instruction from the feature mapping library, wherein the feature mapping library stores images and matching relations between the images and display objects;
and displaying the target image.
Further, in the method described above, the determining, according to the detected image display instruction, the target image corresponding to the image display instruction includes:
determining, according to the detected image display instruction and a preset feature mapping library, the target image corresponding to the image display instruction from the feature mapping library; the feature mapping library storing images and the matching relations between the images and display objects.
Further, in the method described above, the detecting the image display instruction from the target voice includes:
performing voice recognition on the target voice to obtain a recognition text; and
detecting an image display instruction from the recognition text.
Further, in the method described above, the detecting an image display instruction from the recognition text includes:
detecting whether the recognition text includes a preset instruction text;
if the recognition text includes a preset instruction text, extracting a target text segment of a set length from a set position in the recognition text according to the instruction text in the recognition text, wherein the set position is determined according to the position of the instruction text;
detecting whether the target text segment contains a display object name; and
if the target text segment contains a display object name, determining the instruction text and the display object name as an image display instruction.
Further, in the above method, the feature map library is constructed by the following method:
identifying text content of the image;
detecting whether the text content of the image comprises a display object name; and
if the text content of the image comprises a display object name, storing the image, and the matching relation between the image and the display object name contained in the image, in the feature mapping library.
Further, in the above method, the detecting whether the text content of the image includes a display object name includes:
starting from the i-th character of the text content of the image, sequentially judging whether the character string formed by the first N characters of a target character sequence is a display object name, wherein i and N are positive integers, and the target character sequence is composed of all characters of the text content of the image starting from the i-th character;
if the character string formed by the first N characters of the target character sequence is a display object name, determining that the text content of the image contains the display object name, recording the display object name, and forming the remaining characters, starting from the (N+1)-th character of the target character sequence, into a new target character sequence;
repeating the above process until N is equal to the total number of characters of the target character sequence or N is equal to a preset value; and
letting i = i + 1 and re-executing the above processing until i + 1 is greater than the total number of characters of the text content of the image.
Further, in the above method, the sequentially determining whether the character string composed of the first N characters of the target character sequence is the display object name includes:
detecting whether a preset display object name database includes a display object name matched with the character string formed by the first N characters of the target character sequence; and
if the display object name database includes a display object name matched with the character string, determining that the character string is a display object name.
Further, in the above method, the detecting whether the text content of the image includes a display object name includes:
searching, with the display object names in a preset display object name database as search conditions, whether the text content includes any display object name in the database; and
if the text content includes a display object name in the preset display object name database, determining that the text content of the image includes the display object name, and recording the display object name.
Further, in the method described above, the displaying the target image includes:
switching the display content of a set image display area to the target image.
In another aspect, the present application further provides a scene display device, including:
a detection module, configured to detect an image display instruction from a target voice, where the image display instruction is used to trigger display of an image matched with a specific display object;
a determining module, configured to, if an image display instruction is detected from the target voice, determine, according to the detected image display instruction and a preset feature mapping library, a target image corresponding to the image display instruction from the feature mapping library, where the feature mapping library stores images and matching relations between the images and display objects; and
a display module, configured to display the target image.
In another aspect, the present application further provides a screen display device, including:
a voice recognition component, a first processor connected to the voice recognition component, and a display screen connected to the first processor;
wherein the voice recognition component is used for performing voice recognition on collected user voice to obtain a recognition text;
the first processor is used for detecting an image display instruction from the recognition text output by the voice recognition component, the image display instruction being used to trigger display of an image matched with a specific display object, and, if an image display instruction is detected, for determining a target image corresponding to the image display instruction according to the detected instruction and sending the target image to the display screen; and
the display screen decodes and displays the target image sent by the first processor.
Further, the screen display device described above further includes:
an optical character recognition component and a first memory, each connected to the first processor;
wherein the optical character recognition component is used for recognizing text content from an input image and sending the recognized text content to the first processor;
the first processor is further configured to detect whether the text content recognized by the optical character recognition component includes a display object name, and, if it does, to store the input image, and the matching relation between the image and the display object name contained in the image, in the first memory; and
the first memory stores the image sent by the first processor and the matching relation between the image and the display object name contained in the image.
In another aspect, the present application further provides an electronic device, including:
a second memory and a second processor;
wherein the second memory is used for storing a program; and
the second processor is configured to implement the scene display method described above by running the program in the second memory.
In another aspect, the present application further provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the scene display method described in any one of the above.
The scene display method and device, screen display device, electronic device and storage medium provided by the application detect an image display instruction from a target voice, the image display instruction being used to trigger display of the image matched with a specific display object. If an image display instruction is detected from the target voice, the target image corresponding to the instruction is determined, according to the detected instruction and a preset feature mapping library, from the feature mapping library, which stores images and the matching relations between the images and display objects, and the target image is displayed. The scheme realizes automatic, voice-driven switching of display images and improves the convenience of image switching.
The scheme can be applied to product sales based on live video: product posters can be switched automatically, without manual switching by staff, which improves the convenience of poster switching.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a scene display method according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of detecting an image display instruction according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a feature mapping library construction flow provided in an embodiment of the present application;
Fig. 4 is a schematic flow chart of detecting display object names according to an embodiment of the present application;
Fig. 5 is a flowchart of determining whether a character string is a display object name according to an embodiment of the present application;
Fig. 6 is a flowchart of another method for detecting a display object name according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a scene display device according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a screen display device according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of another screen display device according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solution of the embodiments of the application is suitable for application scenarios in which display images are switched automatically: the displayed image can be switched automatically according to voice, which improves the convenience of image switching.
The technical solution of the embodiments can run on a hardware device such as a hardware processor, or be packaged as a software program; when the hardware processor executes the processing procedure of the technical solution, or the software program runs, automatic switching of the display image can be realized. The embodiments describe the specific processing procedure of the technical solution only by way of example and do not limit its specific implementation form; any technical implementation capable of executing the above processing procedure may be used.
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The embodiment provides a scene display method, referring to fig. 1, which includes:
s101, detecting an image display instruction from target voice.
The target voice refers to the speech produced when a topic is introduced to an audience; for example, the speech of a video anchor introducing a product during product sales can be used as the target voice. The embodiments of the application do not limit the language of the target voice, as long as it can be recognized and converted by speech recognition technology. In this embodiment, speech recognition may first be performed on the target voice to obtain a converted recognition text, and the image display instruction is recognized on the basis of that text.
The image display instruction is used to trigger display of the image matched with a specific display object. When an image display instruction is detected from the target voice, the currently displayed image is switched to the image matched with the specific display object.
The image display instruction comprises at least an instruction text and a specific display object. The instruction text comprises an action instruction signalling, during a topic introduction, that the introduction of the current topic has ended and the introduction of the next topic begins, for example words such as "open", "review", "compare", "we review" or "next to be introduced". Those skilled in the art may set the action instructions according to the actual application scenario; this embodiment is not limited in this respect. The specific display object refers to the next topic introduced after the current one ends, such as a mobile phone of a certain model or a hair-care product of a certain model.
The positional relationship between the instruction text and the display object is relatively simple: the display object is typically located either before or after the instruction text, adjacent to it or separated from it by a small number of characters. The instruction text therefore has high directivity and can indicate the position of the display object. The instruction texts, and the display object position indicated by each, can be set in advance by those skilled in the art according to the actual application scenario; this embodiment is not limited in this respect. In this embodiment, the recognition text is examined word by word to determine whether an instruction text is detected; if one is detected, the character string in the recognition text that may include the display object name is determined according to the directivity of the instruction text, and the display object name is then recognized from that string.
For example, in a scenario where a video anchor is selling products online, the anchor's voice during the product introduction may be obtained, and an image display instruction detected from it.
When the video anchor finishes introducing the type A shampoo and starts introducing the type B shampoo, the anchor's speech may include a sentence such as "that concludes the type A shampoo; next to be introduced is a very easy-to-use type B shampoo". From this, the embodiment can detect that the recognition text contains the instruction text "next to be introduced"; since the preset instruction text "next to be introduced" indicates that the display object follows it, detection proceeds from the position of the instruction text over "a very easy-to-use type B shampoo", yielding the display object name "type B shampoo". Having obtained the image display instruction corresponding to "type B shampoo", display of the image matched with "type B shampoo" is triggered.
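The step just described can be sketched compactly. The following Python sketch works on a complete recognition text; the instruction text, the name set and the segment length are illustrative assumptions (the application leaves all of them to be configured per scenario), and a real deployment would operate on the original-language text.

```python
# One-shot sketch of step S101 on a complete recognition text. The values
# below are placeholder assumptions, not values fixed by the application.
INSTRUCTION = "next to be introduced"   # indicates the display object follows it
OBJECT_NAMES = {"type A shampoo", "type B shampoo"}
SEGMENT_LENGTH = 40                     # assumed target-text-segment length

def detect(recognition_text: str):
    pos = recognition_text.find(INSTRUCTION)
    if pos < 0:
        return None                     # no instruction text detected
    start = pos + len(INSTRUCTION)
    segment = recognition_text[start: start + SEGMENT_LENGTH]
    for name in OBJECT_NAMES:
        if name in segment:             # segment contains a display object name
            return INSTRUCTION, name    # together they form the instruction
    return None

print(detect("that concludes the type A shampoo; next to be introduced "
             "is a very easy-to-use type B shampoo"))
# -> ('next to be introduced', 'type B shampoo')
```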
S102, if an image display instruction is detected from the target voice, determining a target image corresponding to the image display instruction from a feature mapping library according to the detected image display instruction and the preset feature mapping library.
The target image is the image used to display the specific display object in the image display instruction, including but not limited to a display poster, a display video, and the like. For example, if the specific display object in the image display instruction is "type B shampoo", the target image is the display poster or display video of "type B shampoo".
In the embodiment of the application, a feature mapping library is preset. The feature mapping library stores images and the matching relations between the images and display objects, so that after an image display instruction is detected from the target voice, the target image matched with the instruction, that is, the target image matched with the display object in the instruction, can be determined from the feature mapping library.
Illustratively, in a scenario where the video anchor is conducting an online product sale, the image is a product poster. Suppose the video anchor needs to switch among three posters: a first poster, a second poster and a third poster.
Suppose further that the display objects corresponding to the first poster are an A1 mobile phone, an A2 mobile phone and an A3 mobile phone; the display objects corresponding to the second poster are a B1 tablet computer and a B2 tablet computer; and the display objects corresponding to the third poster are a C1 shampoo and a C2 conditioner.
The feature mapping library then stores: the first poster, with the matching relations between the A1, A2 and A3 mobile phones and the first poster; the second poster, with the matching relations between the B1 and B2 tablet computers and the second poster; and the third poster, with the matching relations between the C1 shampoo and the C2 conditioner and the third poster.
If an image display instruction detected from the video anchor's voice includes "open" and "A1 mobile phone", the image corresponding to "A1 mobile phone" is determined from the feature mapping library to be the first poster. If the instruction includes "open" and "A3 mobile phone", the image corresponding to "A3 mobile phone" is still the first poster. If the instruction includes "we look down again" and "C2 conditioner", the image corresponding to "C2 conditioner" is determined to be the third poster.
In the above embodiment, a feature mapping library storing images and the matching relations between the images and display objects is preset, so that the target image corresponding to an image display instruction detected from the target voice is determined automatically from the library, without manual searching, which improves the convenience of image switching.
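As a minimal sketch, the feature mapping library of this poster example can be modeled as a plain mapping from display object name to image; the poster file names are hypothetical.

```python
# Feature mapping library for the poster example, as a name -> image mapping.
feature_map = {
    "A1 mobile phone": "poster_1.png",
    "A2 mobile phone": "poster_1.png",
    "A3 mobile phone": "poster_1.png",
    "B1 tablet computer": "poster_2.png",
    "B2 tablet computer": "poster_2.png",
    "C1 shampoo": "poster_3.png",
    "C2 conditioner": "poster_3.png",
}

def target_image_for(object_name: str):
    """Step S102: resolve the display object of a detected image display
    instruction to its target image; None if no match is stored."""
    return feature_map.get(object_name)

assert target_image_for("A3 mobile phone") == "poster_1.png"  # still the first poster
assert target_image_for("C2 conditioner") == "poster_3.png"   # the third poster
```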
S103, displaying the target image.
In the embodiment of the application, after the target image corresponding to the image display instruction is determined as above, the target image can be displayed; where exactly it is displayed can be determined according to the display environment.
For example, if the topic is introduced to the audience through live video, the target image may be displayed in a display area of the live video. In a scenario where a video anchor is selling products online, the target image is a product poster, which can be displayed directly in the live display area; in a scenario where a teacher is giving lessons online, the target image is the teaching courseware, which can be displayed directly in the live display area.
If the topic is introduced to the audience offline, the presenter will, for convenience of explanation, describe the topic in conjunction with an image shown on a display device; in such a scenario, the target image may be displayed on that display device.
The scene display method of the above embodiment detects an image display instruction from a target voice, the instruction being used to trigger display of the image matched with a specific display object; if an image display instruction is detected from the target voice, the target image corresponding to it is determined according to the detected instruction, and the target image is displayed. The scheme realizes automatic, voice-driven switching of display images and improves the convenience of image switching.
The scheme can be applied to product sales based on live video: product posters can be switched automatically, without manual switching by staff, which improves the convenience of poster switching.
In another embodiment of the present application, the steps of the above embodiment detect an image display instruction from a target voice, and specifically may be implemented as follows:
performing voice recognition on the target voice to obtain a recognition text, and detecting an image display instruction from the recognition text.
Specifically, in the embodiment of the application, the target voice is the speech content produced when a topic is introduced to the audience. To make the real semantics of the target voice easier to understand and to detect the correct image display instruction from it, voice recognition can first be performed on the target voice to convert it into a recognition text, and the image display instruction is then detected from the recognition text.
In this embodiment, the target voice can be recognized and converted in real time, so that the target image corresponding to an image display instruction can be displayed promptly. Recognizing speech and converting it into text is a mature existing technology, which those skilled in the art can use to perform the voice recognition of the target voice; it is not described in detail in this embodiment.
In the above embodiment, voice recognition is performed on the target voice to obtain a recognition text, and the image display instruction is extracted from the recognition text, which makes the real semantics of the target voice easier to understand and allows the correct image display instruction to be detected.
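Since the application prescribes no particular recognition engine, any mature one can supply the recognition text. A minimal sketch follows, assuming the open-source Python `speech_recognition` package purely as a stand-in engine; each recognized utterance would then be fed to the instruction detection described above.

```python
# Minimal sketch of converting target voice to recognition text in near
# real time; the engine choice is an assumption, not part of the method.
import speech_recognition as sr

def transcribe_live(on_text):
    """Capture microphone audio continuously and pass each recognized
    utterance to `on_text`, approximating real-time recognition."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        while True:
            audio = recognizer.listen(source, phrase_time_limit=5)
            try:
                # language is whatever the anchor speaks, e.g. Mandarin
                on_text(recognizer.recognize_google(audio, language="zh-CN"))
            except sr.UnknownValueError:
                pass  # unintelligible segment; keep listening
```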
As shown in fig. 2, in another embodiment of the present application, the steps of the above embodiment detect an image display instruction from a recognition text, which may be specifically implemented by:
s201, detecting whether the identification text comprises a preset instruction text.
In this embodiment, the instruction text includes an action instruction that can be used to characterize the end of the introduction of the current theme and start the introduction of the next theme when the theme is introduced by the service-oriented object. Meanwhile, the display object is generally positioned in front of or behind the instruction text, and is adjacent to or separated by a small number of characters, so that the instruction text has high directivity and can indicate the position of the display object. It should be noted that, the instruction text and the position of the display object indicated by each instruction text can be set in advance by a person skilled in the art according to the actual application scenario, and the embodiment is not limited.
In this embodiment, voice recognition is performed on the target voice in advance, and a recognized text after recognition is obtained. So that detection is performed on the basis of the recognition text to determine whether the recognition text includes a preset instruction text.
Specifically, the target voice can be recognized and converted in real time, and the recognition text is detected in real time after the recognition text is generated, so that the target image corresponding to the image display instruction can be displayed in time.
Illustratively, the present embodiment, after acquiring the recognition text, detects the characters one by one according to the generation order of the characters in the recognition text, so as to determine whether the instruction text is detected from the recognition text. Specifically, it may be determined whether the currently generated character hits the first character of the instruction text, if the currently generated character does not hit the first character of the target instruction text, then detection of the subsequently generated character continues, if the currently generated character hits the first character of the target instruction text, then detection of whether the subsequently generated character hits the remaining characters of the target instruction text, and if the subsequently generated character hits the remaining characters of the target instruction text, then detection of the instruction text from the recognition text is indicated.
For example, if the preset instruction text includes "open", "review", "contrast" and "compare". In the detection process of the recognition text, the first character in the recognition text, which is currently generated, is detected to be're-looking', and the first character in the're-looking' and're-looking down', then the subsequently generated character is further detected, and if the subsequently generated character is detected to be 'looking', the remaining characters of the text instruction, which is the're-looking' are hit, are detected, and the instruction text, which is the're-looking' is detected from the recognition text, is indicated.
S202, if the recognition text includes a preset instruction text, extracting a target text segment of a set length from a set position in the recognition text according to the instruction text.
In the embodiment of the application, if the recognition text is determined to include a preset instruction text, the target text segment that may contain a display object name can be extracted according to the display object position indicated by that instruction text: the segment is extracted word by word, forward or backward, from the position of the instruction text in the recognition text. The character length of the extracted target text segment can be set in advance by those skilled in the art according to the actual application scenario; this embodiment is not limited in this respect.
When the target text segment is extracted, the instruction text may or may not be included in it; this embodiment is not limited in this respect. The relationship can be expressed as "X${}" or "${}X", where X is the instruction text and ${} is the target text segment: "X${}" indicates that the target text segment follows the instruction text, and "${}X" indicates that it precedes the instruction text.
For example, in a scenario where a video anchor is selling products online, the positional relationships between target text segments and instruction texts are set as: "open ${}", "review ${}", "compare ${}", "${} we review" and "next to be introduced ${}". Here "open", "review", "compare" and "next to be introduced" indicate that the target text segment follows the instruction text, and "we review" indicates that the target text segment precedes it.
If the recognition text is "that concludes the type A shampoo; next to be introduced is a very easy-to-use type B shampoo with added ingredient Q", the instruction text "next to be introduced" is detected, which indicates that the target text segment follows it. The segment that may contain the display object name is therefore extracted word by word starting from the position after the instruction text. If the preset segment length is 15 characters, at most 15 characters are extracted, yielding the target text segment "a very easy-to-use type B shampoo".
S203, detecting whether the display object name is contained in the target text segment.
In this embodiment, after the target text segment is successfully extracted from the recognition text, matching recognition is performed on it to determine whether it includes a display object name.
Illustratively, the feature mapping library of the above embodiment includes images and the matching relations between the images and the display object names contained in them; the display object names themselves may also be stored in the feature mapping library. To determine whether the target text segment includes a display object name, its characters can be examined one by one against the names in the feature mapping library.
If the target text segment includes a display object name from the feature mapping library, the target text segment contains a display object name.
S204, if the target text segment contains a display object name, determining the instruction text and the display object name as an image display instruction.
In the embodiment of the application, if a display object name is contained in the target text segment, the instruction text and the display object name are determined to constitute an image display instruction.
In the above embodiment, the recognition text is first examined word by word to determine whether an instruction text is present; if one is detected, the target text segment that may include a display object is determined according to the directivity of the instruction text, and recognition then proceeds within that segment to detect whether it contains a display object name. Only then is the display object name accepted as genuine, which avoids false recognition.
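Steps S201 to S204 can be sketched as an incremental detector that consumes characters as the recognizer emits them, using the "X${}" / "${}X" notation above. The templates, segment length and name set below are illustrative assumptions.

```python
# Incremental sketch of steps S201-S204: consume the recognition text one
# character at a time, detect a preset instruction text, extract the target
# text segment in the direction it indicates, and check the segment for a
# display object name. All concrete values here are placeholders.
TEMPLATES = {
    "open": "after",         # "X${}": segment follows the instruction text
    "we review": "before",   # "${}X": segment precedes the instruction text
}
OBJECT_NAMES = {"type B shampoo", "C2 conditioner"}
SEGMENT_LENGTH = 40          # preset segment length (an assumption)

class InstructionDetector:
    def __init__(self):
        self.buffer = ""     # recognition text generated so far

    def feed(self, char: str):
        """Run S201-S204 on one newly generated character; return the
        (instruction text, display object name) pair once detected."""
        self.buffer += char
        for instr, direction in TEMPLATES.items():
            pos = self.buffer.find(instr)
            if pos < 0:
                continue                      # S201: instruction not present
            if direction == "after":          # S202: segment after it
                start = pos + len(instr)
                segment = self.buffer[start: start + SEGMENT_LENGTH]
            else:                             # S202: segment before it
                segment = self.buffer[max(0, pos - SEGMENT_LENGTH): pos]
            for name in OBJECT_NAMES:
                if name in segment:           # S203: segment contains a name
                    self.buffer = ""          # reset for the next instruction
                    return instr, name        # S204: this is the instruction
        return None

detector = InstructionDetector()
for ch in "we open a very good type B shampoo for everyone":
    hit = detector.feed(ch)
    if hit:
        print(hit)   # -> ('open', 'type B shampoo')
```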
As shown in fig. 3, another embodiment of the present application discloses that the feature mapping library of the above embodiment may be specifically constructed in the following manner:
S301, identifying text content of the image.
When introducing a topic to an audience, presenters generally explain it by voice in combination with images; an image contains the corresponding topic name, display pictures and other content. In the embodiment of the application, the text content of the images can be recognized and the display object names extracted from it, so as to determine the display object(s) corresponding to each image.
For example, if the content is in a text format, such as txt or word, the corresponding text content can be read directly. If it is in a non-text format, such as a picture format or pdf, the text content can be obtained by text recognition technology.
S302, detecting whether the text content of the image comprises a display object name.
In the embodiment of the application, after the text content is obtained, the characters in it are examined to determine whether it includes a display object name.
For example, a display object name database containing all display object names may be preset. The names in it may be added manually by a user, or text content may be captured from a network, semantically recognized, and the display object names extracted and stored in the database.
The characters of the text content can then be examined one by one to determine whether the text content includes any display object name from the database; if it does, the text content includes a display object name.
It should be noted that several topics are sometimes displayed in the same image, so the same image may carry several display object names; that is, several display object names may be extracted from the text content of one image. This embodiment is not limited in this respect.
S303, if the text content of the image includes a display object name, storing the image, and the matching relation between the image and the display object name contained in the image, in a feature mapping library.
Specifically, if the text content of the image includes a display object name, the image, and the matching relation between the image and the display object name(s) contained in it, can be stored to obtain the feature mapping library.
Illustratively, in a scenario where a video anchor is selling products online, the image is a product poster. Through text conversion and recognition of the poster's text content, it is determined that the poster includes three display object names: an A1 mobile phone, an A2 mobile phone and an A3 mobile phone. The poster, and the matching relations between each of the three mobile phones and the poster, can then be stored in the feature mapping library.
In the above embodiment, display object names can be extracted from images automatically, and a feature mapping library is established that includes the images and the matching relations between the images and the display object names they contain. Compared with extracting display object names from images manually, this effectively improves both extraction speed and extraction accuracy.
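A minimal sketch of steps S301 to S303 follows, assuming the open-source pytesseract package as the text recognition engine (the application prescribes none) and using the database-as-search-condition matching described above; the name database and poster paths are illustrative.

```python
# Sketch of building the feature mapping library (S301-S303). The engine,
# names and paths are assumptions; several names may map to one poster.
import pytesseract
from PIL import Image

NAME_DATABASE = {"A1 mobile phone", "A2 mobile phone", "A3 mobile phone"}

def build_feature_map(poster_paths):
    feature_map = {}                                            # name -> image
    for path in poster_paths:
        text = pytesseract.image_to_string(Image.open(path))   # S301: recognize text
        for name in NAME_DATABASE:                              # S302: detect names
            if name in text:
                feature_map[name] = path                        # S303: store matching
    return feature_map

# e.g. build_feature_map(["poster_1.png"])
# -> {"A1 mobile phone": "poster_1.png", ...} if the poster lists those phones
```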
As shown in fig. 4, in another embodiment of the present application, the step of detecting whether the text content of the image includes the display object name in the above embodiment may be specifically implemented as follows:
S401, starting from the ith character of the text content of the image, judging whether a character string formed by the first N characters of the target character sequence is a display object name or not in sequence.
The embodiment of the application defines two variables, i and N, both positive integers with starting value 1. The target character sequence is composed of all characters of the text content of the image starting from the i-th character, and the character string is composed of the first N characters of the target character sequence.
Illustratively, when i and N both equal 1, the target character sequence includes all characters of the text content starting from the first character, and the string of the first N characters contains only the first character of the sequence. When i equals 1 and N equals 4, the string contains the first 4 characters of the sequence. When i equals 3 and N equals 4, the target character sequence includes all characters starting from the 3rd character, and the string contains its first 4 characters.
For example, in a scenario where a video anchor is selling products online, the image is a product poster, and the text content obtained by text recognition is "The XY mobile phone is a light and thin mobile phone" (in the original Chinese, "XY手机是一款轻薄手机", where "XY mobile phone" is the four-character string "XY手机"). When i and N both equal 1, the target character sequence is the whole sentence and the string of the first N characters is "X"; when i equals 1 and N equals 4, the string is "XY手机" ("XY mobile phone"); when i equals 3 and N equals 4, the target character sequence starts from the 3rd character and the string is "手机是一" ("mobile phone is a").
In the embodiment of the application, starting with i and N equal to 1 and from the i-th character of the text content of the image, the character strings formed by the first N characters of the target character sequence are examined in turn to determine whether they include a display object name.
S402, if the character string formed by the first N characters of the target character sequence is a display object name, determining that the text content of the image contains the display object name, recording the display object name, forming the remaining characters starting from the (N+1)-th character of the target character sequence into a new target character sequence, and repeating the process until the length of the newly formed target character sequence is zero or N is equal to a preset value.
If, starting from the i-th character of the text content, the character string formed by the first N characters of the target character sequence is a display object name, the name can be recorded so that the matching relation between the display object name and the image can be established and stored in the feature mapping library.
Further, in that case, the remaining characters starting from the (N+1)-th character of the target character sequence can be formed into a new target character sequence, and the strings formed by the first N characters of the new sequence are examined in turn; this is repeated until N equals the total number of characters of the target character sequence or N equals a preset value.
Here, "N equals the total number of characters of the target character sequence" means: while the current target character sequence is examined to judge whether the string formed by its first N characters is a display object name, N is increased from 1 in sequence; if N reaches the total number of characters of the sequence and no display object name has been matched, a new target character sequence must be determined for detection.
Further, the character length of a display object name is limited; for example, in a scenario where a video anchor is selling products online, the display object is a product to be sold, and product names generally stay within a limited number of characters. If N has reached that upper limit and no display object name has been detected in the current target character sequence, continuing to increase N cannot detect one either and is redundant. A preset value can therefore be set as the upper limit on the character length of a display object name, and when N equals the total number of characters of the target character sequence or N equals the preset value, a new target character sequence is determined for detection.
Illustratively, in a scenario where a video anchor is selling products online, the image is a product poster, and the text content obtained by text recognition is "The XY mobile phone is a light and thin mobile phone. The AB mobile phone has very strong battery endurance." (in the original Chinese, "XY手机是一款轻薄手机。AB手机续航能力很强。"). Note that in this example a punctuation mark counts as one character.
Detection starts with i and N both equal to 1, so the target character sequence is the whole text. The string formed by the first N characters is "X", which is not a display object name; N is increased to 2, giving "XY", not a display object name; N is increased to 3, giving "XY手", not a display object name; N is increased to 4, giving "XY手机" ("XY mobile phone"), which is a display object name. The name is recorded, and the remaining characters starting from the (N+1)-th (here the fifth) character of the target character sequence form the new target character sequence "是一款轻薄手机。AB手机续航能力很强。" ("is a light and thin mobile phone. The AB mobile phone has very strong battery endurance."). Detection of whether the string formed by the first N characters is a display object name then continues on the new sequence.
Further, for this new target character sequence, if the preset upper limit on the character length of a display object name is 30:
when N equals 1, the string is "是" ("is"), not a display object name; N is increased to 2, and the two-character string is still not a display object name; N keeps increasing until it equals the total number of characters of the target character sequence, and since no string matches a display object name, a new target character sequence must be determined for detection.
If instead the preset upper limit on the character length of a display object name is 5:
when N equals 5, the string "是一款轻薄" ("is a light and thin") is not a display object name, and a new target character sequence must be determined for detection.
After a new target character sequence is formed, the value of i changes accordingly: the new sequence corresponds to i = i + N. For example, when the target character sequence above changed to "是一款轻薄手机。AB手机续航能力很强。", the value of i was updated from 1 to 5.
S403, let i=i+1, and re-execute the above processing until i+1 is greater than the total number of characters of the text content in the image.
In the embodiment of the application, when N is equal to the total number of characters of the target character sequence or N is equal to a preset value, the target character sequence needs to be redetermined for detection. The target character sequence is redetermined in such a way that i=i+1. That is, the first character is the second character of the previous target sequence relative to the new target character sequence.
After the new target character sequence is formed, the processing procedure can be re-executed until i+1 is greater than the total number of characters of the text content in the image, that is, i is equal to the total number of characters of the text content, and the identification of the text content is completed.
In the above embodiment, when i = 5 the corresponding target character sequence is "is a lightweight and thin mobile phone. The AB mobile phone has strong endurance." Detection proceeds until N is equal to the upper limit 5 of the character length of a display object name; none of the character strings composed of the first N characters is a display object name, so the target character sequence needs to be re-determined. Let i = i + 1; with i = 6, the new target character sequence is "a lightweight and thin mobile phone. The AB mobile phone has strong endurance." The process is re-executed and detection continues; again, by the time N reaches the upper limit 5, no character string composed of the first N characters is a display object name, so i = i + 1 gives i = 7, a new target character sequence is formed, and the value of i continues to be increased by 1 in this way.
When i = 13, the target character sequence is "AB mobile phone has strong endurance." and, at N = 4, the character string composed of the first N characters is "AB mobile phone", which is a display object name. The display object name is recorded, the remaining characters starting from the (N+1)-th character of the target character sequence form the new target character sequence "has strong endurance.", and the value of i is updated to 17. Detection of this new target character sequence proceeds in the same way; when N reaches the upper limit 5 without any character string composed of the first N characters being a display object name, i = i + 1 gives i = 18 with a correspondingly shorter target character sequence, and the value of i continues to be increased by 1.
When i + 1 = 24, the value of i + 1 is greater than the total number of characters of the text content, 23; no characters remain for the target character sequence, and the identification of the text content is completed.
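By way of illustration only, the scanning procedure described above can be sketched in Python as follows; the name database, the preset upper limit on the name length, and the function name are assumptions for demonstration rather than part of the embodiment:

```python
# Minimal sketch of the prefix scan described above (0-based indices here,
# while the embodiment counts characters from 1).
def find_display_object_names(text: str, name_db: set, max_len: int) -> list:
    found = []
    i = 0
    total = len(text)
    while i < total:                     # loop until i + 1 > total (1-based)
        seq = text[i:]                   # target character sequence from the i-th character
        hit_len = 0
        for n in range(1, min(len(seq), max_len) + 1):
            if seq[:n] in name_db:       # first N characters hit a display object name
                hit_len = n
                break
        if hit_len:
            found.append(seq[:hit_len])  # record the display object name
            i += hit_len                 # remaining characters form the new sequence
        else:
            i += 1                       # re-determine the sequence from the next character
    return found
```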
It should be noted that the text in the image may include multiple lines. The text content corresponding to one line of text may be identified as one group of text content according to the steps of the above embodiment; after one group is identified, the next group is identified, and so on until all the text content has been identified. Alternatively, the text content corresponding to all the characters may be identified as a single group of text content; this embodiment does not limit this.
In the embodiment of the application, whether the text content of the image comprises the display object name can be automatically identified, and compared with the mode of manually extracting the display object name from the image, the extraction speed and the extraction accuracy are effectively improved.
For ease of understanding, i and N may be described as character boundary indices. If the text is arranged from left to right, i serves as the left boundary index and N as the right boundary index, and both start at the first character of the text content. The characters between i and N are detected to see whether they hit a display object name. If they miss, the right boundary index N moves one position to the right, so that there are now two characters between i and N, and the detection is repeated; if they still miss, N continues to move rightwards. When N has moved to a position such that the characters between i and N hit a display object name, the display object name is recorded; the right boundary index N then moves one position to the right, the left boundary index i moves to the current position of N, and the characters between i and N are detected again. If no display object name is hit, N keeps moving rightwards, and the process repeats in this way until N reaches the end of the text content or the value of N equals the preset value; in that case, the left boundary index i moves one position to the right from its current position, the right boundary index N moves back to the current position of i, and detection of the characters between i and N continues. The above processing is repeated until the left boundary index i reaches the character at the end of the text content, at which point the identification of the text content is complete.
For example, suppose the text content is "XY mobile phone is a lightweight and thin mobile phone. The AB mobile phone has strong endurance." Initially, the right boundary index N and the left boundary index i are both located at the character "X". "X" misses the display object name, so the right boundary index N moves rightwards; the characters between i and N are now "XY", which again miss the display object name. The right boundary index N continues to move rightwards until its value equals 4 and it is located at the character "machine"; the characters between i and N are now "XY mobile phone", which hits the display object name.
At this point, the right boundary index N moves on to the character "yes", and the left boundary index i also moves to the character "yes". The right boundary index N keeps moving rightwards; when it reaches the character "thin", its value equals the preset value 5, so the position of the left boundary index i must be updated. The left boundary index i moves from its current position "yes" to the character "one", and the right boundary index N moves back to the character "one". The characters between i and N continue to be detected, and the processing is repeated until the left boundary index i moves to the character "strong", which completes the identification of the text content "XY mobile phone is a lightweight and thin mobile phone. The AB mobile phone has strong endurance."
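Running the earlier sketch on a hypothetical English example (the character positions in the worked example above follow the source text of the embodiment, so the indices here differ; the names and text below are invented for demonstration):

```python
# Hypothetical names and text, for demonstration only.
name_db = {"XYPhone", "ABPhone"}
text = "XYPhone is thin. ABPhone lasts long."
print(find_display_object_names(text, name_db, max_len=7))
# -> ['XYPhone', 'ABPhone']
```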
As shown in fig. 5, another embodiment of the present application discloses that the step, in the above embodiment, of sequentially determining whether a character string composed of the first N characters of the target character sequence is a display object name may be specifically implemented through the following steps:
S501, detecting whether a preset display object name database includes a display object name matching the character string composed of the first N characters of the target character sequence.
In this embodiment, a display object name database is preset, containing display object names. The display object names may be manually added to the database by a user; alternatively, text content may be captured from a network and semantically recognized to extract display object names, which are then stored in the display object name database.
In an exemplary embodiment, in a scenario where a video anchor sells products online, the display object name is a product name: text content may be captured from the display interface of a shopping platform, semantically recognized to extract product names, and the extracted product names stored in the display object name database.
Detecting whether the character string composed of the first N characters of the target character sequence hits a display object name means detecting whether the preset display object name database includes a display object name matching that character string.
Specifically, the character string composed of the first N characters of the target character sequence may be compared with the display object names in the display object name database in a character-by-character matching manner, so as to determine whether the preset display object name database includes a display object name matching the character string.
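A minimal sketch of such a character-by-character match, assuming a plain list of names as the database (the function names are illustrative):

```python
# Illustrative character-by-character comparison of a candidate string
# against one display object name, as described above.
def matches_name(candidate: str, name: str) -> bool:
    if len(candidate) != len(name):
        return False
    for a, b in zip(candidate, name):
        if a != b:               # a mismatch at any position means no hit
            return False
    return True

def in_name_db(candidate: str, name_db: list) -> bool:
    # S501: check the candidate against every name in the preset database
    return any(matches_name(candidate, name) for name in name_db)
```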
S502, if the display object name database includes a display object name matching the character string, determining the character string as a display object name.
If detection determines that the display object name database includes a display object name matching the character string, the character string is a display object name. A matching relationship between the display object name and the corresponding image can then be established, and the image and the matching relationship between the image and the display object name stored in the feature mapping library.
According to the embodiment of the application, whether the text content of the image comprises the display object name can be automatically identified, so that the identification speed and accuracy of the display object name are improved.
As shown in fig. 6, another embodiment of the present application discloses that the step of detecting whether the text content of the image includes a display object name may be specifically implemented through the following steps:
S601, using the display object names in the preset display object name database as search conditions, searching whether the text content includes any of the display object names in the database.
The construction method of the display object name database is the same as that of the above embodiment, and a person skilled in the art may refer to the description of the above embodiment, and details thereof are not repeated here.
In the embodiment of the application, the display object names in the display object name database are used as search conditions, and the text content is searched word by word to determine whether it includes any of the preset display object names in the database.
S602, if the text content comprises a preset display object name in a display object name database, determining that the text content of the image comprises the display object name, and recording the display object name.
If it is detected that the text content includes a display object name from the preset display object name database, the text content of the image includes the display object name, and the display object name is recorded, so that a matching relationship between the display object name and the corresponding image can be established and the image and the matching relationship between the image and the display object name stored in the feature mapping library.
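A minimal sketch of this search-condition approach, under the same illustrative assumptions as above:

```python
# Sketch of S601/S602: use each database name as a search condition
# over the recognized text content.
def find_names_by_search(text: str, name_db: list) -> list:
    recorded = []
    for name in name_db:           # each name is one search condition
        if name in text:           # substring search over the text content
            recorded.append(name)  # S602: record the display object name
    return recorded
```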
According to the embodiment of the application, whether the text content of the image comprises the display object name can be automatically identified, so that the identification speed and accuracy of the display object name are improved.
In another embodiment of the present application, it is disclosed that the step of displaying the target image in the above embodiment may be specifically implemented by:
Switching the display content of the set image display area to the target image.
Specifically, when a theme introduction is performed for a service-oriented object, an image display area is provided so that images can be displayed through it and the theme can be described with image assistance. In the embodiment of the application, after the target image is determined, the display content of the image display area is switched to the target image.
For example, if the theme introduction of the service-oriented object is performed by live video, the display area of the live video may be used as the image display area. If the theme introduction of the service-oriented object is performed offline, the theme may, for convenience of description, be described with reference to an image shown on a display device; in such a scenario, the display area of the display device may be regarded as the image display area.
According to the embodiment of the application, the target image can be automatically displayed, and the convenience of image switching is improved.
Corresponding to the above scene display method, the embodiment of the application also discloses a scene display device, as shown in fig. 7, which comprises:
a detection module 100 for detecting an image display instruction from a target voice; the image display instruction is used for triggering display of the image matched with the specific display object;
The determining module 110 is configured to determine, if an image display instruction is detected from the target voice, a target image corresponding to the image display instruction from a feature mapping library according to the detected image display instruction and a preset feature mapping library; the feature mapping library stores images and matching relations between the images and the display objects;
And the display module 120 is used for displaying the target image.
In the above scene display device, the detection module 100 detects an image display instruction from the target voice, where the image display instruction is used to trigger display of an image matched with a specific display object. If an image display instruction is detected from the target voice, the determining module 110 determines a target image corresponding to the image display instruction according to the detected instruction, and the display module 120 displays the target image. This scheme realizes automatic, voice-based switching of display images and improves the convenience of image switching. Applied to product sales based on live video, it can switch product posters automatically without manual switching by staff, improving the convenience of poster switching.
Optionally, in another embodiment of the present application, the determining module 110 of the above embodiment includes:
The first determining unit is used for determining a target image corresponding to the image display instruction from the feature mapping library according to the detected image display instruction and the preset feature mapping library; the feature mapping library stores images and matching relations between the images and the display objects.
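For illustration, the feature mapping library can be modeled as a plain mapping from display object names to images; the dictionary structure and the entry below are assumptions for demonstration, not the storage format required by the embodiment:

```python
feature_map = {"ABPhone": "posters/ab_phone.png"}  # hypothetical entry

def determine_target_image(name: str, feature_map: dict):
    # return the image matched with the display object named in the instruction,
    # or None when the feature mapping library holds no matching relation
    return feature_map.get(name)

print(determine_target_image("ABPhone", feature_map))  # -> posters/ab_phone.png
```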
Optionally, in another embodiment of the present application, the detection module 100 of the above embodiment includes:
the recognition unit is used for carrying out voice recognition on the target voice to obtain a recognition text;
and the detection unit is used for detecting the image display instruction from the identification text.
Optionally, in another embodiment of the present application, the detecting unit of the above embodiment includes:
The first detection subunit is used for detecting whether the identification text comprises a preset instruction text or not;
The extraction subunit is used for extracting a target text segment with a set length from a set position in the identification text according to the instruction text in the identification text if the identification text comprises the preset instruction text; wherein the set position is determined according to the position of the instruction text;
The second detection subunit is used for detecting whether the target text segment contains a display object name or not;
And the first determination subunit is used for determining the instruction text and the display object name as the image display instruction if the target text segment contains the display object name.
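A minimal sketch of these subunits taken together, with the instruction text, the set segment length, the set position (here assumed to be directly after the instruction text), and the name database all invented for demonstration:

```python
# Sketch of the detection subunits: find a preset instruction text in the
# recognized text, extract a target text segment of set length from the set
# position, and check the segment for a display object name.
def detect_image_display_instruction(recognized: str,
                                     instruction_text: str,
                                     segment_length: int,
                                     name_db: set):
    pos = recognized.find(instruction_text)
    if pos == -1:                         # no preset instruction text found
        return None
    start = pos + len(instruction_text)   # set position: just after the instruction
    segment = recognized[start:start + segment_length]
    for name in name_db:                  # does the target segment contain a name?
        if name in segment:
            return (instruction_text, name)  # together: the image display instruction
    return None

# hypothetical usage
print(detect_image_display_instruction(
    "please show the ABPhone poster", "show", 12, {"ABPhone"}))
# -> ('show', 'ABPhone')
```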
Optionally, in another embodiment of the present application, the scene showing device of the above embodiment further includes:
The identification module is used for identifying the text content of the image;
The name detection module is used for detecting whether the text content of the image comprises a display object name or not;
and the storage module is used for storing the image and the matching relation between the image and the display object name contained in the image in the feature mapping library if the display object name is contained in the text content of the image.
Optionally, in another embodiment of the present application, the name detection module of the above embodiment includes:
the judging unit is used for sequentially judging whether a character string formed by the first N characters of the target character sequence is a display object name from the ith character of the text content of the image, wherein i and N are positive integers; the target character sequence is composed of all characters starting from the ith character of the text content of the image;
A second determining unit, configured to determine that the text content of the image includes the display object name and record the display object name if the character string composed of the first N characters of the target character sequence is a display object name, and to compose the remaining characters starting from the (N+1)-th character of the target character sequence into a new target character sequence;
A first repeating unit for repeating the above-described process until N is equal to the total number of characters of the target character sequence or N is equal to a preset value;
and a second repeating unit for letting i = i + 1 and re-executing the above processing until i + 1 is greater than the total number of characters of the text content in the image.
Optionally, in another embodiment of the present application, the determining unit of the above embodiment includes:
a third detection subunit, configured to detect whether a preset display object name database includes a display object name that matches a character string formed by the first N characters of the target character sequence;
And the second determining subunit is used for determining the character string as the display object name if the display object name database comprises the display object name matched with the character string.
Optionally, in another embodiment of the present application, the name detection module of the above embodiment includes:
The searching unit is used for searching whether the text content comprises the preset display object names in the display object name database or not according to the preset display object names in the display object name database as searching conditions;
And the third determining unit is used for determining that the text content of the image comprises the display object name if the text content comprises the display object name in the preset display object name database, and recording the display object name.
Specifically, for the specific working content of each unit of the above-mentioned scene display device, please refer to the content of the above-mentioned method embodiment, and details are not repeated here.
Another embodiment of the present application further provides a screen display device. As shown in fig. 8, the screen display device of this embodiment includes a voice recognition component 200, a first processor 210, and a display screen 220, the first processor 210 being connected to the voice recognition component 200 and the display screen 220, respectively. Illustratively, the first processor 210 is coupled to the voice recognition component 200 and the display screen 220 via a bus.
The voice recognition component 200 is configured to perform voice recognition on collected user voice to obtain a recognition text. Illustratively, the screen display device further includes a voice input device for capturing user voice and transmitting the captured user voice to the voice recognition component 200 for recognition. The voice input device includes, for example, a microphone.
The first processor 210 is configured to detect an image presentation instruction from the recognition text output by the voice recognition component 200, where the image presentation instruction is used to trigger display of an image matched with a specific display object. If an image presentation instruction is detected from the target voice, a target image corresponding to the image presentation instruction is determined according to the detected instruction, and the target image is sent to the display screen 220.
The first processor 210 may be a general-purpose processor, such as a general-purpose central processing unit (CPU) or microprocessor, or may be an application-specific integrated circuit (ASIC) or one or more integrated circuits for controlling execution of the program of the present application. It may also be a digital signal processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
And a display screen 220 for decoding and displaying the target image transmitted by the first processor 210.
The screen display device of this embodiment realizes automatic, voice-based switching of the display image and improves the convenience of image switching. Applied to product sales based on live video, the screen display device can switch product posters automatically without manual switching by staff, improving the convenience of poster switching.
Optionally, in another embodiment of the present application, the screen display device of the above embodiment further includes a first memory 230 and an optical character recognition component 240, the first memory 230 and the optical character recognition component 240 being respectively connected to the first processor 210 of the above embodiment. Illustratively, the first memory 230 and the optical character recognition component 240 are connected to the first processor 210 via buses.
The optical character recognition component 240 serves to recognize text content from an input image and send the recognized text content to the first processor 210.
The first processor 210 is further configured to detect whether the text content recognized by the optical character recognition component 240 includes a display object name; if it does, the first processor 210 stores the input image and the matching relationship between the image and the display object name contained in the image into the first memory 230.
The first memory 230 stores the image transmitted by the first processor 210 and the matching relationship of the image and the presentation object name contained in the image.
The first memory 230 may include a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM) or other types of dynamic storage devices that can store information and instructions, disk storage, flash memory, and so forth.
As an alternative embodiment, the first processor 210 detects an image presentation instruction from the recognition text, including:
Detecting whether the identification text comprises a preset instruction text or not;
If the identification text comprises a preset instruction text, extracting a target text segment with a set length from a set position in the identification text according to the instruction text in the identification text; wherein the set position is determined according to the position of the instruction text;
detecting whether the target text segment contains a display object name or not;
And if the target text segment contains the display object name, determining the instruction text and the display object name as an image display instruction.
As an alternative embodiment, the first processor 210 detects whether the text content of the image includes the name of the presentation object, including:
Starting from the ith character of the text content of the image, sequentially judging whether a character string formed by the first N characters of the target character sequence is a display object name or not, wherein i and N are positive integers; the target character sequence is composed of all characters starting from the ith character of the text content of the image;
If the character string formed by the first N characters of the target character sequence is a display object name, determining that the text content of the image contains the display object name, recording the display object name, and forming the remaining characters starting from the (N+1)-th character of the target character sequence into a new target character sequence;
repeating the above process until N is equal to the total character number of the target character sequence or N is equal to a preset value;
let i=i+1 and re-execute the above processing until i+1 is greater than the total number of characters of the text content in the image.
As an alternative embodiment, the first processor 210 sequentially determines whether the character string composed of the first N characters of the target character sequence is the name of the display object, including:
Detecting whether a preset display object name database comprises display object names matched with character strings formed by the first N characters of a target character sequence or not;
and if the display object name database comprises the display object names matched with the character strings, determining the character strings as the display object names.
As an alternative embodiment, the first processor 210 detects whether the text content of the image includes the name of the presentation object, including:
Searching whether the text content comprises the preset display object names in the display object name database or not according to the preset display object names in the display object name database as search conditions;
If the text content comprises the preset display object names in the display object name database, determining that the text content of the image comprises the display object names, and recording the display object names.
As an alternative embodiment, the display screen 220 presents a target image, including:
and switching the display content of the set image display area to the target image.
The screen display device provided in this embodiment belongs to the same inventive concept as the scene display method provided in the above embodiments of the present application; it can execute the scene display method provided in any of the above embodiments and has the corresponding functional modules and beneficial effects of executing that method. For technical details not described in detail in this embodiment, reference may be made to the specific processing content of the scene display method provided in the foregoing embodiments, which is not repeated here.
Another embodiment of the present application also proposes an electronic device, as shown in fig. 10, including:
A second memory 300 and a second processor 310;
Wherein the second memory 300 is connected to the second processor 310, and is used for storing a program;
The second processor 310 is configured to implement the scene showing method disclosed in any one of the foregoing embodiments by running a program stored in the second memory 300.
Specifically, the electronic device may further include: a bus, a second communication interface 320, a second input device 330 and a second output device 340.
The second processor 310, the second memory 300, the second communication interface 320, the second input device 330, and the second output device 340 are connected to each other by a bus. Wherein:
a bus may comprise a path that communicates information between components of a computer system.
The second processor 310 may be a general-purpose processor, such as a general-purpose central processing unit (CPU) or microprocessor, or may be an application-specific integrated circuit (ASIC) or one or more integrated circuits for controlling execution of the program of the present application. It may also be a digital signal processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The second processor 310 may include a main processor, and may also include a baseband chip, a modem, and the like.
The second memory 300 stores programs for implementing the technical scheme of the present application and may also store an operating system and other key services. In particular, the program may include program code, and the program code may include computer operating instructions. More specifically, the second memory 300 may include a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM) or other types of dynamic storage devices that can store information and instructions, disk storage, flash memory, and so forth.
The second input device 330 may include means for receiving data and information entered by a user, such as a keyboard, mouse, camera, scanner, light pen, voice input means, touch screen, pedometer or gravity sensor, etc.
The second output device 340 may include means, such as a display screen, a printer, speakers, etc., that allow information to be output to a user.
The second communication interface 320 may include any transceiver-like device for communicating with other devices or communication networks, such as an Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The second processor 310 executes the program stored in the second memory 300 and invokes other devices, which can be used to implement the steps of the scene showing method provided in the above embodiment of the present application.
The electronic device may specifically be a display screen, a screen display device, a display controller, or a display system having a display screen or an image display function.
Another embodiment of the present application further provides a storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the steps of the scene showing method provided in any of the foregoing embodiments.
Specifically, the specific working content of each part of the above electronic device and the specific processing content of the computer program on the storage medium when executed by the processor may refer to the content of each embodiment of the above scene showing method, which is not described herein again.
For the foregoing method embodiments, for simplicity of explanation, the methodologies are shown as a series of acts, but one of ordinary skill in the art will appreciate that the present application is not limited by the order of acts, as some steps may, in accordance with the present application, occur in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
It should be noted that, in the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described as different from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other. For the apparatus class embodiments, the description is relatively simple as it is substantially similar to the method embodiments, and reference is made to the description of the method embodiments for relevant points.
The steps in the method of each embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs, and the technical features described in each embodiment can be replaced or combined.
The modules and the submodules in the device and the terminal of the embodiments of the application can be combined, divided and deleted according to actual needs.
In the embodiments provided in the present application, it should be understood that the disclosed terminal, apparatus and method may be implemented in other manners. For example, the above-described terminal embodiments are merely illustrative, and for example, the division of modules or sub-modules is merely a logical function division, and there may be other manners of division in actual implementation, for example, multiple sub-modules or modules may be combined or integrated into another module, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules or sub-modules illustrated as separate components may or may not be physically separate, and components that are modules or sub-modules may or may not be physical modules or sub-modules, i.e., may be located in one place, or may be distributed over multiple network modules or sub-modules. Some or all of the modules or sub-modules may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional module or sub-module in the embodiments of the present application may be integrated in one processing module, or each module or sub-module may exist alone physically, or two or more modules or sub-modules may be integrated in one module. The integrated modules or sub-modules may be implemented in hardware or in software functional modules or sub-modules.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software unit executed by a processor, or in a combination of the two. The software unit may be disposed in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (11)
1. A scene showing method, comprising:
Detecting whether the identification text comprises a preset instruction text or not; if the identification text comprises a preset instruction text, extracting a target text segment with a set length from a set position in the identification text according to the instruction text in the identification text; the set position is determined according to the position of the instruction text, and the recognition text is obtained after voice recognition of the target voice;
detecting whether the target text segment contains a display object name or not; if the target text segment contains a display object name, determining the instruction text and the display object name as an image display instruction; the image display instruction is used for triggering display of an image matched with a specific display object;
If an image display instruction is detected from the target voice, determining a target image corresponding to the image display instruction from a feature mapping library according to the detected image display instruction and the preset feature mapping library; the feature mapping library stores images and matching relations between the images and display objects;
and displaying the target image.
2. The method of claim 1, wherein the feature map library is constructed by:
Identifying text content of the image;
Detecting whether the text content of the image comprises a display object name or not;
And if the text content of the image comprises the display object name, storing the image and the matching relation between the image and the display object name contained in the image in the feature mapping library.
3. The method of claim 2, wherein the detecting whether the text content of the image includes a presentation object name comprises:
Starting from the ith character of the text content of the image, sequentially judging whether a character string formed by the first N characters of the target character sequence is a display object name or not, wherein i and N are positive integers; the target character sequence is composed of all characters starting from the ith character of the text content of the image;
If the character string formed by the first N characters of the target character sequence is a display object name, determining that the text content of the image contains the display object name, recording the display object name, and forming the remaining characters starting from the (N+1)-th character of the target character sequence into a new target character sequence;
repeating the above process until N is equal to the total character number of the target character sequence or N is equal to a preset value;
let i=i+1 and re-execute the above processing until i+1 is greater than the total number of characters of the text content in the image.
4. A method according to claim 3, wherein said sequentially determining whether the character string consisting of the first N characters of the target character sequence is a presentation object name comprises:
Detecting whether a preset display object name database comprises display object names matched with character strings formed by the first N characters of a target character sequence or not;
And if the display object name database comprises the display object name matched with the character string, determining the character string as the display object name.
5. The method of claim 2, wherein the detecting whether the text content of the image includes a presentation object name comprises:
searching whether the text content comprises the display object names in the preset display object name database or not by taking the display object names in the preset display object name database as search conditions;
if the text content comprises the display object names in the preset display object name database, determining that the text content of the image comprises the display object names, and recording the display object names.
6. The method of claim 1, wherein the presenting the target image comprises:
And switching the display content of the set image display area to the target image.
7. A scene showing device, comprising:
The detection module is used for detecting whether the identification text comprises a preset instruction text or not; if the identification text comprises a preset instruction text, extracting a target text segment with a set length from a set position in the identification text according to the instruction text in the identification text; the set position is determined according to the position of the instruction text, and the recognition text is obtained after voice recognition of the target voice; detecting whether the target text segment contains a display object name or not; if the target text segment contains a display object name, determining the instruction text and the display object name as an image display instruction; the image display instruction is used for triggering display of an image matched with a specific display object;
The determining module is used for determining a target image corresponding to the image display instruction from the feature mapping library according to the detected image display instruction and the preset feature mapping library if the image display instruction is detected from the target voice; the feature mapping library stores images and matching relations between the images and display objects;
And the display module is used for displaying the target image.
8. A screen display device, comprising:
The device comprises a voice recognition component, a first processor connected with the voice recognition component, and a display screen connected with the first processor;
The voice recognition component is used for carrying out voice recognition on the collected user voice to obtain a recognition text;
The first processor is used for detecting preset instruction texts from the recognition texts output by the voice recognition component; if the identification text comprises a preset instruction text, extracting a target text segment with a set length from a set position in the identification text according to the instruction text in the identification text; wherein the set position is determined according to the position of the instruction text; detecting whether the target text segment contains a display object name or not; if the target text segment contains a display object name, determining the instruction text and the display object name as an image display instruction; the image display instruction is used for triggering display of an image matched with a specific display object; if an image display instruction is detected from the target voice, determining a target image corresponding to the image display instruction from a feature mapping library according to the detected image display instruction and a preset feature mapping library, and sending the target image to the display screen; the feature mapping library stores images and matching relations between the images and display objects;
and the display screen decodes and displays the target image sent by the first processor.
9. The on-screen device of claim 8, further comprising:
An optical character recognition component and a first memory, respectively connected to the first processor;
the optical character recognition component is used for recognizing text content from an input image and sending the recognized text content to the first processor;
The first processor is further configured to: detect whether the text content recognized by the optical character recognition component includes a display object name; and if the text content recognized by the optical character recognition component includes a display object name, store the input image and the matching relation between the image and the display object name contained in the image into the first memory;
the first memory stores the image sent by the first processor and the matching relation between the image and the name of the display object contained in the image.
10. An electronic device, comprising:
A second memory and a second processor;
Wherein the second memory is used for storing programs;
The second processor is configured to implement the scene showing method according to any one of claims 1 to 6 by running the program in the second memory.
11. A storage medium, comprising: the storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the scene showing method according to any of claims 1 to 6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210682504.0A CN115243062B (en) | 2022-06-16 | 2022-06-16 | Scene display method and device, screen display device, electronic device and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN115243062A CN115243062A (en) | 2022-10-25 |
| CN115243062B true CN115243062B (en) | 2024-06-07 |
Family
ID=83670387
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210682504.0A Active CN115243062B (en) | 2022-06-16 | 2022-06-16 | Scene display method and device, screen display device, electronic device and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115243062B (en) |
Citations (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108305626A (en) * | 2018-01-31 | 2018-07-20 | 百度在线网络技术(北京)有限公司 | The sound control method and device of application program |
| CN108920580A (en) * | 2018-06-25 | 2018-11-30 | 腾讯科技(深圳)有限公司 | Image matching method, device, storage medium and terminal |
| JP2019057057A (en) * | 2017-09-20 | 2019-04-11 | 富士ゼロックス株式会社 | Information processing apparatus, information processing system, and program |
| CN110471599A (en) * | 2019-08-14 | 2019-11-19 | 广东小天才科技有限公司 | Screen word-selecting searching method, device, electronic equipment and storage medium |
| CN110881134A (en) * | 2019-11-01 | 2020-03-13 | 北京达佳互联信息技术有限公司 | Data processing method and device, electronic equipment and storage medium |
| TW202013981A (en) * | 2018-09-26 | 2020-04-01 | 英屬維爾京群島商創意點子數位股份有限公司(Bvi) | Multimedia push notification method and interaction device thereof capable of enhancing interactive effect during push notifications, simplifying setting procedure for push notifications, and maintaining smoothness and comfort during watching video |
| CN111131889A (en) * | 2019-12-31 | 2020-05-08 | 深圳创维-Rgb电子有限公司 | Method, system and readable storage medium for scene adaptive adjustment of image and sound |
| CN111601145A (en) * | 2020-05-20 | 2020-08-28 | 腾讯科技(深圳)有限公司 | Content display method, device and equipment based on live broadcast and storage medium |
| CN111768269A (en) * | 2020-06-22 | 2020-10-13 | 中国建设银行股份有限公司 | A kind of interactive method, device and storage medium of panoramic image |
| CN111986595A (en) * | 2020-07-06 | 2020-11-24 | 佛山市京木测控科技有限公司 | Product information display method, electronic equipment and storage medium |
| CN112381038A (en) * | 2020-11-26 | 2021-02-19 | 中国船舶工业系统工程研究院 | Image-based text recognition method, system and medium |
| WO2021136363A1 (en) * | 2019-12-31 | 2021-07-08 | 阿里巴巴集团控股有限公司 | Video data processing and display methods and apparatuses, electronic device, and storage medium |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112602330B (en) * | 2019-10-29 | 2023-07-11 | 海信视像科技股份有限公司 | Electronic device and nonvolatile storage medium |
- 2022-06-16 CN CN202210682504.0A patent/CN115243062B/en active Active
Non-Patent Citations (1)
| Title |
|---|
| Progress of automatic target recognition technology for infrared images (红外图像自动目标识别技术进展); Wang Ziyong, Liao Chaopei; 飞航导弹 (Winged Missiles Journal); 1996-07-20 (Issue 07); full text * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN115243062A (en) | 2022-10-25 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |