CN113918803B - Live broadcasting room searching method and device, server and storage medium - Google Patents
- Publication number
- CN113918803B
- Authority
- CN
- China
- Prior art keywords
- live
- room
- stream
- identifier
- search
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The method comprises: obtaining input text; performing intention analysis on the input text to obtain at least one search intention; and, if the search intention includes a live search intention, retrieving a target live room identifier from the live index data corresponding to the live search intention, wherein the live index data comprises the merged static data stream and dynamic data stream corresponding to the target live room identifier. Because the live index data is built from both the dynamic data stream and the static data stream corresponding to each live room identifier, the method exploits the fact that live room data changes in real time, which can effectively improve the accuracy of live room search and thereby substantially increase user search volume.
Description
Technical Field
The disclosure relates to the technical field of the internet, and in particular to a method, a device, a server and a storage medium for searching live broadcast rooms.
Background
With the development of live broadcast technology, the number of live rooms in various live broadcast applications is rapidly increasing. In order for a user to find a live room of interest from a vast number of live rooms, many live applications provide a search portal for the live room.
In the related art, live search typically builds an index database from static data such as anchor-related information and live room tags. The index database is used to respond to live room search requests from users and to provide information about the live rooms they want to find. When input text is received from the viewer client, live rooms matching the input text are queried from the index database and sent to the viewer client. However, because the data of a live room changes dynamically, this search approach suffers from inaccurate results.
Disclosure of Invention
The disclosure provides a method, a device, a server and a storage medium for searching live broadcast rooms, so as to at least solve the problem in the related art that live room search is not accurate enough. The technical scheme of the present disclosure is as follows:
according to a first aspect of an embodiment of the present disclosure, there is provided a search method for a live broadcast room, including:
Acquiring an input text;
performing intent analysis on the input text to obtain at least one search intent;
If the search intention comprises a live broadcast search intention, a target live broadcast room identification is retrieved from live broadcast index data corresponding to the live broadcast search intention, and the live broadcast index data comprises a static data stream and a dynamic data stream which are combined and correspond to the target live broadcast room identification.
In one embodiment, if the search intention includes a live search intention, retrieving the target live room identifier from live index data corresponding to the live search intention includes:
If the search intention comprises a live broadcast search intention, matching the input text with word segmentation units in an index data block constructed according to the live broadcast index data to obtain a target word segmentation unit matched with the input text, wherein the word segmentation units are obtained by performing word segmentation processing on the live broadcast index data;
And taking the live broadcasting room identification corresponding to the target word segmentation unit as a target live broadcasting room identification.
In one embodiment, if the search intention includes a live search intention, retrieving the target live room identifier from live index data corresponding to the live search intention includes:
If the search intention comprises a live broadcast search intention, retrieving an original live broadcast room identifier from live broadcast index data according to the input text;
Acquiring live broadcasting room association information corresponding to the original live broadcasting room identification;
And filtering the live broadcasting room association information, and taking the original live broadcasting room identifier corresponding to the association information that remains after filtering as the target live broadcasting room identifier.
In one embodiment, if the search intention includes a live search intention, after retrieving the target live room identifier from live index data corresponding to the live search intention, the method further includes:
And when there are multiple target live broadcasting room identifiers, sorting the target live broadcasting room identifiers by adopting a pre-configured sorting model to obtain sorted target live broadcasting room identifiers.
In one embodiment, the generating manner of the live index data includes:
Receiving multiple paths of data streams, wherein the data streams carry live broadcasting room identifications, and the data streams comprise static data streams and dynamic data streams;
and merging the static data stream and the dynamic data stream corresponding to the same live broadcasting room identifier to obtain the live broadcasting index data corresponding to the live broadcasting room identifier.
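The merge step above can be sketched as follows. This is only a minimal illustration, assuming each stream record is a dictionary carrying a `room_id` field; the record layout, the `merge_streams` helper and the sample data are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def merge_streams(static_stream: Iterable[Record],
                  dynamic_stream: Iterable[Record]) -> Dict[str, Record]:
    """Group records by live room identifier and merge them into one index entry."""
    index: Dict[str, Record] = defaultdict(lambda: {"static": {}, "dynamic": []})
    for rec in static_stream:
        # Static fields (title, anchor info) overwrite earlier values for the room.
        fields = {k: v for k, v in rec.items() if k != "room_id"}
        index[rec["room_id"]]["static"].update(fields)
    for rec in dynamic_stream:
        # Dynamic records (comments, speech text) accumulate in arrival order.
        fields = {k: v for k, v in rec.items() if k != "room_id"}
        index[rec["room_id"]]["dynamic"].append(fields)
    return dict(index)


# Hypothetical sample records for two live rooms.
static = [{"room_id": "r1", "title": "Street food tour"}]
dynamic = [{"room_id": "r1", "comment": "looks tasty"},
           {"room_id": "r2", "comment": "nice shot"}]
live_index = merge_streams(static, dynamic)
```

Keying the merged entry on the live room identifier means a later search hit on either the static or the dynamic part resolves to the same room.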
In one embodiment, after receiving the multiple data streams, the method further comprises:
Cleaning the static data stream and the dynamic data stream to obtain a cleaned static data stream and a cleaned dynamic data stream;
In this embodiment, merging the static data stream and the dynamic data stream corresponding to the same live broadcast room identifier to obtain live broadcast index data corresponding to the live broadcast room identifier includes:
And merging the cleaned static data stream and the cleaned dynamic data stream corresponding to the same live broadcasting room identifier to obtain live broadcasting index data corresponding to the live broadcasting room identifier.
In one embodiment, merging the static data stream and the dynamic data stream corresponding to the same live broadcast room identifier to obtain live broadcast index data corresponding to the live broadcast room identifier, including:
acquiring static data streams in multiple paths of data streams received when preset updating time arrives;
and merging the received dynamic data stream and the static data stream when the preset updating time is reached to obtain the live broadcast index data corresponding to the identification of the live broadcast room.
In one embodiment, after merging the static data stream and the dynamic data stream corresponding to the same live broadcast room identifier to obtain the live broadcast index data corresponding to the live broadcast room identifier, the method further includes:
Acquiring the number of a plurality of index servers;
Converting the identification of the live broadcasting room to obtain a corresponding identification value;
according to the number of the index servers, carrying out modulo processing on the identification value to obtain a data distribution identification;
and transmitting the live broadcast index data to an index server corresponding to the data distribution identification.
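The distribution steps above (convert the identifier to a value, take it modulo the number of index servers, send to the matching server) might look like the sketch below. The disclosure does not name a conversion, so the use of CRC32 here is an assumption, as are the `shard_for_room` helper and the server names.

```python
import zlib


def shard_for_room(room_id: str, num_index_servers: int) -> int:
    """Map a live room identifier to an index server number via modulo."""
    if num_index_servers <= 0:
        raise ValueError("num_index_servers must be positive")
    # Conversion step: turn the identifier into a stable integer value.
    # CRC32 is an assumption; the disclosure does not specify the conversion.
    id_value = zlib.crc32(room_id.encode("utf-8"))
    # Modulo step: the remainder serves as the data distribution identifier.
    return id_value % num_index_servers


# Hypothetical index servers; the live index data for a room goes to `target`.
servers = ["index-0", "index-1", "index-2"]
target = servers[shard_for_room("room-42", len(servers))]
```

A deterministic conversion matters here: the same live room identifier must always map to the same index server so that updates and lookups for a room meet on one machine.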
In one embodiment, if the search intention includes a live search intention, after retrieving the target live room identifier from live index data corresponding to the live search intention, the method further includes:
generating a live broadcast room list according to the target live broadcast room identification;
And sending a live broadcasting room list to the audience client, wherein the live broadcasting room list is used for indicating the audience client to display relevant information of the live broadcasting room corresponding to the target live broadcasting room identification in the page.
According to a second aspect of the embodiments of the present disclosure, there is provided a search apparatus for a live broadcast room, including:
an acquisition module configured to perform acquiring an input text;
The intention analysis module is configured to perform intention analysis on the input text to obtain at least one search intention;
And the retrieval module is configured to execute the retrieval of the target live broadcasting room identification from the live broadcasting index data corresponding to the live broadcasting search intention if the search intention comprises the live broadcasting search intention, wherein the live broadcasting index data comprises the combined static data stream and dynamic data stream corresponding to the target live broadcasting room identification.
In one embodiment, the retrieval module is configured to, when the search intention includes a live search intention, match the input text with word segmentation units in an index data block constructed according to the live index data to obtain a target word segmentation unit matched with the input text, wherein the word segmentation units are obtained by performing word segmentation processing on the live index data;
And taking the live broadcasting room identification corresponding to the target word segmentation unit as a target live broadcasting room identification.
In one embodiment, the retrieval module comprises:
The searching unit is configured to retrieve, if the search intention comprises the live broadcast search intention, the original live broadcast room identifier from the live broadcast index data according to the input text;
the related information acquisition unit is configured to acquire the related information of the live broadcasting room corresponding to the original identification of the live broadcasting room;
the filtering unit is configured to filter the live broadcasting room association information, and obtain an original live broadcasting room identifier corresponding to the filtered live broadcasting room association information as a target live broadcasting room identifier.
In one embodiment, the apparatus further comprises:
And the sorting module is configured to, when there are multiple target live broadcasting room identifiers, sort the target live broadcasting room identifiers by adopting a pre-configured sorting model, so as to obtain the sorted target live broadcasting room identifiers.
In one embodiment, the apparatus further comprises:
The receiving module is configured to receive multiple paths of data streams, wherein the data streams carry live broadcast room identifications, and the data streams comprise static data streams and dynamic data streams;
And the merging module is configured to merge the static data stream and the dynamic data stream corresponding to the same live broadcasting room identifier to obtain live broadcasting index data corresponding to the live broadcasting room identifier.
In one embodiment, the apparatus further comprises:
The data cleaning module is configured to perform cleaning processing on the static data stream and the dynamic data stream to obtain a cleaned static data stream and a cleaned dynamic data stream;
And the merging module is configured to merge the cleaned static data stream and the cleaned dynamic data stream corresponding to the same live broadcasting room identifier to obtain live broadcasting index data corresponding to the live broadcasting room identifier.
In one embodiment, the merging module includes:
the static data stream determining unit is configured to execute the acquisition of the static data stream in the multiple data streams received when the preset updating time arrives;
and the merging unit is configured to merge the received dynamic data stream and the static data stream when the preset updating time arrives to obtain the live broadcast index data corresponding to the identification of the live broadcast room.
In one embodiment, the acquisition module is further configured to perform obtaining the number of the plurality of index servers;
the apparatus further comprises a conversion module configured to perform conversion processing on the live broadcasting room identifier to obtain a corresponding identifier value;
The data distribution identification determining module is configured to perform modulo processing on the identification values according to the number of the plurality of index servers to obtain data distribution identifications;
and the sending module is configured to send the live index data to an index server corresponding to the data distribution identification.
In one embodiment, the apparatus further comprises:
a live room list generation module configured to perform generation of a live room list from the target live room identification;
and the sending module is configured to send a live room list to the audience client, wherein the live room list is used for indicating the audience client to display the live room related information corresponding to the target live room identification in the page.
According to a third aspect of embodiments of the present disclosure, there is provided a server comprising:
A processor;
A memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of searching for a live room as described in any of the embodiments of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions in the storage medium, when executed by a processor of a server, enable the server to perform the method of searching for a live room described in any one of the embodiments of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program stored in a readable storage medium, wherein at least one processor of a device reads the computer program from the readable storage medium and executes it, causing the device to perform the method of searching for a live room described in any one of the embodiments of the first aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
By introducing a multi-path search strategy architecture, the present disclosure performs intention analysis on the input text from the client to obtain at least one search intention and, for each search intention, searches the index data corresponding to that intention, so that the accuracy of live room search is improved on the basis of multi-modal information such as user preference, spatio-temporal characteristics, context and interaction. The multi-path search strategy architecture also supports concurrent updating of algorithm strategies, which improves update efficiency. For the live search intention, the live index data is built from the dynamic data stream and the static data stream corresponding to each live room identifier; exploiting the real-time changes in live room data in this way can effectively improve the accuracy of live room search and thereby substantially increase user search volume.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
Fig. 1 is an application environment diagram illustrating a search method of a live room according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a search method of a live room according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating a process for obtaining a target live room identification according to an exemplary embodiment.
Fig. 4 is a flow chart illustrating a method of generating live index data according to an exemplary embodiment.
Fig. 5 is a flowchart illustrating a step of transmitting live index data according to an exemplary embodiment.
Fig. 6 is a flowchart illustrating a search method of a live room according to an exemplary embodiment.
Fig. 7 is an index structure diagram illustrating a live room search according to an exemplary embodiment.
Fig. 8 is a schematic diagram illustrating logical slicing of live index data according to an example embodiment.
Fig. 9 is a block diagram illustrating a search apparatus of a live room according to an exemplary embodiment.
Fig. 10 is an internal structural diagram of a server shown according to an exemplary embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
The live room search method provided by the present disclosure can be applied to the application environment shown in fig. 1. The viewer terminal 110 communicates with the server 120 via a network, and the anchor terminal 130 communicates with the server 120 via a network. The viewer terminal 110 has installed an application program for watching live broadcasts, through which the viewer can search for live rooms. The anchor terminal 130 has installed an application program for live broadcasting. The application installed in the viewer terminal 110 for watching live broadcasts may be the same application as the one installed in the anchor terminal 130 for live broadcasting. While the anchor terminal 130 is broadcasting, the server 120 receives the multiple data streams, carrying the live room identifier, sent by the anchor terminal 130. The multiple data streams include a static data stream and a dynamic data stream. The server 120 merges the static data stream and the dynamic data stream corresponding to the live room identifier to generate the live index data corresponding to that live room identifier. After receiving the input text for live search sent by the viewer terminal 110, the server 120 performs intention analysis on the input text to obtain at least one search intention and, if the search intention includes a live search intention, retrieves a target live room identifier from the live index data corresponding to the live search intention. The viewer terminal 110 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, or a tablet computer; the server 120 may be implemented as a stand-alone server or as a server cluster formed by a plurality of servers; and the anchor terminal 130 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, or a tablet computer.
Fig. 2 is a flowchart illustrating a search method of a live room according to an exemplary embodiment, and as shown in fig. 2, the search method of the live room is used in the server 120, including the following steps.
In step S210, an input text is acquired.
In step S220, intent analysis is performed on the input text to obtain at least one search intent.
Specifically, when a viewer wants to search for a live room, the viewer enters text, for example "food live room", through the search portal of the viewer terminal. The viewer terminal sends a live room search request carrying the input text to the server. In response to the received search request, the server performs intention analysis on the input text carried in the request to obtain at least one search intention. The intention analysis may use a classification method based on rule templates or a classification method based on machine learning. For example, with a machine-learning-based classification method, different search intention categories, such as a live tag search intention, a game live search intention and a live search intention, may be defined in advance according to the characteristics of live rooms. Common words corresponding to each intention category are then defined. For the acquired input text, a machine-learning-based classification model calculates the probability of each intention category according to the predefined common words, and the search intentions corresponding to the intention categories whose probability exceeds a threshold are obtained.
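The thresholding idea in the paragraph above can be illustrated with a small sketch. It does not implement a trained classifier: a normalized keyword-overlap score stands in for the machine-learning model, and the intent names, common words, threshold value and `analyze_intents` helper are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical intention categories and common words; not taken from the disclosure.
INTENT_KEYWORDS: Dict[str, Set[str]] = {
    "live_search": {"live", "room", "streaming"},
    "game_live_search": {"game", "esports", "live"},
    "live_tag_search": {"food", "music", "tag"},
}


def analyze_intents(text: str, threshold: float = 0.3) -> List[str]:
    """Return every intention whose normalized keyword score exceeds the threshold."""
    words = set(text.lower().split())
    # Raw score: number of predefined common words found in the input text.
    raw = {intent: len(words & kws) for intent, kws in INTENT_KEYWORDS.items()}
    total = sum(raw.values())
    if total == 0:
        return []
    # Normalize so the scores sum to 1, mimicking classifier probabilities.
    probs = {intent: score / total for intent, score in raw.items()}
    return [intent for intent, p in probs.items() if p > threshold]


intents = analyze_intents("food live room")
```

Because more than one category can exceed the threshold, a single input text may yield several search intentions, which is what allows several search paths to run for one query.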
In step S230, if the search intention includes a live search intention, the target live room identifier is retrieved from live index data corresponding to the live search intention, where the live index data includes a static data stream and a dynamic data stream that are combined and correspond to the target live room identifier.
The search intention includes at least a live search intention. The live search intention refers to an intention to search, according to the input text, among live rooms that are currently broadcasting. The target live room identifier refers to a retrieved live room identifier that matches the input text. A live room identifier uniquely distinguishes one live room from another and may be, for example, an anchor ID or an anchor name. A static data stream refers to data that does not change for a period of time during a live broadcast, such as the anchor information of the live room, the live room title, and the ACU (average concurrent users, i.e. the average number of users online at the same time) of the live room. A dynamic data stream refers to data that changes in real time during a live broadcast, such as a voice data stream, a comment data stream, and a live video data stream.
Specifically, a multi-path search strategy architecture is constructed in advance for live room search. Each search strategy path corresponds to one search intention, and each search intention corresponds to its own index data. For the live search intention, the corresponding live index data is generated in real time from the multiple data streams sent by the anchor terminal during the broadcast. If the at least one search intention obtained by intention analysis of the input text includes a live search intention, a target live room identifier matching the input text is retrieved from the live index data corresponding to the live search intention.
In the above live room search method, a multi-path search strategy architecture is introduced: intention analysis is performed on the input text from the client to obtain at least one search intention and, for each search intention, the index data corresponding to that intention is searched, so that the accuracy of live room search is improved on the basis of multi-modal information such as user preference, spatio-temporal characteristics, context and interaction. The multi-path search strategy architecture also supports concurrent updating of algorithm strategies, which improves update efficiency. For the live search intention, the live index data is built from the dynamic data stream and the static data stream corresponding to each live room identifier; exploiting the real-time changes in live room data in this way can effectively improve the accuracy of live room search and thereby substantially increase user search volume.
In an exemplary embodiment, in step S230, if the search intention includes a live search intention, retrieving the target live room identifier from the live index data corresponding to the live search intention includes: matching the input text with word segmentation units in an index data block constructed according to the live index data to obtain a target word segmentation unit matched with the input text, wherein the word segmentation units are obtained by performing word segmentation processing on the live index data; and using the live room identifier corresponding to the target word segmentation unit as the target live room identifier.
Specifically, to improve the efficiency of live room search, the live index data is logically sliced in advance in this embodiment. Logical slicing refers to dividing the live index data into a plurality of smaller index data blocks. First, after the live index data is generated from the static data stream and the dynamic data stream sent by the anchor terminal, language processing and lexical analysis are performed on the live index data by a word segmentation algorithm to obtain a plurality of word segmentation units (tokens). The word segmentation algorithm may be a word segmentation method based on string matching, a word-sense-based word segmentation method, a statistical word segmentation method, or the like. Then, a preset number of word segmentation units, for example 1024, form one index data block (chunk). Finally, the resulting index data blocks form a sliced live index lexicon. After the input text is received, it is matched against the word segmentation units in the index data blocks to obtain the target word segmentation unit matching the input text, and the live room identifier corresponding to the target word segmentation unit is looked up as the target live room identifier.
In this embodiment, the live index data is logically sliced and the input text is matched against the word segmentation units in the index data blocks. Compared with matching against unsliced live index data, fewer word segmentation units need to be traversed, which shortens the search response time, returns search results more quickly, and improves search efficiency.
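One way to picture the slicing and matching described above is the sketch below. It makes several assumptions that the disclosure does not state: whitespace splitting stands in for a real word segmentation algorithm, tokens are kept sorted so that a binary search over chunk boundaries can pick the single chunk to scan, and the helper names and sample rooms are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Set, Tuple

CHUNK_SIZE = 1024  # preset number of tokens per index data block


def build_chunks(room_texts: Dict[str, str],
                 chunk_size: int = CHUNK_SIZE) -> Tuple[List[List[str]], Dict[str, Set[str]]]:
    """Tokenize index text per room and group the sorted tokens into chunks."""
    postings: Dict[str, Set[str]] = {}
    for room_id, text in room_texts.items():
        # Whitespace splitting stands in for a real word segmentation algorithm.
        for token in text.lower().split():
            postings.setdefault(token, set()).add(room_id)
    tokens = sorted(postings)
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    return chunks, postings


def search(query: str, chunks: List[List[str]],
           postings: Dict[str, Set[str]]) -> Set[str]:
    """Return room identifiers whose tokens match any token of the query."""
    first_tokens = [chunk[0] for chunk in chunks]
    matched: Set[str] = set()
    for token in query.lower().split():
        # Locate the single chunk that could hold the token, then scan only it.
        pos = bisect_right(first_tokens, token) - 1
        if pos >= 0 and token in chunks[pos]:
            matched |= postings[token]
    return matched


# A tiny chunk size is used here only to make the slicing visible.
chunks, postings = build_chunks(
    {"r1": "street food tour", "r2": "food review", "r3": "chess game"},
    chunk_size=2,
)
rooms = search("food live", chunks, postings)
```

In this sketch each query token touches at most one chunk, which is one possible reading of why fewer word segmentation units need to be traversed after slicing.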
In an exemplary embodiment, as shown in fig. 3, if the search intention includes a live search intention in step S230, the target live room identifier is retrieved from live index data corresponding to the live search intention, which may be implemented specifically by the following steps.
In step S231, if the search intention includes a live search intention, the original live room identifier is retrieved from the live index data according to the input text.
In step S232, live room association information corresponding to the original live room identifier is acquired.
In step S233, the live broadcast room association information is filtered, and an original live broadcast room identifier corresponding to the filtered live broadcast room association information is obtained as a target live broadcast room identifier.
The original live room identifier refers to an unprocessed live room identifier retrieved from the live index data. Specifically, after the input text is received, the live room identifier retrieved from the live index data according to the input text is used as the original live room identifier. The live rooms corresponding to the original live room identifiers may include high-risk live rooms, for example illegal live rooms or live rooms on sensitive topics. The original live room identifiers therefore need to be filtered, and the original live room identifiers that remain after filtering are used as the target live room identifiers.
The original live room identifiers may be filtered by, but not limited to, a pre-deployed risk control strategy, blacklist filtering, and manual intervention. Illustratively, the risk control strategy may be based on pre-deployed multidimensional rules, such as the audience score, the historical behavior data, and the anchor characteristics corresponding to a live room identifier. If the risk control strategy determines that the live room corresponding to an original live room identifier is a high-risk live room, that original live room identifier is filtered out.
In this embodiment, the retrieved original live room identifiers are filtered along multiple dimensions using several data filtering methods. This reduces the amount of live room information presented to the viewer and makes it more efficient for the viewer to find resources of interest. At the same time, filtering the original live room identifiers so that only live room identifiers meeting the preset requirements are selected and pushed helps ensure the reliability of the live room information the viewer receives.
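The filtering in steps S231 to S233 can be illustrated with a short sketch that combines blacklist filtering with a toy risk control rule. The `RoomInfo` fields, the thresholds and the helper names are illustrative assumptions; the disclosure does not specify concrete rules or values.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class RoomInfo:
    """Hypothetical association information for one live room."""
    room_id: str
    audience_score: float   # e.g. 0.0 (worst) to 5.0 (best)
    past_violations: int    # count taken from historical behavior data


def is_high_risk(info: RoomInfo, min_score: float = 2.0,
                 max_violations: int = 3) -> bool:
    """Toy risk control rule; the thresholds are illustrative assumptions."""
    return info.audience_score < min_score or info.past_violations > max_violations


def filter_rooms(candidates: Iterable[RoomInfo], blacklist: Set[str]) -> List[str]:
    """Keep identifiers that pass both the blacklist and the risk control rule."""
    return [info.room_id for info in candidates
            if info.room_id not in blacklist and not is_high_risk(info)]


# Original live room identifiers with their association information.
candidates = [RoomInfo("r1", 4.5, 0), RoomInfo("r2", 1.2, 0),
              RoomInfo("r3", 4.0, 5), RoomInfo("r4", 3.8, 1)]
target_ids = filter_rooms(candidates, blacklist={"r4"})
```

Here `r2` fails on audience score, `r3` on violation history and `r4` on the blacklist, so only `r1` remains as a target live room identifier.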
In an exemplary embodiment, in step S230, if the search intention includes a live search intention, after retrieving the target live room identifier from the live index data corresponding to the live search intention, the method further includes, when the number of target live room identifiers is plural, sorting the target live room identifiers by using a pre-configured sorting model, and obtaining the sorted target live room identifiers.
Specifically, the live broadcast application generally transmits the live room related information corresponding to the retrieved target live room identifiers to the viewer end as a list. Therefore, in order to enable the audience to quickly acquire the related information of the live rooms that best match the input text, after the target live room identifiers are obtained, they are sorted by a pre-configured sorting model to obtain the sorted target live room identifiers. In this embodiment, since the live room search is implemented through a multi-path search policy architecture, the priority of each search intention may be preconfigured. If intention analysis of the input text yields a plurality of search intentions, the target live room identifiers are sorted according to the priority of each search intention. Target live room identifiers corresponding to the same search intention can be sorted by a pre-deployed sorting model according to the live room related information, the search log, and the like. The live room related information includes, without limitation, the number of online viewers, the audience retention time, the number of comments, and the live content (such as hot-topic live broadcasts). A sorting value corresponding to each target live room identifier is calculated by the sorting model, and the target live room identifiers corresponding to the same search intention are sorted by that value.
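The two-level ordering described above (intention priority first, then sorting value within the same intention) can be sketched as follows. The priority table, the record fields, and the weighted sum standing in for the sorting model are all hypothetical; the description does not specify the model.

```python
# Hypothetical intention priorities (a lower value means a higher priority).
INTENT_PRIORITY = {"manual_top": 0, "game_live": 1, "live": 2}


def sorting_value(room: dict) -> float:
    """Toy stand-in for the pre-configured sorting model: a weighted sum of signals."""
    return 1.0 * room["online"] + 5.0 * room["comments"]


def sort_rooms(rooms: list[dict]) -> list[str]:
    """Order by intention priority, then by descending sorting value."""
    ordered = sorted(
        rooms,
        key=lambda r: (INTENT_PRIORITY[r["intent"]], -sorting_value(r)),
    )
    return [r["room_id"] for r in ordered]
```

Using a tuple as the sort key keeps all results of a higher-priority intention ahead of lower-priority ones regardless of their sorting values.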
In this embodiment, by sorting the retrieved target live broadcast room identifiers, the audience can quickly acquire relevant information of the live broadcast room that is relatively matched with the input text, so that the time of selecting the live broadcast room to be watched by the audience can be reduced, the watching time of the audience in the live broadcast room can be prolonged, and further the retention rate of the audience can be improved.
In an exemplary embodiment, as shown in fig. 4, a description is given of a generation manner of live index data, which includes the following steps.
In step S410, a multi-way data stream is received, the data stream carrying a live room identification, the data stream comprising a static data stream and a dynamic data stream.
In step S420, the static data stream and the dynamic data stream corresponding to the same live broadcast room identifier are combined to obtain live broadcast index data corresponding to the live broadcast room identifier.
Specifically, live room search is both real-time and stateful. The state refers to the state of the live room, which is either on-air or off-air. Live room search targets live rooms in the on-air state, and the data streams of a live room are updated in real time as the live broadcast progresses. After a data stream carrying a live room identifier arrives, the server judges whether the current state of the live room corresponding to that identifier is the on-air state; if so, the data stream is accepted, and if not, the data stream is discarded. Each data stream received by the server carries a corresponding data type, and the data type is either dynamic data or static data. A data stream whose data type is dynamic data is taken as a dynamic data stream, and a data stream whose data type is static data is taken as a static data stream. The static data stream and the dynamic data stream corresponding to the same live room identifier are merged to form the live index data corresponding to that live room identifier.
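A minimal sketch of this receive-and-merge step is given below. The record layout (`room_id`, `type`, `data`) and the representation of the on-air state as a set of identifiers are assumptions made for illustration only.

```python
def merge_streams(records: list[dict], on_air: set[str]) -> dict[str, dict]:
    """Merge static and dynamic records that share a live room identifier.

    Each record is assumed to look like
    {"room_id": ..., "type": "static" | "dynamic", "data": {...}}.
    Records for rooms that are not on air are discarded.
    """
    index: dict[str, dict] = {}
    for rec in records:
        if rec["room_id"] not in on_air:
            continue  # off-air room: discard the data stream
        entry = index.setdefault(rec["room_id"], {"static": {}, "dynamic": {}})
        entry[rec["type"]].update(rec["data"])
    return index
```

The returned dictionary maps each on-air live room identifier to its merged record, which stands in for the live index data of that room.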
In the embodiment, the live broadcast index data is obtained by combining the dynamic data stream and the static data stream corresponding to the live broadcast room identification, and the searching accuracy of the live broadcast room can be effectively improved by utilizing the stateful characteristic and the real-time dynamic change characteristic of the live broadcast room data, so that the searching amount of a user is greatly improved.
In an exemplary embodiment, after receiving the multiple data streams, step S410 further includes performing a cleaning process on the static data stream and the dynamic data stream to obtain a cleaned static data stream and a cleaned dynamic data stream. In the embodiment, merging the static data stream and the dynamic data stream corresponding to the same live broadcasting room identifier to obtain live broadcasting index data corresponding to the live broadcasting room identifier comprises merging the cleaned static data stream and the cleaned dynamic data stream corresponding to the same live broadcasting room identifier to obtain the live broadcasting index data corresponding to the live broadcasting room identifier.
Specifically, the live broadcast data received by the server contains a large amount of noise data, such as sensitive words, nonsensical words (e.g., o) and the like. In order to improve the accuracy of the live index data, after the multiple data streams are received, the static data stream and the dynamic data stream may be cleaned to obtain a cleaned static data stream and a cleaned dynamic data stream. The cleaning process may be performed according to predefined reference factors for data cleaning, for example, anchor feature data, the title of the live room, the number of online viewers of the live room, the score of the live room, a preset dictionary, etc. According to these reference factors, the live room data meeting the preset requirements is retained from the received multiple data streams.
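As an illustration of dictionary-based cleaning, the sketch below removes words found in preset dictionaries from a piece of stream text. The dictionary contents are hypothetical, and the regular-expression split is only a placeholder for a real word segmenter; cleaning by the other reference factors (scores, online counts, and so on) is not shown.

```python
import re

# Hypothetical preset dictionaries; the description lists no concrete entries.
SENSITIVE_WORDS = {"badword"}
FILLER_WORDS = {"uh", "um"}
NOISE = SENSITIVE_WORDS | FILLER_WORDS


def clean_text(text: str) -> str:
    """Drop sensitive and filler words from a piece of stream text."""
    words = re.findall(r"\w+", text.lower())
    return " ".join(w for w in words if w not in NOISE)
```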
In this embodiment, after the multiple data streams are received, the noise data in them is cleaned. On the one hand, the reduced data volume helps improve the generation efficiency of the live index data and reduces the operating pressure on the server; on the other hand, cleaning improves the quality of the index data and thereby the accuracy of the live index data.
In an exemplary embodiment, in step S420, merging the static data stream and the dynamic data stream corresponding to the same live room identifier to obtain the live index data corresponding to the live room identifier may specifically include: acquiring the static data stream in the multiple data streams received when a preset update time arrives; and merging the dynamic data stream and the static data stream received when the preset update time arrives, to obtain the live index data corresponding to the live room identifier.
Specifically, in order to improve the efficiency of generating live index data, different update policies are adopted for the dynamic data stream and the static data stream in the present implementation. Since the static data stream is a data stream that is not updated for a certain period of time, a corresponding update time is preconfigured for the static data, and the update time may be a periodic time. After receiving the multipath data streams, the server judges whether the current time reaches the update time of the static data streams, if not, the static data streams are kept unchanged, and the current received dynamic data streams and the stored static data streams are combined to obtain live index data. If the current time is judged to have reached the update time of the static data stream, merging the static data stream and the dynamic data stream which are received currently to obtain live index data.
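The update policy above can be sketched as a small cache that refreshes the stored static data stream only when the update time has been reached, while always using the current dynamic data stream. The class name, the period parameter, and the use of plain numeric timestamps are assumptions for illustration.

```python
class StaticStreamCache:
    """Keeps the stored static stream and refreshes it only at the update time."""

    def __init__(self, period_s: float):
        self.period_s = period_s   # hypothetical periodic update interval, in seconds
        self.stored: dict = {}
        self.next_update = 0.0     # the first call always refreshes

    def merge(self, now: float, static: dict, dynamic: dict) -> dict:
        if now >= self.next_update:
            # Update time reached: replace the stored static stream.
            self.stored = dict(static)
            self.next_update = now + self.period_s
        # The dynamic stream is always taken from the current input.
        return {"static": self.stored, "dynamic": dynamic}
```

For example, with a period of 60 seconds, a static title change arriving at t = 30 is ignored until the merge performed at t = 60 or later.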
In this embodiment, by using the time update characteristics of the dynamic data stream and the static data stream, corresponding update policies are configured for the dynamic data stream and the static data stream, so that the data processing efficiency can be improved, and the operating pressure of the server can be reduced.
In an exemplary embodiment, as shown in fig. 5, after merging the static data stream and the dynamic data stream corresponding to the same live-broadcast room identifier, the following steps are further included.
In step S510, the number of the plurality of index servers is acquired.
In step S520, the live broadcast room identifier is converted to obtain a corresponding identifier value.
In step S530, modulo processing is performed on the identifier value according to the number of the plurality of index servers, to obtain a data distribution identification.
In step S540, the live index data is transmitted to an index server corresponding to the data distribution identification.
Specifically, after the live index data is obtained, physical sharding is performed on the live index data. Physical sharding refers to distributing the live index data across multiple independent index server nodes. The physical sharding may be hash-based, data-range-based, or the like. In this embodiment, hash-based sharding is adopted and is described below. First, the number of the plurality of index servers is acquired. The live room identifier is then converted by a hash algorithm, and the obtained hash value is used as the identifier value. Next, modulo processing is performed on the identifier value according to the number of index servers to obtain the data distribution identification. The data distribution identification uniquely distinguishes an index server, and the correspondence between data distribution identifications and index servers is preset. Finally, the index server corresponding to the live room identifier is looked up in the correspondence between data distribution identifications and index servers according to the obtained data distribution identification, and the live index data corresponding to the live room identifier is sent to that index server.
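The hash-and-modulo routing of steps S510 to S540 can be sketched as follows. The description does not name a hash algorithm; SHA-256 is used here purely as an assumption, and the server addresses in `SERVERS` are hypothetical.

```python
import hashlib


def shard_for(room_id: str, num_servers: int) -> int:
    """Hash the live room identifier and take it modulo the number of index servers."""
    digest = hashlib.sha256(room_id.encode("utf-8")).hexdigest()
    identifier_value = int(digest, 16)       # hash value used as the identifier value
    return identifier_value % num_servers    # data distribution identification


# Preset correspondence between data distribution identifications and index
# servers (hypothetical addresses).
SERVERS = {0: "index-0.example", 1: "index-1.example", 2: "index-2.example"}


def route(room_id: str) -> str:
    """Return the index server that should receive this room's live index data."""
    return SERVERS[shard_for(room_id, len(SERVERS))]
```

Because the mapping depends on the number of servers, changing that number remaps most identifiers; this is a known property of plain modulo sharding rather than something the description addresses.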
In the embodiment, the live broadcast index data corresponding to the plurality of live broadcast room identifications are uniformly distributed to the plurality of index servers, and when part of index servers fail, normal live broadcast room search is not influenced, so that the reliability of live broadcast room search can be ensured, and by distributing the live broadcast index data to the plurality of index servers, the operation pressure of each index server can be reduced, the data retrieval time is shortened, and the search efficiency of the live broadcast room is improved.
In an exemplary embodiment, after retrieving the target live room identifier from the live index data corresponding to the live search intention if the search intention includes the live search intention, the method further includes generating a live room list according to the target live room identifier, and sending the live room list to the audience client, where the live room list is used to instruct the audience client to display related information of the live room corresponding to the target live room identifier in the page.
The related information of the living broadcast room comprises information displayed on a client interface of a viewer and used for distinguishing different living broadcast rooms, for example, the related information can be a main broadcasting head image of the living broadcast room, a living broadcast picture of the living broadcast room and the like. Specifically, after acquiring the target live broadcasting room identifier, the server acquires the related information of the live broadcasting room corresponding to the live broadcasting room identifier. The server combines the obtained target live broadcasting room identification and the corresponding live broadcasting room related information into a live broadcasting room list, and sends the live broadcasting room list to the audience client side, so that the audience client side can display the corresponding live broadcasting room related information of each live broadcasting room in the live broadcasting recommendation list at a preset position of the current interface.
Fig. 6 is a flowchart illustrating a search method of a live room according to an exemplary embodiment, and fig. 7 is an index structure diagram illustrating a search of a live room according to an exemplary embodiment. The following describes the search method of the live broadcasting room in detail with reference to fig. 6 and 7, including the following steps.
In step S601, a multi-path data stream is received, the data stream carrying a live room identifier, the data stream comprising a static data stream and a dynamic data stream.
Specifically, the received multiple data streams are data streams for which the server has judged, according to the live room identifier, that the live room state is the on-air state. A corresponding data type, either dynamic data or static data, is configured for each data stream in advance. A data stream whose data type is static data is taken as a static data stream, and a data stream whose data type is dynamic data is taken as a dynamic data stream. As shown in fig. 7, the live room video data stream, the game live room video data stream, the ASR (Automatic Speech Recognition) data stream and the comment data stream are dynamic data streams, while the anchor information, the live title and the like are static data streams.
In step S602, a dynamic data stream and a static data stream to be combined are determined.
Specifically, the corresponding static data stream and dynamic data stream are obtained according to the update policies configured in advance for the dynamic and static data types. A real-time update policy is adopted for dynamic data streams, and a periodic update policy is adopted for static data streams. That is, after the multiple data streams are received, it is determined whether the current time has reached the update time of the static data stream; if not, the stored static data stream is kept unchanged. If the current time has reached the update time of the static data stream, the stored static data stream is updated to the currently received static data stream.
In step S603, the static data stream and the dynamic data stream are subjected to a cleaning process, so as to obtain a cleaned static data stream and a cleaned dynamic data stream.
In step S604, the cleaned static data stream and the cleaned dynamic data stream corresponding to the same live broadcast room identifier are combined to obtain live broadcast index data corresponding to the live broadcast room identifier.
In step S605, live index data is transmitted to an index server.
Specifically, physical sharding is performed on the live index data, and the live index data is sent to a plurality of independent index server nodes. The physical sharding may be hash-based; for details, reference may be made to the description of the above embodiments, which is not repeated here.
In step S606, the live index data in the index server is logically sliced.
Specifically, word segmentation is performed on the live index data to obtain a plurality of word segmentation units (tokens), and every 1024 word segmentation units form one index data block. Fig. 8 schematically shows how a plurality of index data blocks are obtained from a plurality of word segmentation units.
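The grouping of word segmentation units into index data blocks can be sketched as follows. The block size of 1024 comes from the description; the whitespace `tokenize` function is only a placeholder, since the description does not say which word segmenter is used.

```python
BLOCK_SIZE = 1024  # word segmentation units per index data block, per the description


def tokenize(text: str) -> list[str]:
    """Placeholder word segmentation: split on whitespace."""
    return text.split()


def to_blocks(tokens: list[str], size: int = BLOCK_SIZE) -> list[list[str]]:
    """Group consecutive tokens into blocks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]
```

The final block may hold fewer than 1024 units when the token count is not a multiple of the block size.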
In step S607, an input text is acquired.
In step S608, an intention analysis is performed on the input text, resulting in at least one search intention.
In step S609, if the search intention includes a live search intention, the original live room identification is retrieved from live index data corresponding to the live search intention.
As shown in fig. 7, the search intentions include a manual-intervention preset top search intention, a member user query search intention, a game live search intention, a live tag search intention, a hot live search intention, a live recommendation intention, and a live search intention. The game live search intention, the live tag search intention, the hot live search intention and the live search intention each correspond to independent index data. After at least one search intention is obtained, retrieval is performed in the index data corresponding to that search intention.
If the search intention includes a live search intention, retrieval is performed in the live index data generated in steps S601 to S606 to obtain an original live room identifier. Specifically, the input text is matched with the word segmentation units in the index data blocks generated from the live index data to obtain a target word segmentation unit matched with the input text, and the live room identifier corresponding to the target word segmentation unit is taken as the original live room identifier.
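One common way to realize this kind of token matching is an inverted index from word segmentation units to live room identifiers. The sketch below assumes that structure and whitespace tokenization of the input text; neither is stated in the description, and the any-token match rule is likewise an assumption.

```python
from collections import defaultdict


def build_inverted_index(room_tokens: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each word segmentation unit to the live room identifiers that contain it."""
    inverted: dict[str, set[str]] = defaultdict(set)
    for room_id, tokens in room_tokens.items():
        for tok in tokens:
            inverted[tok].add(room_id)
    return inverted


def search(inverted: dict[str, set[str]], input_text: str) -> set[str]:
    """Return live room identifiers whose tokens match any token of the input text."""
    hits: set[str] = set()
    for tok in input_text.split():
        hits |= inverted.get(tok, set())
    return hits
```

The identifiers returned by `search` correspond to the original live room identifiers, which are then filtered and sorted in the later steps.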
Further, if the search intention obtained by the intention analysis includes other live search intents, the original live room identifier corresponding to the other live search intents may be obtained by:
For the manual-intervention preset top search intention, retrieval is completed through manual operation on a manual operation platform.
For the member user query search intention, the live room identifiers of member live rooms whose state is on-air can be queried according to the identifiers of the member live rooms. The original live room identifiers matching the input text are then searched according to the related information of the live rooms corresponding to the queried live room identifiers, such as the live room title, the anchor features, and the like.
And for the live game search intention, acquiring the tag data corresponding to the live game room identification, and constructing live game index data according to the tag data corresponding to the live game room identification. And matching the input text with the game live index data to obtain an original live broadcast room identifier corresponding to the game live broadcast searching intention. Tag data corresponding to the identification of the game live broadcast room can be obtained by analyzing the data flow of the game live broadcast room through a pre-configured tag model.
For the live broadcast tag search intention, tag data corresponding to the live broadcast room identification can be obtained, and live broadcast tag index data is constructed according to the tag data corresponding to the live broadcast room identification. And matching the input text with the live broadcast tag index data to obtain an original live broadcast room identifier corresponding to the live broadcast tag search intention. Tag data corresponding to the identification of the live broadcast room can be obtained by analyzing the data stream of the live broadcast room through a pre-configured tag model.
For the hot live search intention, anchor portrait feature index data is constructed from the anchor portrait features. The input text is matched with the anchor portrait feature index data to obtain the original live room identifier corresponding to the hot live search intention. The hot live room identifiers can be obtained according to the related information of the live rooms whose state is on-air, where the related information of a live room includes the number of viewers, the number of audience comments, and the like.
In step S610, the live broadcasting room status of the original live broadcasting room identifier is obtained, and the original live broadcasting room identifier whose status is the on-air status is obtained by filtering.
In step S611, the original live room identifiers in the on-air state are filtered, and the original live room identifiers that remain after filtering are taken as the target live room identifiers. Filtering the original live room identifiers may employ, without limitation, pre-deployed risk control policies, blacklist filtering, and manual intervention.
In step S612, the target live broadcast room identifiers are ranked by using a preconfigured ranking model, so as to obtain the ranked target live broadcast room identifiers.
In step S613, a live room list is generated according to the target live room identification, and the live room list is transmitted to the viewer client.
It should be understood that, although the steps in the flowcharts of figs. 1-8 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in figs. 1-8 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are also not necessarily performed in sequence, but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
Fig. 9 is a block diagram illustrating a search apparatus 900 of a live room according to an exemplary embodiment. Referring to fig. 9, the apparatus includes an acquisition module 901, an intent analysis module 902, and a retrieval module 903.
The acquisition module 901 is configured to acquire an input text. The intent analysis module 902 is configured to perform intention analysis on the input text to obtain at least one search intention. The retrieval module 903 is configured to, if the search intention includes a live search intention, retrieve a target live room identifier from the live index data corresponding to the live search intention, wherein the live index data includes a combined static data stream and dynamic data stream corresponding to the target live room identifier.
In an exemplary embodiment, the retrieval module 903 is configured to, when the search intention includes a live search intention, match the input text with the word segmentation units in the index data blocks constructed according to the live index data to obtain a target word segmentation unit matched with the input text, the word segmentation units being obtained by performing word segmentation on the live index data, and to use the live room identifier corresponding to the target word segmentation unit as the target live room identifier.
In an exemplary embodiment, the retrieval module 903 includes a retrieval unit configured to perform retrieving, if the search intention includes a live search intention, an original live room identifier from live index data according to an input text, an association information obtaining unit configured to perform obtaining live room association information corresponding to the original live room identifier, and a filtering unit configured to perform filtering the live room association information, and obtain the original live room identifier corresponding to the filtered live room association information as a target live room identifier.
In an exemplary embodiment, the apparatus further includes a ranking module configured to perform ranking of the target live room identifications using a pre-configured ranking model when the number of target live room identifications is multiple, resulting in ranked target live room identifications.
In an exemplary embodiment, the device further comprises a receiving module configured to receive multiple data streams, wherein the data streams carry live broadcast room identifications, the data streams comprise static data streams and dynamic data streams, and a merging module configured to merge the static data streams and the dynamic data streams corresponding to the same live broadcast room identifications to obtain live broadcast index data corresponding to the live broadcast room identifications.
In an exemplary embodiment, the device further comprises a data cleansing module configured to cleansing the static data stream and the dynamic data stream to obtain a cleansed static data stream and a cleansed dynamic data stream, and a merging module configured to merge the cleansed static data stream and the cleansed dynamic data stream corresponding to the same live broadcasting room identifier to obtain live broadcasting index data corresponding to the live broadcasting room identifier.
In an exemplary embodiment, the merging module comprises a static data stream determining unit configured to acquire the static data stream in the multiple data streams received when a preset update time arrives, and a merging unit configured to merge the dynamic data stream and the static data stream received when the preset update time arrives, to obtain the live index data corresponding to the live room identifier.
In an exemplary embodiment, the acquisition module 901 is further configured to acquire the number of the plurality of index servers. The apparatus further comprises a conversion module configured to convert the live room identifier to obtain a corresponding identifier value, a data distribution identification determining module configured to perform modulo processing on the identifier value according to the number of the plurality of index servers to obtain a data distribution identification, and a sending module configured to send the live index data to the index server corresponding to the data distribution identification.
In an exemplary embodiment, the apparatus further comprises a live room list generation module configured to perform generation of a live room list according to the target live room identification, and a transmission module configured to perform transmission of the live room list to the viewer client, the live room list being used for instructing the viewer client to display live room related information corresponding to the target live room identification in a page.
The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
Fig. 10 is a block diagram illustrating an apparatus 1000 for live room searching in accordance with an exemplary embodiment. For example, device 1000 may be a server. Referring to fig. 10, device 1000 includes a processing component 1020 that further includes one or more processors and memory resources represented by memory 1022 for storing instructions, such as applications, executable by processing component 1020. The application programs stored in memory 1022 may include one or more modules each corresponding to a set of instructions. Further, the processing component 1020 is configured to execute instructions to perform the search method of the live room described above.
The device 1000 may also include a power supply component 1024 configured to perform power management of the device 1000, a wired or wireless network interface 1026 configured to connect the device 1000 to a network, and an input/output (I/O) interface 1028. The device 1000 may operate based on an operating system stored in the memory 1022, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
In an exemplary embodiment, a storage medium is also provided, such as a memory 1022 including instructions executable by a processor of the device 1000 to perform the above-described method. The storage medium may be a non-transitory computer readable storage medium, which may be, for example, ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (21)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010644565.9A CN113918803B (en) | 2020-07-07 | 2020-07-07 | Live broadcasting room searching method and device, server and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010644565.9A CN113918803B (en) | 2020-07-07 | 2020-07-07 | Live broadcasting room searching method and device, server and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113918803A (en) | 2022-01-11 |
| CN113918803B (en) | 2025-11-07 |
Family
ID=79231485
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010644565.9A Active CN113918803B (en) | 2020-07-07 | 2020-07-07 | Live broadcasting room searching method and device, server and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113918803B (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110390035A (en) * | 2019-07-26 | 2019-10-29 | 广州虎牙科技有限公司 | Searching method, device, equipment and the storage medium of direct broadcasting room |
| CN110909209A (en) * | 2019-11-26 | 2020-03-24 | 北京达佳互联信息技术有限公司 | Live video searching method and device, equipment, server and storage medium |
| CN111104583A (en) * | 2018-10-10 | 2020-05-05 | 武汉斗鱼网络科技有限公司 | Live broadcast room recommendation method, storage medium, electronic device and system |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9049259B2 (en) * | 2011-05-03 | 2015-06-02 | Onepatont Software Limited | System and method for dynamically providing visual action or activity news feed |
| GB201603685D0 (en) * | 2016-03-03 | 2016-04-20 | Mporium Group Plc | Identifying data to influence content or access to content |
| CN106096050A (en) * | 2016-06-29 | 2016-11-09 | 乐视控股(北京)有限公司 | A kind of method and apparatus of video contents search |
| CN106341695B (en) * | 2016-08-31 | 2020-08-11 | 腾讯数码(天津)有限公司 | Live broadcast room interaction method, device and system |
| CN108600775B (en) * | 2018-05-22 | 2020-11-17 | 广州虎牙信息科技有限公司 | Live video monitoring method and device, server and storage medium |
| US11163817B2 (en) * | 2018-05-24 | 2021-11-02 | Spotify Ab | Descriptive media content search |
| CN108769823B (en) * | 2018-05-28 | 2019-05-28 | 广州虎牙信息科技有限公司 | Direct broadcasting room display methods, device, equipment |
| CN109982128B (en) * | 2019-03-19 | 2020-11-03 | 腾讯科技(深圳)有限公司 | Video bullet screen generation method and device, storage medium and electronic device |
| CN110300313A (en) * | 2019-06-28 | 2019-10-01 | 广州酷狗计算机科技有限公司 | Information display method, device, terminal, server and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN113918803A (en) | 2022-01-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10277696B2 (en) | | Method and system for processing data used by creative users to create media content |
| CN106331778B (en) | | Video recommendation method and device |
| JP6170023B2 (en) | | Content recommendation device, content recommendation method, and content recommendation program |
| US8595375B1 (en) | | Segmenting video based on timestamps in comments |
| US8457368B2 (en) | | System and method of object recognition and database population for video indexing |
| US8601076B2 (en) | | Systems and methods for identifying and notifying users of electronic content based on biometric recognition |
| US20140032562A1 (en) | | Apparatus and methods for user generated content indexing |
| CN108875022A (en) | | A kind of video recommendation method and device |
| US20170155939A1 (en) | | Method and System for Processing Data Used By Creative Users to Create Media Content |
| CN111104583A (en) | | Live broadcast room recommendation method, storage medium, electronic device and system |
| CN111327955A (en) | | User portrait based on-demand method, storage medium and smart television |
| KR20210006662A (en) | | Animaiton contents resource service system and method based on intelligent informatin technology |
| WO2024193216A1 (en) | | Pushing object processing method, and training method and apparatus for object pushing model |
| CN118861211A (en) | | A multimodal data retrieval method and device based on metric index |
| TWI709905B (en) | | Data analysis method and data analysis system thereof |
| CN114020960A (en) | | Music recommendation method, device, server and storage medium |
| CN113918803B (en) | | Live broadcasting room searching method and device, server and storage medium |
| CN114417890B (en) | | Comment content reply method and device, electronic equipment and storage medium |
| US9886415B1 (en) | | Prioritized data transmission over networks |
| US20200081922A1 (en) | | Data analysis method and data analysis system thereof |
| CN118095355A (en) | | Model training method, content screening method and related device |
| CN117493606A (en) | | A video retrieval method, device, system, electronic equipment and storage medium |
| CN116137677A (en) | | Topic recommendation method for smart TV, computer equipment and readable storage medium |
| CN114356979B (en) | | A query method and related equipment |
| CN115017166B (en) | | Method and device for constructing vertical data, electronic equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | TG01 | Patent term adjustment | |