HK1160718B

HK1160718B - Media event structure and context identification using short messages

Info

Publication number: HK1160718B
Application number: HK12101013.7A
Authority: HK
Inventors: David Ayman Shamma; Lyndon Kennedy; Elizabeth F. Churchill
Original assignee: Pinterest, Inc.
Priority date: 2010-02-22
Filing date: 2012-02-03
Publication date: 2016-08-26

Description

Media event structure and context identification using short messages

Technical Field

The present disclosure relates to identifying a structure and/or context of a media event, such as a live media event, and more particularly to identifying the structure and/or context of a media event using short message content.

Background

Content, such as multimedia, audio, video, image, animation, and interactive content, has become increasingly accessible to users, and the amount of accessible content continues to grow. By way of non-limiting example, the amount of video content that users may access via the Internet or other computer networks has increased. Media events, such as live media events, are another type of multimedia content.

Content, such as content memorializing a media event, can be quite lengthy. A user may be interested in only a portion of the event. Alternatively, the user may not know whether the content is of interest at all.

Disclosure of Invention

The present disclosure attempts to address the shortcomings of the prior art and provides systems, methods and architectures for media event segment identification and annotation using short message samples. Embodiments of the present disclosure utilize real-time discussions conducted through short messaging services to discover the structure, content, and context of a media event (e.g., a live media event).

In accordance with one or more embodiments, there is provided a method comprising: obtaining, with at least one computing device, a sample of short messages of a plurality of users, the sample of short messages corresponding to a media event; identifying, using the at least one computing device and the sampling of short messages, segments in the media event; and identifying, with the at least one computing device, at least one word taken from the sample of short messages, the at least one word indicating a context of the identified segment.

In accordance with one or more embodiments, there is provided a system comprising at least one computing device configured to: obtain a sample of short messages of a plurality of users, the sample of short messages corresponding to a media event; identify segments in the media event using the sample of short messages; and identify at least one word taken from the sample of short messages, the at least one word indicating a context of an identified segment.

In accordance with one or more embodiments, a computer-readable storage medium storing computer-executable process steps is provided. These process steps include: obtaining samples of short messages of a plurality of users, the samples of short messages corresponding to media events; identifying segments in the media event using the samples of the short message; and identifying at least one word taken from the sample of short messages, the at least one word indicating a context of the identified segment.

In accordance with one or more embodiments, a system is provided that includes one or more computing devices configured to provide functionality in accordance with such embodiments. In accordance with one or more embodiments, the functionality is embodied in steps of a method performed by at least one computing device. In accordance with one or more embodiments, program code for implementing functionality in accordance with one or more such embodiments is embodied in, by and/or on a computer-readable medium.

Drawings

The above features and objects of the present disclosure will become more apparent with reference to the following description taken in conjunction with the accompanying drawings in which like reference numerals identify like elements and in which:

Fig. 1 provides an overview of a process flow in accordance with one or more embodiments of the present disclosure.

Fig. 2 provides an overview including components used in accordance with one or more embodiments of the present disclosure.

Fig. 3 illustrates a maximum follower count per minute determined from a subset of short messages related to Barack Obama's 2009 presidential inauguration, in accordance with one or more embodiments of the present disclosure.

Fig. 4 provides an example of normalized term frequency scores over time for terms identified as having the highest peakiness scores in short messages corresponding to the presidential inauguration, in accordance with one or more embodiments of the present disclosure.

FIG. 5 provides an example of the two terms having the highest sustained interest levels determined from short messages about the presidential inauguration: "flubbed" and "messed".

FIG. 6 illustrates some components that may be used in connection with one or more embodiments of the present disclosure.

FIG. 7 is a detailed block diagram illustrating the internal architecture of a computing device, such as server 702 or user computer 704, according to one or more embodiments of the present disclosure.

Detailed Description

In general, the present disclosure includes systems, methods, and architectures for media event segment identification and annotation using short message samples.

Certain embodiments of the present disclosure will now be discussed with reference to the above-described figures (in which like numerals refer to like components). Although embodiments of the present disclosure are described using short messages generated with Twitter™, it should be clear that any other type of short messaging or micro-blogging system, application, and/or short message type is applicable. By way of non-limiting example, a short message is a brief message, e.g., 140 characters of text and/or media content, sent from a user (e.g., a person or entity) to one or more other users. Using Twitter™, a user posts a short message, and the short message is displayed on the user's profile page and delivered to other users, or followers, who subscribe to the user's short messages. Other short messaging applications include, but are not limited to, short messaging service applications, text messaging applications, multimedia messaging applications, Internet chat applications, blogging and/or micro-blogging applications, email, and the like.

In accordance with one or more embodiments, a collection of short messages is sampled and the sampled messages may be used to identify one or more portions or segments of a media event and/or provide annotations or descriptions regarding the media event or segments of the media event. By way of non-limiting example, the media event is a live media event and the short message collection includes short messages collected during the live media event. A collection of short messages is sampled and the sampling of short messages is used to segment and annotate the media event. As some non-limiting examples, short message activity, such as short message activity on Twitter, is analyzed to discover and annotate one or more portions or segments, such as points of interest, and topics associated with the one or more portions or segments of the media event may be identified from the analyzed short message content. As another non-limiting example, the live media event may be stored, for example, as analog, digital, video, audio, and/or multimedia data or content, and the results of the analysis of the short message activity may annotate the media event or a portion of the media event identified from the analysis.

Fig. 1 provides an overview of a process flow in accordance with one or more embodiments of the present disclosure. At step 102, a short message is sampled, selected or identified from a set of short messages using at least one criterion. In accordance with one or more embodiments, a collection of short messages includes short message activity collected for a media event, such as a live media event. As described herein, short messages may be collected during the broadcast of a media event. As some non-limiting examples, according to one or more embodiments, short messages from users identified as having at least a threshold audience level may be selected, and/or short messages identified as conversational-type messages may be selected. At step 104, a sample of short messages, such as short messages selected from a collection of short messages using one or more criteria, is analyzed to identify certain transitions, such as new segments, points of interest, etc., during a media event. At step 106, the short message sample is analyzed to identify subject matter content to be associated with the media event, or segments or points of interest of the media event.

In accordance with one or more embodiments, the process described in FIG. 1, for example, is embodied in hardware, software, or a combination of hardware and software. In accordance with one or more embodiments, one or more general purpose computers, such as a personal computer or server computer, may be configured to perform one or more of the processes described herein.

Fig. 2 provides an overview including components used in accordance with one or more embodiments of the present disclosure. A set 202 of short messages is input into a short message sampling component 204. The set of short messages 202 corresponds to a media event. As a non-limiting example, the collection 202 may include short messages having a timestamp corresponding to the media event, such as a time of publication of the message. The timestamp may be during the time of the media event, for example during the broadcast of the media event. As another non-limiting example, the timestamp may be within a span that includes some time before and/or some time after the media event. As another non-limiting example, a pre-analysis may be performed on the short message to identify terms used in the short message that are related to the media event. It should be appreciated that these and other techniques may be used to identify the set 202 of short messages.
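The selection of set 202 described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the `(timestamp, text)` message layout, the function name `select_event_messages`, and the `pad` and `keywords` parameters are assumptions introduced for the example.

```python
from datetime import datetime, timedelta

def select_event_messages(messages, start, end, pad=timedelta(0), keywords=()):
    """Pick the messages that plausibly belong to a media event.

    messages: list of (timestamp, text) tuples (hypothetical layout).
    start, end: broadcast start and end times of the event.
    pad: extra time allowed before the start and after the end.
    keywords: optional event-related terms; if given, a message must
              contain at least one of them (case-insensitive).
    """
    lo, hi = start - pad, end + pad
    wanted = [k.lower() for k in keywords]
    selected = []
    for ts, text in messages:
        if not (lo <= ts <= hi):
            continue  # posted outside the (padded) event span
        lowered = text.lower()
        if wanted and not any(k in lowered for k in wanted):
            continue  # no event-related term found
        selected.append((ts, text))
    return selected
```

The timestamp filter and the keyword pre-analysis are independent, so either can be used alone by leaving `keywords` empty or by choosing a very large `pad`.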

Short message sampling component 204 samples the set of short messages 202 to select a sample of short messages 206. In accordance with one or more embodiments, short message sampling component 204 can select short messages from set 202 using one or more criteria to generate sample 206. As some non-limiting examples, short message sampling component 204 can identify a number of followees, or users with subscribers. The followees may be selected based on a threshold number of subscribers, such that each selected followee has at least the threshold number of subscribers. The threshold may be identified based on the users in the set 202, a determination of the number of subscribers for each user, and a statistical analysis of those subscriber counts. As a non-limiting example, the threshold may be determined from the distribution of the users' subscriber counts, where the threshold corresponds to the top quartile, e.g., the selected user(s) have subscriber counts in at least the top 25%. The analysis may be performed across the entire time span of the collection 202 or within a window whose time span is shorter than that of the collection 202.
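As an illustration of the top-quartile criterion, the sketch below keeps only messages from users whose subscriber count is at or above the third quartile of the users in the collection. The data layout and the function name `sample_by_audience` are hypothetical, and the inclusive quartile method is one of several reasonable choices.

```python
from statistics import quantiles

def sample_by_audience(messages, follower_counts):
    """Keep messages whose sender is in the top quartile by follower count.

    messages: list of (user, text) tuples (hypothetical layout).
    follower_counts: dict mapping user -> number of subscribers.
    Assumes at least two distinct users appear in the collection.
    """
    # Only users who actually appear in the collection define the distribution.
    users = {user for user, _ in messages}
    counts = [follower_counts[u] for u in users]
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    q3 = quantiles(counts, n=4, method="inclusive")[2]
    return [(u, t) for u, t in messages if follower_counts[u] >= q3]
```

To apply the same criterion in a localized window, the function can be called on only the messages that fall inside that window.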

In accordance with one or more embodiments, short message sampling component 204 can analyze the set of short messages 202 to identify conversational-type messages, which are selected for sample 206. Typically, conversational messages are longer and/or are directed to a particular user or users. In Twitter™, a short message may contain usernames, which direct the short message to the named users and provide links between users, such as between the sender of the message and the one or more named users. It should be appreciated that other criteria may be used to identify conversational-type messages.
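A minimal sketch of one such criterion follows, assuming the common convention that a username embedded in a message is written with a leading `@`. The regular expression and the function names are illustrative assumptions, not part of the disclosure.

```python
import re

# An "@" that is not preceded by a word character, followed by a username.
# The lookbehind keeps e-mail addresses such as x@example.com from matching.
MENTION = re.compile(r"(?<!\w)@(\w+)")

def mentioned_users(text):
    """Return the usernames a message is directed to, in order of appearance."""
    return MENTION.findall(text)

def is_conversational(text):
    """Treat a message as conversational if it addresses at least one user."""
    return bool(mentioned_users(text))
```

The pairs formed by a sender and each name returned by `mentioned_users` give the links between users mentioned above.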

The short message analyzer 208 analyzes the sample 206 of short messages to identify breakpoints in the media event, which are used to identify segments of the media event. Further, the analyzer 208 uses the words in the sample 206 of short messages to identify the topic and/or context of the entire media event and/or of the identified segments of the media event.

In accordance with one or more embodiments, the segmentation information 210 and 212 may be used to summarize or otherwise describe a media event or a segment of a media event; to index, rank, and retrieve media events or segments for search; to catalog media events; and so on.

Referring again to FIG. 1, in accordance with one or more embodiments, a short message collection may be sampled using a determined audience level of each user who sent a short message included in the collection. In accordance with one or more such embodiments, users having an audience deemed to be significant, e.g., significant relative to the audience levels of other users in the collection, are identified, and the short message activity of the identified users, referred to herein as followees, is selected for inclusion in the sample of short messages used in steps 104 and 106 of FIG. 1. Embodiments of the present disclosure evaluate the users, or followees, identified from a set of short messages to identify one or more followees based on each followee's number of followers, and sample the set of short messages by selecting the messages sent by the identified followee(s). In accordance with one or more embodiments, short message activity from a user identified as a followee is used to identify the beginning of a new segment or an important event of interest in a media event. That is, the beginning of a new segment of a media event and/or the beginning of a significant event of interest is identified based on the activity of the user(s) identified as having a significant audience.

As a non-limiting example, a Twitter™ user may choose to subscribe to, or "follow," the messages of a followee; that is, a follower of the followee subscribes to, or requests, that messages from the followee be sent to the follower. Each user has a follower count that represents the number of users who explicitly listen to the user's feed. Initially, for example when a new user signs up or registers, the follower count for the new user is zero. The number of users subscribing to the followee is determined for each user in the set. As a non-limiting example, the number of followers for a given user may be determined by identifying the number of users that receive short messages from the given user. The number of followers a user has may describe the user's role in the short message activity, such as a person, an organization, or another entity, e.g., a reporter or news agency, a web celebrity, a commentator, and so forth.

The follower count, or the number of users following a given user, may be part of the published data set, or the count may be determined from the set of short messages. The follower counts determined for each user, or for a subset of users, may be used to identify an average follower count and a median follower count. By analyzing the determined follower counts, alone or in combination with the average and/or the median follower count, one or more users having at least a threshold number of followers may be identified as having a significant audience.

In accordance with one or more embodiments, follower counts can be examined over given time intervals. By way of non-limiting example, the time interval is one minute; however, any time interval may be used. For each time interval, e.g., every minute, the total number of short messages, e.g., tweets by Twitter users, is determined. In accordance with one or more embodiments, the counted short messages may be a subset of a set of short messages, such as a subset of short messages related to a given media event, where a short message is identified as part of the subset based on the presence in the short message of one or more keywords identified for the media event. The sum, mean, median, and maximum number of followers of the posting users is determined for each time interval. The maximum follower count over a time interval may be analyzed to identify a dominant followee, such as a followee deemed to announce the beginning of a segment or point of interest in a media event.
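The per-interval aggregation can be sketched as below, assuming one-minute intervals and a hypothetical input of `(timestamp_seconds, follower_count)` pairs, one pair per short message. The function name and the output layout are illustrative only.

```python
from collections import defaultdict
from statistics import mean, median

def follower_stats_per_minute(messages):
    """Aggregate the senders' follower counts for each one-minute interval.

    messages: iterable of (timestamp_seconds, follower_count) pairs,
              one pair per short message (hypothetical layout).
    Returns {minute_index: {"messages", "sum", "mean", "median", "max"}}.
    """
    buckets = defaultdict(list)
    for ts, followers in messages:
        buckets[int(ts // 60)].append(followers)
    return {
        minute: {
            "messages": len(counts),
            "sum": sum(counts),
            "mean": mean(counts),
            "median": median(counts),
            "max": max(counts),
        }
        for minute, counts in sorted(buckets.items())
    }
```

Scanning the `"max"` values over time gives the kind of per-minute series that Fig. 3 is described as plotting; a minute whose maximum is far above its neighbors points to a message from a user with a large audience.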

In one or more embodiments, follower counts can be analyzed in a localized window, such as a window comprising one or more time intervals, for example, when the follower counts are unstable overall. Fig. 3 illustrates a maximum follower count per minute determined from a subset of short messages related to Barack Obama's 2009 presidential inauguration, according to one or more embodiments of the present disclosure. In this example, the follower counts are unstable overall; unstable follower counts can be measured in localized windows. There are 13 users in the top quartile; within the 90-minute sampling window, one user's follower count lost one follower but gained two. In the top quartile, 19 tweets came from the 13 users. Among these users, only two are outliers in the top quartile of the overall distribution (e.g., greater than the third quartile, Q₃): one is a well-known blogger in the San Francisco Bay Area with 49,485 followers, and the other is CNN Breaking News with 86,631 followers. Both have at least one post that relays a quote from another source, such as quoting a news director or Barack Obama. The number of followers a user has may describe the user's role in the short message activity, such as a person, an organization, or another entity, e.g., a reporter or news agency, a web celebrity, a commentator, and so forth. The top 10 users by follower count are mainstream media companies and popular bloggers.

According to one or more embodiments, the level of conversational short message activity, e.g., the number of short messages that mention another user, may be used to segment the media event. This analysis may replace, or be used in addition to, sampling the short messages of followees as a mechanism for identifying the segments of the media event. Variation in the level of conversation, which may be determined from the volume of conversational-type short messages, may reflect the level of interest in the media event itself, and may be used to identify a breakpoint in the media event. As a non-limiting example, a conversational short message is a message that mentions another user; in Twitter™, a conversational short message, or tweet, is identified as one that includes an "@mention" of another user. As another non-limiting example, a conversational-type message contains an indication that the message is intended for one or more other users.

In accordance with one or more embodiments, fluctuations in conversational-type messages during the course of a media event may be used to identify breakpoints in the media event, which may be used to identify segments of the media event. Assuming that users post less conversational short message content during the important points or segments of the media event and more conversational content at the end of a segment, periodic increases in the number of conversational messages may be identified and used to identify logical breakpoints of the event. The identified breakpoints may be candidate segmentation points. In accordance with one or more embodiments, times of low conversational message activity are mapped to the beginning of a segment during a media event, e.g., the beginning of the event, while times of high conversational message activity are mapped to the end of the segment.

Since the number of messages per minute in a linear-rate data feed may be nearly constant and aperiodic, looking at the total number of short messages per minute may not work. The amount of directed conversation, such as the number of "@mentions" in tweets, however, may vary over time. In addition, there is a strong correlation between the number of characters typed per minute and the number of @mentions per minute. In view of this correlation, the number of @mentions can be used as an indicator of the conversation level at a given time, so that fluctuation in the number of @mentions is treated as fluctuation in the conversation level of the short messages. In accordance with one or more embodiments, conversation fluctuations are identified by counting the number of @mention messages per time interval (e.g., every minute). A drop in conversational messages, corresponding to a drop in the number of @mention messages, represents the beginning of a segment, such as the beginning of a media event, an important point during a media event, and so on. A rise in conversational messages, corresponding to a rise in the number of @mention messages, represents the end of a segment, such as the end of a media event, the end of an important point in a media event, and so on.
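The counting step, and one simple way to turn drops and rises into candidate breakpoints, can be sketched as follows. The text does not specify how large a change must be to count as a drop or a rise, so the relative-change rule and the `min_change` parameter below are assumptions made for illustration, as are the function names and the `(timestamp_seconds, text)` message layout.

```python
import re
from collections import Counter

# Assumed convention: a mention is "@" followed by a username.
MENTION = re.compile(r"(?<!\w)@\w+")

def mentions_per_minute(messages):
    """Count messages containing a mention in each one-minute interval.

    messages: list of (timestamp_seconds, text) tuples (hypothetical layout).
    Minutes with no mention messages are reported with a count of 0.
    """
    if not messages:
        return {}
    minutes = [int(ts // 60) for ts, _ in messages]
    counts = Counter(int(ts // 60) for ts, text in messages if MENTION.search(text))
    return {m: counts[m] for m in range(min(minutes), max(minutes) + 1)}

def candidate_breakpoints(per_minute, min_change=0.5):
    """Label minutes where the mention count falls or rises sharply.

    A fall of at least min_change (relative to the previous minute) is
    labeled as a candidate segment start, a rise as a candidate segment end.
    This threshold rule is a hypothetical heuristic, not the disclosed method.
    """
    events = []
    minutes = sorted(per_minute)
    for prev, cur in zip(minutes, minutes[1:]):
        before, after = per_minute[prev], per_minute[cur]
        change = (after - before) / max(before, 1)
        if change <= -min_change:
            events.append((cur, "segment_start"))
        elif change >= min_change:
            events.append((cur, "segment_end"))
    return events
```

In practice the per-minute counts would likely be smoothed before comparison, since a minute-to-minute rule is sensitive to noise.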

In accordance with one or more embodiments, words used in short message activity may be identified and used to identify the subject, context, and/or description of a media event or segment of a media event. The progression of word usage over time may reflect the content of media events near the moment of interest. According to one or more embodiments, the textual content of the short message may indicate the structure and content of the media event, and/or the relative interest levels generated at various times in the media event.

By way of non-limiting example, the development of the text content of the short messages over time may point to and semantically annotate important moments and predict topics of ongoing discussion and interest. In accordance with one or more embodiments, the frequency of terms is examined over time. A scoring metric based on term frequency and inverse document frequency (tf-idf) is used. Term i is scored based on its window term frequency $\mathrm{tf}_{t,i}$, which is the number of short messages containing term i within a given time window around time t. The window term frequency $\mathrm{tf}_{t,i}$ can be normalized by the corpus term frequency $\mathrm{cf}_i$, which is the total number of short messages that contain term i in the collection or sample of short messages. As a non-limiting example, the normalized term frequency score for term i around time t may be expressed as:

$$\mathrm{ntf}_{t,i} = \frac{\mathrm{tf}_{t,i}}{\mathrm{cf}_i}$$

which can be described as the percentage of all short messages containing term i that occur within a window around time t. As one non-limiting example, the size of the sliding window is 5 minutes (2.5 minutes before and after t). The normalized term frequency score may be calculated, for example, for each minute covered by the set of short messages.
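A minimal sketch of the normalized term frequency computation follows, assuming the score is the window count divided by the corpus count, as the description of a percentage suggests. The whitespace tokenizer, the function name `ntf_scores`, and the `(timestamp_seconds, text)` layout are illustrative assumptions.

```python
from collections import defaultdict

def ntf_scores(messages, half_window=150.0, step=60):
    """Normalized term frequency per term and per evaluation time.

    messages: list of (timestamp_seconds, text) tuples (hypothetical layout).
    half_window: half the sliding window size in seconds (150 s = 2.5 min).
    step: spacing of the evaluation times in seconds (60 s = every minute).
    Returns {term: {t: ntf}} where ntf = tf(t, term) / cf(term).
    """
    if not messages:
        return {}
    # For each term, the timestamps of the messages containing it.
    # set() ensures a message is counted once per term.
    occurrences = defaultdict(list)
    for ts, text in messages:
        for term in set(text.lower().split()):
            occurrences[term].append(ts)
    start = int(min(ts for ts, _ in messages))
    end = int(max(ts for ts, _ in messages))
    times = range(start, end + 1, step)
    scores = {}
    for term, stamps in occurrences.items():
        cf = len(stamps)  # corpus term frequency
        scores[term] = {
            t: sum(1 for s in stamps if abs(s - t) <= half_window) / cf
            for t in times
        }
    return scores
```

This direct version rescans every timestamp for every evaluation time; for a large collection, sorting the timestamps and using two moving indices would avoid the repeated scans.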

In accordance with one or more embodiments, terms associated with portions of interest in a media event, e.g., segments, moments, etc., include terms that have a high frequency near the time of the moment of interest and are relatively infrequent (e.g., have a lower frequency) at other times. Such terms may be used to identify localized topics. According to one or more embodiments, to identify a moment of interest, each term i in a short message set, or a subset thereof, is ranked according to its peakiness score, which is the maximum value of $\mathrm{ntf}_{t,i}$ for term i. Intuitively, the maximum peakiness score for a term is 1, indicating that all occurrences of the term fall within one window. Non-peaky terms have a uniform normalized term frequency score across all windows, indicating a frequency of use that does not change over time. If term i exhibits a significant spike at time t, i.e., it is "peaky" at its maximum, this indicates a moment of interest at time t and that term i reflects the content of that moment.
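The ranking step can be sketched as below. The sketch assumes the peakiness score of a term is the largest normalized term frequency it reaches at any time, which is consistent with the stated upper bound of 1; the input layout `{term: {time: score}}` and the function name are hypothetical.

```python
def rank_by_peakiness(ntf):
    """Rank terms by their highest normalized term frequency score.

    ntf: {term: {time: score}} (hypothetical layout), with at least one
         time per term.
    Returns a list of (term, peak_score, peak_time), highest score first.
    """
    ranked = []
    for term, series in ntf.items():
        # Time at which this term's score is largest.
        peak_time = max(series, key=series.get)
        ranked.append((term, series[peak_time], peak_time))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

A term that appears in a single burst ranks near the top, and its `peak_time` marks a candidate moment of interest; a term used evenly throughout the event ranks near the bottom.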

A single event may be associated with multiple terms. As an example, "aretha," "franklin," "bow," and "sings" are four of the six terms with the highest overall peakiness, but each reflects the same event: Aretha Franklin's performance at the presidential inauguration and the bow on her hat. Duplicate event labels may be removed by skipping terms that are highly correlated (p < 0.05) with higher-ranked terms.

Fig. 4 provides an example of normalized term frequency scores over time for terms identified as having the highest peakiness scores in short messages corresponding to the presidential inauguration, according to one or more embodiments of the present disclosure. Each of these terms reflects, in some way, an actual event in the inauguration ceremony. The terms "aretha", "yoyo" and "warren" reflect the appearances of Aretha Franklin, Yo-Yo Ma and Rick Warren, respectively. The occurrence of "booing" corresponds to the appearance of George W. Bush, while the spike of "chopper" occurs when he departs by helicopter. "remaking" is the highest-ranked term in a string of terms that repeat the content of Obama's speech, while "anthem" spikes as the national anthem is played.

In accordance with one or more embodiments, terms of sustained conversation can be identified. According to one or more embodiments, a sustained level of interest in a certain portion of a media event is reflected in the development over time of the usage of one or more terms in a set of short messages, e.g., from Twitter or other short messaging or micro-blogging systems or applications. Sustained interest is identified using $t_{peak,i}$, the time at which a spike is determined to occur in the normalized term frequency score for term i. A sustained-interest term is used less frequently before $t_{peak,i}$ and more frequently after $t_{peak,i}$. To evaluate a term, the average value of $\mathrm{ntf}_{t,i}$ is determined for $t < t_{peak,i}$ (before the spike) and for $t > t_{peak,i}$ (after the spike). The sustained interest score for a term is determined using the ratio of the average post-spike score to the average pre-spike score. All terms having a sustained interest score are ranked according to their respective sustained interest scores.
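The ratio described above can be sketched as follows, taking the spike time to be the time of the highest normalized term frequency score. How to treat terms with no samples on one side of the spike, or with a zero average before it, is not stated in the text; returning `None` or infinity in those cases is an assumption of this sketch, as are the function name and the `{time: score}` layout.

```python
def sustained_interest(series):
    """Ratio of the average post-spike score to the average pre-spike score.

    series: {time: ntf_score} for one term (hypothetical layout).
    Returns None when the ratio is undefined (no samples before or after
    the spike, or no usage on either side of it).
    """
    t_peak = max(series, key=series.get)  # time of the spike
    before = [v for t, v in series.items() if t < t_peak]
    after = [v for t, v in series.items() if t > t_peak]
    if not before or not after:
        return None
    pre = sum(before) / len(before)
    post = sum(after) / len(after)
    if pre == 0:
        # Never used before the spike: unbounded if used afterwards.
        return float("inf") if post > 0 else None
    return post / pre
```

A term that spikes once and disappears yields a low or undefined score, whereas a term that stays in use after its spike yields a ratio above 1.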

FIG. 5 provides an example of the two terms having the highest sustained interest levels determined from short messages about the presidential inauguration: "flubbed" and "messed". Both relate to Chief Justice Roberts mistakenly switching the order of several words while administering the oath of office to President Obama. Both terms were barely used before the oath incident and then peaked suddenly around the incident. However, unlike the peaky terms shown in FIG. 4, the terms "flubbed" and "messed" both continued to be used for a long period of time after the event. This particular event received a lot of media attention for days after the inauguration, which is predictable from the sustained interest identified by analyzing the short messages.

The use of @mentions in tweets containing the two terms "flubbed" and "messed" also evolves over time. If the tweets containing "flubbed" or "messed" are divided into two groups, those posted before the oath (before 12:15) and those posted after the oath (after 12:15), the type and level of conversation differ significantly. The initial tweets around the time of the oath only notice and react to the error. The later tweets, in the following hours, are further conversations about the incident and contain instances where people discuss the oath and correct each other. Only 7% of the tweets in the first group contained @mentions, compared with 47% in the second group.

FIG. 6 illustrates some components that may be used in connection with one or more embodiments of the present disclosure. In accordance with one or more embodiments of the present disclosure, one or more computing devices 602, such as one or more servers, user devices, or other computing devices 602, are configured to include the functionality described herein. For example, computing device 602 may be configured to collect short messages and/or analyze a set of short messages from a user of computer 604 in accordance with one or more embodiments of the present disclosure.

The computing device 602 may provide content, such as short messages, e.g., micro-blog posts, to the user computer 604 via the network 606 using a browser or other application. The data storage 608 stores a collection and/or sample of short messages, and the server 602 is configured to execute code and/or program code that performs methods in accordance with one or more embodiments of the present disclosure. The user computer 604 may be any computing device including, but not limited to, a personal computer, a Personal Digital Assistant (PDA), a wireless device, a cellular telephone, an internet appliance, a media player, a home theater system, and a media center, among others.

For the purposes of this disclosure, a computing device includes a processor for execution and memory for storing program code, data, and/or software. The computing device may have an operating system that enables software applications to be executed in order to manipulate data. Computing devices such as server 602 and user computer 604 may include one or more processors, memory, removable media readers, network interfaces, display interfaces, and one or more input devices, such as a keyboard, keypad, mouse, etc., as well as input device interfaces. Those skilled in the art will appreciate that the server 602 and/or user computer 604 may be configured in a number of different ways and/or the server 602 and/or user computer 604 may be implemented using many different combinations of hardware, software, or firmware.

According to one or more embodiments, the computing device 602 may provide a user interface to the user computer 604 via the network 606. The user interface provided to the user computer 604 may include content items, such as content of a media event, short messages, and so forth. According to one or more embodiments, the computing device 602 provides a user interface to the user computer 604 by transmitting a definition of the user interface to the user computer 604 via the network 606. The user interface definition may be specified in any of a number of languages, including but not limited to a markup language such as hypertext markup language, scripts, applets, and the like. The user interface definition may be processed by an application executing on user computer 604 (e.g., a browser application) to output a user interface on a display coupled to user computer 604 (e.g., a display connected directly or indirectly to user computer 604). According to one or more embodiments, user computer 604 uses an application, browser, short message client application.

In one embodiment, the network 606 may be the Internet, an intranet (a private version of the Internet), or any other type of network. An intranet is a computer network that allows data transfer between computing devices on the network. Such networks may include personal computers, mainframes, servers, network-enabled hard drives, and any other computing device capable of connecting to other computing devices via an intranet. The intranet uses the same internet protocol suite as the internet. The two most important elements in this suite are the Transmission Control Protocol (TCP) and the Internet Protocol (IP).

It should be appreciated that embodiments of the present disclosure may be implemented in a client-server environment, such as that shown in FIG. 6. Alternatively, embodiments of the present disclosure may be implemented in other environments (e.g., a peer-to-peer environment as one non-limiting example).

FIG. 7 is a detailed block diagram illustrating the internal architecture of a computing device, such as server 602 or user computer 604, according to one or more embodiments of the present disclosure. As shown in FIG. 7, the internal architecture 700 includes one or more processing units, processors or processing cores (also referred to herein as CPUs) 712 that interface with at least one computer bus 702. Also interfaced with computer bus 702 are one or more computer-readable media 706, a network interface 714, memory 704 (e.g., Random Access Memory (RAM), run-time transient memory, read-only memory (ROM), etc.), a media disk drive interface 708, which is an interface for a drive that can perform reading and/or writing to media including removable media such as floppy disks, CD-ROMs, DVDs, etc., a display interface 710, which is an interface for a monitor or other display device, a keyboard interface 716, which is an interface for a keyboard, a pointing device interface 718, which is an interface for a mouse or other pointing device, and various other interfaces not separately shown, such as a parallel and serial port interface, a Universal Serial Bus (USB) interface, and so forth.

The memory 704 interfaces with the computer bus 702 to provide information stored in the memory 704 to the CPU 712 during execution of software programs, such as an operating system, application programs, device drivers, and software modules including program code and/or computer-executable process steps, which incorporate the functions described herein, such as one or more of the process flows described herein. CPU 712 first loads computer-executable process steps from a storage device (e.g., memory 704, one or more computer-readable storage media 706, a removable media drive, and/or other storage device). CPU 712 may then execute the stored process steps in order to perform the loaded computer-executable process steps. Stored data, such as data stored by a storage device, may be accessed by CPU 712 during execution of computer-executable process steps.

Persistent storage, such as one or more media 706, may be used to store an operating system and one or more application programs. Persistent storage may also be used to store device drivers (e.g., one or more of a digital camera driver, a monitor driver, a printer driver, a scanner driver, or other device drivers), web pages, content files, playlists, and other files. The persistent storage may also include program modules and data files for implementing one or more embodiments of the present disclosure, such as listing selection module(s), targeted information collection module(s), and listing notification module(s), the functions and uses of which in the implementation of the present disclosure are discussed in detail herein.

For the purposes of this disclosure, a computer-readable medium stores computer data, which may include computer program code executable by a computer, in machine-readable form. By way of example, and not limitation, computer-readable media may comprise computer-readable storage media for tangible or fixed storage of data or communication media for transient interpretation of signals containing the code. Computer-readable storage media, as used herein, refers to physical or tangible storage devices (rather than signals) and includes, but is not limited to, volatile and non-volatile, removable and non-removable media implemented in any method or technology for the tangible storage of information such as computer-readable instructions, data structures, program modules or other data. Computer-readable storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other physical or material medium which can be used to tangibly store the desired information or data or instructions and which can be accessed by a computer or processor.

Those skilled in the art will recognize that the methods and systems of the present disclosure may be implemented in many ways and thus are not limited by the foregoing exemplary embodiments and examples. In other words, functional elements being performed by a single or multiple components in various combinations of hardware and software or firmware, or individual functions, may be distributed among software applications at either the client or server or both. In view of this, any number of the features of the different embodiments described herein may be combined into a single or multiple embodiments, and alternative embodiments having fewer than or more than all of the features described herein are possible. Functionality may also be distributed, in whole or in part, among multiple components, in manners now known or later known. Thus, many software/hardware/firmware combinations are possible in implementing the functions, features, interfaces and preferences described herein. Additionally, the scope of the present disclosure covers conventionally known manners of carrying out the described features and functions and interfaces, as well as those variations and modifications that may be made to the hardware or software or firmware components described herein as would be understood by those skilled in the art now and hereafter.

While the systems, methods, and architectures have been described in terms of one or more embodiments, it is to be understood that the disclosure need not be limited to the disclosed embodiments. It is intended to cover various modifications and similar arrangements included within the spirit and scope of the claims, the scope of which should be accorded the broadest interpretation so as to encompass all such modifications and similar structures. The present disclosure includes any and all embodiments of the following claims.

Claims

1. A method for media event segment identification using short message samples, comprising:

obtaining, with at least one computing device, a sample of short messages having content published by a plurality of users by sampling a set of short messages having content published during a broadcast of media event content;

identifying, using the at least one computing device and the sample of short messages, segments in the media event content, including identifying a beginning and an end of the identified segments in the media event content using a level of short message activity detected using the sample of short messages; and

identifying, with the at least one computing device, at least one term derived from the content of the sample of short messages, the at least one term indicating a context of the identified segment in the media event content.

2. The method of claim 1, further comprising:

selecting, with the at least one computing device, a sample of the short messages from a set of short messages, the selecting comprising selecting a short message from at least one user of the plurality of users, the at least one user being a follower having at least a threshold number of subscribers.

3. The method of claim 2, the step of identifying segments in the media event further comprising:

identifying, with the at least one computing device, a segment in the media event using a level of short message activity related to at least one user identified as a follower having at least a threshold number of subscribers.

4. The method of claim 1, further comprising:

selecting, with the at least one computing device, a sample of the short messages from a set of short messages, the selecting comprising selecting a conversational-type short message.

5. The method of claim 4, wherein the conversational-type message contains an indication that the message is intended for one or more users.

6. The method of claim 5, wherein the indication comprises an indicator linking a message sender and the one or more users.

7. The method of claim 1, the step of identifying segments in the media event further comprising:

identifying, with the at least one computing device, a segment in the media event using a short message identified as a conversational-type message.

8. The method of claim 1, the step of identifying segments in the media event further comprising:

determining, using the at least one computing device and the sample of short messages, a plurality of term frequency scores for a term used in the sample of short messages, each of the plurality of term frequency scores corresponding to a time window of the media event and indicating a number of short messages that contain the term in the corresponding time window;

determining, using the at least one computing device and the sample of short messages, a plurality of normalized frequency scores corresponding to the plurality of term frequency scores, the respective normalized frequency score comprising, for each term frequency score, a ratio of the term frequency score to a corpus term frequency indicating a number of short messages in the sample that contain the term;

determining, using the at least one computing device and the plurality of normalized frequency scores identified for the term, a maximum normalized frequency score; and

identifying, with the at least one computing device, the segment from a time window corresponding to the maximum normalized frequency score determined for the term.

9. The method of claim 1, the step of identifying at least one term taken from the sample of short messages further comprising:

determining, using the at least one computing device and the plurality of term frequency scores identified for the term, whether a frequency of use of the term is relatively high at a time corresponding to the identified segment; and

identifying, with the at least one computing device, the term as a term indicating a context of the identified segment where the frequency of use of the term is relatively high at the time corresponding to the identified segment.

10. The method of claim 9, wherein each of the term frequency scores comprises a normalized frequency score comprising a ratio of a term frequency, indicating a number of short messages containing the term in the time window, to a corpus term frequency indicating a number of short messages containing the term in the sample.

11. A system for media event segment identification using short message samples, comprising:

at least one computing device configured to:

obtain a sample of short messages having content published by a plurality of users by sampling a set of short messages having content published during a broadcast of media event content;

identify segments in the media event content using the sample of short messages, including identifying a beginning and an end of the identified segments in the media event content using a level of short message activity detected using the sample of short messages; and

identify at least one term derived from the content of the sample of short messages, the at least one term indicating a context of the identified segment in the media event content.

12. The system of claim 11, the at least one computing device further configured to:

select a sample of the short messages from a set of short messages, the selecting comprising selecting a short message from at least one user of the plurality of users, the at least one user being a follower having at least a threshold number of subscribers.

13. The system of claim 12, the at least one computing device configured to identify segments in the media event further configured to:

identify a segment in the media event using a level of short message activity related to at least one user identified as a follower having at least a threshold number of subscribers.

14. The system of claim 11, the at least one computing device further configured to:

select a sample of the short messages from a set of short messages, the selecting comprising selecting a conversational-type short message.

15. The system of claim 14, wherein the conversational-type message contains an indication that the message is intended for one or more users.

16. The system of claim 15, wherein the indication comprises an indicator linking a message sender and the one or more users.

17. The system of claim 11, the at least one computing device configured to identify segments in the media event further configured to:

identify a segment in the media event using a short message identified as a conversational-type message.

18. The system of claim 11, the at least one computing device configured to identify segments in the media event further configured to:

determine, using the sample of short messages, a plurality of term frequency scores for a term used in the sample of short messages, each of the plurality of term frequency scores corresponding to a time window of the media event and indicating a number of short messages containing the term in the corresponding time window;

determine, using the sample of short messages, a plurality of normalized frequency scores corresponding to the plurality of term frequency scores, the respective normalized frequency score comprising, for each term frequency score, a ratio of the term frequency score to a corpus term frequency, the corpus term frequency indicating a number of short messages in the sample that contain the term;

determine a maximum normalized frequency score using the plurality of normalized frequency scores identified for the term; and

identify the segment from the time window corresponding to the maximum normalized frequency score determined for the term.

19. The system of claim 11, the at least one computing device configured to identify at least one term taken from the sample of short messages further configured to:

determine whether a frequency of use of the term is relatively high at a time corresponding to the identified segment using the plurality of term frequency scores identified for the term; and

identify the term as a term indicating the context of the identified segment in a case where the frequency of use of the term is relatively high at the time corresponding to the identified segment.

20. The system of claim 19, wherein each of the term frequency scores comprises a normalized frequency score comprising a ratio of a term frequency, indicating a number of short messages containing the term in the time window, to a corpus term frequency indicating a number of short messages containing the term in the sample.

21. An apparatus for media event segment identification using short message samples, comprising:

means for obtaining a sample of short messages having content published by a plurality of users by sampling a set of short messages having content published during a broadcast of media event content;

means for identifying segments in the media event content using the sample of short messages, including means for identifying a beginning and an end of the identified segments in the media event content using a level of short message activity detected using the sample of short messages; and

means for identifying at least one term derived from the content of the sample of short messages, the at least one term indicating a context of the identified segment in the media event content.

22. The apparatus of claim 21, further comprising:

means for selecting a sample of the short messages from a set of short messages, the selecting comprising selecting a short message from at least one user of the plurality of users, the at least one user being a follower having at least a threshold number of subscribers.

23. The apparatus of claim 22, the means for identifying segments in the media event further comprising:

means for identifying a segment in the media event utilizing a level of short message activity related to at least one user identified as a follower having at least a threshold number of subscribers.

24. The apparatus of claim 21, further comprising:

means for selecting a sample of the short messages from a set of short messages, the selecting comprising selecting a conversational-type short message.

25. The apparatus of claim 24, wherein the conversational-type message contains an indication that the message is intended for one or more users.

26. The apparatus of claim 25, wherein the indication comprises an indicator linking a message sender and the one or more users.

27. The apparatus of claim 21, the means for identifying segments in the media event further comprising:

means for identifying a segment in the media event using a short message identified as a conversational-type message.

28. The apparatus of claim 21, the means for identifying segments in the media event further comprising:

means for determining, using the sample of short messages, a plurality of term frequency scores for a term used in the sample of short messages, each of the plurality of term frequency scores corresponding to a time window of the media event and indicating a number of short messages that contain the term in the corresponding time window;

means for determining, using the sample of short messages, a plurality of normalized frequency scores corresponding to the plurality of term frequency scores, the respective normalized frequency score comprising, for each term frequency score, a ratio of the term frequency score to a corpus term frequency indicating a number of short messages in the sample that contain the term;

means for determining a maximum normalized frequency score using the plurality of normalized frequency scores identified for the term; and

means for identifying the segment from a time window corresponding to the maximum normalized frequency score determined for the term.

29. The apparatus of claim 21, the means for identifying at least one term taken from the sample of short messages further comprising:

means for determining whether a frequency of use of the term is relatively high at a time corresponding to the identified segment using the plurality of term frequency scores identified for the term; and

means for identifying the term as a term indicating a context of the identified segment if the frequency of use of the term is relatively high at the time corresponding to the identified segment.

30. The apparatus of claim 29, wherein each of the term frequency scores comprises a normalized frequency score comprising a ratio of a term frequency, indicating a number of short messages containing the term in the time window, to a corpus term frequency indicating a number of short messages containing the term in the sample.
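
By way of non-limiting illustration only, and forming no part of the claims, the normalized frequency scoring and the activity-based segment boundaries recited above might be sketched as follows. The function names, the `(timestamp_seconds, text)` message format, the whitespace tokenizer, and the fixed activity threshold are all assumptions introduced for this example; the disclosure does not prescribe them.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and return its distinct terms (naive whitespace split)."""
    return set(text.lower().split())


def window_term_scores(messages, window_seconds):
    """Return {term: {window_index: normalized_score}}.

    messages is an iterable of (timestamp_seconds, text) pairs (assumed format).
    The normalized score is the number of messages containing the term in the
    window divided by the number of messages containing it in the whole sample.
    """
    window_counts = defaultdict(Counter)  # term -> Counter(window -> count)
    corpus_counts = Counter()             # term -> count across the sample
    for timestamp, text in messages:
        window = int(timestamp // window_seconds)
        for term in tokenize(text):
            window_counts[term][window] += 1
            corpus_counts[term] += 1
    return {
        term: {w: n / corpus_counts[term] for w, n in per_window.items()}
        for term, per_window in window_counts.items()
    }


def peak_window(scores_for_term):
    """Return the window index holding the maximum normalized score."""
    return max(scores_for_term, key=scores_for_term.get)


def activity_segments(messages, window_seconds, threshold):
    """Return (begin_seconds, end_seconds) spans whose per-window message
    count is at least threshold (a positive integer, assumed heuristic)."""
    volume = Counter(int(t // window_seconds) for t, _ in messages)
    segments, start = [], None
    if not volume:
        return segments
    # Iterate one window past the last so an open segment is always closed.
    for w in range(min(volume), max(volume) + 2):
        active = volume.get(w, 0) >= threshold
        if active and start is None:
            start = w
        elif not active and start is not None:
            segments.append((start * window_seconds, w * window_seconds))
            start = None
    return segments
```

In this sketch a term counts at most once per message, so each score reflects a number of short messages rather than a number of occurrences. A term whose peak window falls inside a span returned by `activity_segments` would be a candidate for indicating the context of that segment.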