HK1124674A

HK1124674A - Content syndication platform

Info

Publication number: HK1124674A
Application number: HK09102519.9A
Authority: HK
Inventors: E．J．帕瑞茨
Original assignee: Microsoft Corporation
Priority date: 2005-06-21
Filing date: 2006-06-14
Publication date: 2009-07-17

Description

Content syndication platform

Background

RSS stands for Really Simple Syndication, which is one type of web content syndication format. RSS web feeds are becoming increasingly popular on the web, and a variety of software applications with RSS support are under development. These applications may have widely varying features and may lead users to install several different RSS-enabled applications. Each RSS application will typically have its own list of subscriptions. When the list of subscriptions is small, it is fairly easy for the user to enter and manage these subscriptions across different applications. However, as the list of subscriptions grows, it becomes very difficult to manage the subscriptions across each of these different RSS-enabled applications. Thus, the subscription lists easily become out of sync.

In addition, web feeds come in many different file formats, with RSS 0.91, 0.92, 1.0, 2.0, and Atom being popular. Each RSS-enabled application must support most of these formats, and perhaps more in the future. Implementing parsers for RSS is more difficult for some applications than for others. Given that not all application developers are RSS experts with experience and knowledge of each complex format, it is unlikely that all application developers will be able to implement parsers correctly. Thus, given the large number of file formats, some application developers will choose not to develop applications in this space or, if they do, their applications will not be configured to fully utilize all the features available across the different file formats.

Another aspect of RSS and web feeds is the publishing of related content. For example, the number of users who own weblogs (blogs) is increasing. There are many publicly available services that provide free weblog hosting. However, publishing content to a weblog service can be quite cumbersome, as it can involve opening a browser, navigating to the weblog service, logging in, then typing in an entry and submitting it. Many application developers want to be able to publish from their specific applications rather than interrupt the user flow by sending the user to a website. Further, there are many different types of protocols that may be used to communicate between a client device and a particular service. Thus, it is unlikely that an application developer will implement all of the protocols, and the user experience will not be as complete as it could be.

Disclosure of Invention

A content syndication platform, such as a web content syndication platform, manages, organizes and makes available for consumption content that is acquired from a source, such as the internet, an intranet, a private network, or other computing device. In some embodiments, the platform may acquire and organize web content and enable such content to be consumed by many different types of applications. These applications may or may not have to understand the particular syndication format. An Application Programming Interface (API) exposes an object model to allow applications and users to conveniently accomplish many different tasks such as creating, reading, updating, deleting feeds, and the like.

In addition, the platform can abstract specific feed formats to provide a common format that improves the availability of feed data into the platform. In addition, the platform processes and manages attachments received via the web feed in a manner that can make the attachments consumable to both syndication-aware applications and applications that are not syndication-aware.

Drawings

FIG. 1 is a high-level block diagram illustrating a system including a web content syndication platform, according to one embodiment.

FIG. 2 is a block diagram illustrating aspects of an object model, according to one embodiment.

FIG. 3 is a block diagram illustrating a feed synchronization engine, according to one embodiment.

FIG. 4 illustrates an exemplary feed store according to one embodiment.

FIG. 5 illustrates an exemplary user profile according to one embodiment.

FIG. 6 illustrates an exemplary object according to one embodiment.

FIG. 7 illustrates an exemplary object according to one embodiment.

Detailed Description

Overview

A content syndication platform, such as a web content syndication platform, is described for managing, organizing, and making available for consumption content acquired from a source, such as the internet, an intranet, a private network, or other computing device. In this document, the platform is described as an RSS platform designed for use with RSS web feeds. It should be understood that the RSS context is only one example and is not intended to limit the application of the claimed subject matter to only RSS contexts. The following description assumes that the reader is relatively familiar with RSS. For background on RSS, there are publicly available specifications that can provide information to interested readers.

In this disclosure, certain terms will be used in the context of the described RSS embodiments. An item is the basic unit of a feed. An item typically represents a blog entry or a news article/abstract with a link to the actual article on the website. An attachment (enclosure) is similar to an email attachment, except that it is a link to the actual content. A feed is a list of items within a resource, usually only the most recently added ones. The system feed list is a list of feeds to which the user subscribes. Subscription refers to the act of signing up to receive notifications of new feed items.

In various embodiments described herein, a platform may acquire and organize web content and make such content available for consumption by many different types of applications. These applications may or may not understand the particular syndication format. Thus, in an implementation example, an application that does not understand the RSS format may still use the platform to acquire and consume content, such as enclosures, that the platform acquired through an RSS feed.

The platform includes an Application Programming Interface (API) that exposes an object model, allowing applications and users to conveniently accomplish many different tasks such as creating, reading, updating, deleting feeds, and the like. For example, using the API, many different types of applications can access, manage, and consume feed lists, each of which includes a list of feeds.

In at least one embodiment, the platform provides a plurality of different feed parsers, each of which can parse a particular format in which a web feed may be received. The parsed format is then converted into a common format that can then be utilized by applications and users. The common format abstracts away the specific concepts embodied by any one particular format in favor of a more universal, understandable format.

Further, the platform processes and manages the attachments received via the web feed in a manner that can make the attachments consumable to both syndication-aware applications and applications that are not syndication-aware. In at least some embodiments, the API allows for discovery of relationships between attachments and their associated feed items.

In the discussion that follows, an exemplary platform and its components are first described under the heading "web content syndication platform". Following this discussion, an example of an implementation (under the heading "implementation example") is provided and describes a set of APIs that expose an object model, enabling applications and users to interact with the platform in a meaningful and powerful manner.

Web content syndication platform

FIG. 1 illustrates an exemplary system, generally at 100, according to one embodiment. Aspects of system 100 may be implemented in connection with any suitable hardware, software, firmware, or combination thereof. In at least one embodiment, aspects of the system are implemented as computer-readable instructions residing on some type of computer-readable medium.

In this example, the system 100 includes a content syndication platform 102 and a collection of applications 104, each of which may be configured to utilize the platform in a different manner, as will become apparent below. In at least some embodiments, the content syndication platform comprises a web content syndication platform. In the discussion that follows, the platform 102 is described in the context of an RSS platform. It should be understood that this is merely an example and is not intended to limit application of the claimed subject matter to only an RSS environment. Rather, the principles of the described embodiments may be employed in other syndication environments without departing from the spirit and scope of the claimed subject matter.

In this example, the platform 102 includes an object model 106 exposed by a set of APIs so that the applications 104 can interact with the platform. A synchronization engine 108 is provided and configured to, among other things, acquire web content and, in at least some embodiments, convert the web content into a so-called common format, as will be described in greater detail below.

The publishing engine 110 allows a user to publish content, such as weblogs, in a manner that abstracts, via an API, a communication protocol for communication between the user's application or computing device and a server or target software to receive the content.

Further, in at least one embodiment, the platform 102 includes a feed store 112 that stores both a list of feeds 114 and feed data 116. Further, platform 102, in at least one embodiment, utilizes file system 118 to store and maintain attachments 120. The use of a file system has advantages, including enabling applications that do not necessarily understand the syndication format to consume attachments that may be of interest. In addition, the platform 102 includes a post queue 122 that holds post data 124 to be posted to a particular web-accessible location.

As described above, the platform 102 enables applications to access, consume, and publish web content. Thus, the collection of applications 104 can include many different types of applications. In at least some embodiments, the types of applications can include those that are syndication-aware and those that are not. By "syndication-aware" is meant that the application is at least somewhat familiar with the syndication format used. Thus, in the RSS context, a syndication-aware application is one that is configured to process data or otherwise interact with content represented in the RSS format. This may include the ability to parse and meaningfully interact with RSS-formatted data. Conversely, applications that are not syndication-aware are generally not configured to understand the syndication format. However, as will become apparent below, applications that are not syndication-aware may still access and consume content that arrives at the platform in a syndication format.

Considering more particularly the different types of applications that may interact with the platform, the collection 104 includes a web browser application 122, an RSS reader application 124, a digital image library application 126, a media player application 128, and a weblog service 130. In this example, the RSS reader application 124 is a syndication-aware application, whereas the media player 128 need not be. Further, the web browser application 122 may or may not be syndication-aware. Of course, these applications are merely examples of the different types of applications that may interact with the platform. Other types of applications, the same as or different from those shown above, may also be used without departing from the spirit and scope of the claimed subject matter. By way of example and not limitation, these other types of applications can include calendar applications for event feeds, social networking and email applications for contact feeds, screen saver applications for picture feeds, CRM for document feeds, and the like.

In the following discussion, aspects of the various components of the platform 102 will be described in greater detail under their own headings.

Object model

FIG. 2 illustrates various objects of the object model 106, according to one embodiment. The object model to be described is only one example of an object model that may be used and is not intended to limit application of the claimed subject matter only to the object models described below. As mentioned above, the object model is exposed by the API, an example of which is described below.

In this particular object model, a top-level object 200 called feeds is provided. The feeds object 200 has a property called subscriptions, which is of type folder. A subscriptions or folder object 202 is modeled as a folder hierarchy. Thus, in this particular example, the subscriptions or folder object has properties that include subfolders 204 of type folder and feeds 206 of type feed. Below the feeds object 206 is an item object 208 of type item, and below the item object 208 is an attachment (enclosure) object 210 of type enclosure.

Each object of the object model has properties, methods, and (in some cases) events that can be used to manage web content received by the platform. The above object model allows a hierarchy to be used for things such as managing a list of feeds. For example, by using a folder structure, the platform can execute an operation against a whole set of feeds. Those skilled in the art will appreciate that this provides convenience to the application developer. For example, executing against a set of feeds provides the ability to refresh all "news" feeds located in the news folder.

As an example, consider the following. Suppose a user wishes to interact with or consume data associated with a feed to which they are not actually subscribed. For subscribed feeds, i.e., those feeds represented in the root-level subscriptions folder, the sync engine 108 (FIG. 1) will pick up the feed and begin fetching data associated with the feed at the appropriate time interval. There are situations, however, where an application using the platform does not wish to subscribe to a particular feed. Instead, the application only wants to use the functionality of the platform to access data from the feed. In this case, in this particular embodiment, the subscriptions object 202 supports a method that allows a feed to be downloaded without subscribing to it. In this particular example, the application calls the method and provides it with the URL associated with the feed. The platform then uses the URL to fetch the data of interest to the application. In this way, the application can obtain data associated with a feed in an ad hoc manner without ever subscribing to the feed.
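As a rough illustration of the folder hierarchy and the ad hoc download described above, the following Python sketch models folders, feeds, items, and enclosures. Every name in it (Folder, Feed, Item, Enclosure, all_feeds, refresh_all, download_feed, and the fetch callable that stands in for the synchronization engine) is hypothetical and is not the API of the described platform.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Enclosure:
    url: str


@dataclass
class Item:
    title: str
    enclosures: List[Enclosure] = field(default_factory=list)


@dataclass
class Feed:
    name: str   # read/write in the text: the user may rename a feed
    url: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Folder:
    name: str
    subfolders: List["Folder"] = field(default_factory=list)
    feeds: List[Feed] = field(default_factory=list)

    def all_feeds(self) -> List[Feed]:
        """Return the feeds in this folder and in every nested subfolder."""
        found = list(self.feeds)
        for sub in self.subfolders:
            found.extend(sub.all_feeds())
        return found

    def refresh_all(self, fetch: Callable[[str], List[Item]]) -> int:
        """Refresh every feed under this folder; return how many were refreshed."""
        feeds = self.all_feeds()
        for feed in feeds:
            feed.items = fetch(feed.url)
        return len(feeds)


def download_feed(url: str, fetch: Callable[[str], List[Item]]) -> Feed:
    """Ad hoc download: build a Feed from a URL without adding it to any folder."""
    return Feed(name=url, url=url, items=fetch(url))
```

Calling refresh_all on a "news" folder refreshes every feed beneath it, while download_feed returns feed data without touching the subscription hierarchy.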

Considering the object model further, consider the item and attachment objects 208 and 210, respectively. These objects closely mirror how RSS itself is structured. That is, each RSS feed has individual items within which enclosures can optionally appear. Thus, the structure of the object model is configured to reflect the structure of the syndication format.

From the perspective of the object model, there are primarily two different types of methods and properties for an item. The first type of method/property applies to read-only data, while the second type applies to read/write data.

As an example of the first type of method/property, consider the following case. Each feed may have data associated with it that is represented in an XML structure. The data includes such things as title, author, language, etc. The object model treats this data as read-only. For example, data received via a feed and associated with the various items is generally considered read-only. This prevents applications from manipulating the data. Using an XML structure to represent feed data also brings the following advantage. Suppose the synchronization engine does not understand a new XML element that has been added. The synchronization engine can still store the element and its associated data as part of the feed item data. Applications that do understand the element can still discover and consume the element and its associated data.

On the other hand, there is also data that is treated as read/write data, such as the name of a particular feed. That is, users may wish to personalize a particular feed for their particular user interface. In this case, the object model has a read/write property. For example, the user may wish to change the name of the feed from "New York Times" to "NYT". In this case, the name attribute may be readable and writable.

Feed synchronization engine

In the illustrated and described embodiment, the feed synchronization engine 108 (FIG. 1) is responsible for downloading RSS feeds from their sources. The sources may include any suitable source related to a feed, such as a website, feed publishing site, and the like. In at least one embodiment, any suitable valid URL or resource identifier may comprise the source of the feed. The synchronization engine receives feeds and processes the various feed formats, takes care of schedules, handles content and attachment downloads, and organizes archiving activities.

FIG. 3 illustrates an exemplary feed synchronization engine 108 in greater detail according to one embodiment. In this embodiment, the synchronization engine includes a feed format module 300, a feed schedule module 302, a feed content download module 304, an attachment download module 306, and an archive module 308. It should be understood that these blocks are shown as logically separate blocks for the purpose of clearly describing their particular functionality. These logically separate modules are not intended to limit the claimed subject matter to only the particular structures or architectures described in this application.

Feed format module 300

In the illustrated and described embodiment, feeds can be received in a number of different feed formats. By way of example and not limitation, these feed formats may include RSS 0.9x, 1.0, 1.1, 2.0, Atom 0.3, and so forth. The synchronization engine receives these feeds in their various formats via the feed format module, parses them, and converts them to a standardized format called the common format. The common format is essentially a superset of all supported formats. One benefit of using a common format is that syndication-aware applications now need to understand only one format, the common format. Furthermore, content that has been converted to a common format is much easier to manage, as the platform only needs to handle one format rather than several. In addition, as other syndication formats are developed in the future, the feed format module can be adapted to handle them, allowing applications that are completely unaware of a new format to still use content that arrives at the platform via that format.

Regarding the common format, consider the following. From a format perspective, a common format is represented by an XML schema that is common between the different formats. In different formats, some elements may have different names, be located in different positions in the hierarchy of the XML format, and so on. Accordingly, the common format is intended to represent a common structure and syntax that is derived collectively from all possible different formats. Thus, in some cases, elements from one format may be mapped into elements of a common format.
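The mapping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the output dictionary shape and the to_common function are invented here and are not the patent's common-format schema, only RSS 2.0 and Atom are handled, and the Atom 1.0 namespace is assumed (the Atom 0.3 format mentioned above used a different namespace).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"  # assumption: Atom 1.0 namespace


def to_common(xml_text: str) -> dict:
    """Parse an RSS 2.0 or Atom document into one common dictionary shape."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        channel = root.find("channel")
        if channel is None:
            raise ValueError("RSS document without a channel element")
        return {
            "title": channel.findtext("title", default=""),
            "items": [
                {
                    "title": item.findtext("title", default=""),
                    "link": item.findtext("link", default=""),
                }
                for item in channel.findall("item")
            ],
        }
    if root.tag == ATOM_NS + "feed":
        items = []
        for entry in root.findall(ATOM_NS + "entry"):
            link = entry.find(ATOM_NS + "link")
            items.append({
                "title": entry.findtext(ATOM_NS + "title", default=""),
                # Atom keeps the URL in an href attribute, RSS in element text.
                "link": link.get("href", "") if link is not None else "",
            })
        return {"title": root.findtext(ATOM_NS + "title", default=""), "items": items}
    raise ValueError("unrecognized feed format: " + root.tag)
```

Here the differently named and differently placed elements (item versus entry, link text versus link href) end up under the same keys, so a consumer only ever sees one shape.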

Feed schedule module 302

Each feed may have its own schedule of when the synchronization engine 108 should check to determine if new content is available. Accordingly, the synchronization engine manages these schedules through the feed schedule module 302 to account for site and user or system requirements and limitations.

As an example, consider the following. When a feed is first downloaded, an update schedule (i.e., a schedule of when the feed is updated) may be included in the header of the feed. In this case, the feed schedule module 302 maintains the update schedule for the particular feed and checks for new content according to that schedule. However, if schedule information is not included, the feed schedule module may check for new content using a default schedule. Any suitable default schedule may be used, such as re-downloading feed content every 24 hours. In at least some embodiments, the user can specify a different default schedule.

Further, in at least some embodiments, the feed schedule module can support a schedule referred to as a minimum schedule. The minimum schedule refers to a minimum update time that defines a time period between updates. That is, the platform does not update feeds more frequently than the minimum schedule defines. In at least some embodiments, the user can change the minimum time. In addition, the user may also initiate a manual refresh for any or all of the feeds.

In addition to supporting default and minimum schedules, in at least some embodiments, the feed schedule module can support publisher-specified schedules. As the name implies, a publisher-specified schedule is a schedule specified by a particular publisher. For example, a publisher-specified schedule can typically specify how many minutes should elapse before clients next update the feed. This can be specified using the RSS 0.9x/2.0 "ttl" element. The sync engine should not fetch a new copy of the feed until that many minutes have elapsed. The publisher-specified schedule may also be specified at different levels of granularity, such as hourly, daily, or weekly.

It should be noted that each copy of the feed document may have a different publisher-specified schedule. For example, during the day, the publisher may provide a schedule of 15 minutes, while during the night, the publisher may provide a schedule of 1 hour. In this case, the synchronization engine updates its behavior each time a feed is downloaded.

Further, in at least some embodiments, the synchronization engine supports the concept of skip hours and skip days via the feed schedule module 302. Specifically, RSS 0.9x and 2.0 allow the server to block out certain days and hours during which the client should not make updates. In this case, the sync engine takes these settings into account (if set by the server) and does not update the feed during those times.

In addition to the default, minimum, and publisher-specified schedules, in at least some embodiments, the synchronization engine supports the concept of user-specified schedules and manual updates. More specifically, on a per-feed basis, users may specify a schedule of their choosing. From a platform perspective, the user-specified schedule may be as complex as the server-specified one. In this case, the platform maintains, via the feed schedule module, both the most recent schedule extracted from the feed and the user schedule. In at least some embodiments, the user schedule always overrides the publisher's schedule. Further, at any time, an application may initiate a forced update of all feeds or of individual feeds.
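One way to read the precedence among these schedules is sketched below in Python. The function name, its parameters, and the 15-minute minimum are assumptions made for illustration; the text gives only the 24-hour default. The sketch also assumes that the minimum schedule caps the user's schedule as well as the publisher's, which the text does not state explicitly.

```python
from typing import Optional

DEFAULT_INTERVAL_MIN = 24 * 60  # default schedule from the text: every 24 hours
MINIMUM_INTERVAL_MIN = 15       # assumed value; the text does not give one


def effective_interval(user_min: Optional[int] = None,
                       publisher_min: Optional[int] = None,
                       minimum_min: int = MINIMUM_INTERVAL_MIN) -> int:
    """Pick the update interval, in minutes, for one feed."""
    if user_min is not None:
        chosen = user_min            # the user schedule overrides the publisher's
    elif publisher_min is not None:
        chosen = publisher_min       # e.g. taken from the feed's ttl element
    else:
        chosen = DEFAULT_INTERVAL_MIN
    # Never update more often than the minimum schedule allows.
    return max(chosen, minimum_min)
```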

Regarding bandwidth and server considerations, consider the following. According to one embodiment, the design of the synchronization engine may take two related issues into account. First, synchronization should be considerate of the user's bandwidth and CPU. Second, because the RSS platform may be widely used, the sync engine should take into account its impact on servers. Both of these issues affect when and how feeds are downloaded.

From the perspective of when to download a feed, the synchronization engine can be designed in accordance with the following considerations. When the server specifies no schedule and the user gives no other instructions, the synchronization engine should be very conservative in how often it updates. Thus, in at least some embodiments, the default schedule is set to 24 hours. Furthermore, to protect the user's resources from the adverse effects of an inefficient server, a minimum schedule may be enforced to prevent the sync engine from updating too frequently, even if the server specifies otherwise. Furthermore, updates at login (and at common time intervals, e.g., every hour from start-up time) should be carefully managed. Feed updates should be delayed until a specified period of time after user login has completed, and should be staggered slightly to avoid a large burst of update hits every hour, on the hour. This can be balanced against the user's desire to have all updates happen at once. Furthermore, when the server uses the skip hours or skip days feature described above, the client should not fetch updates immediately after the blocked period has ended. Instead, the client should wait for a random time interval of up to 15 minutes before fetching the content.

To assist the synchronization engine in this regard, the feed schedule module 302 can maintain a state, such as fresh or stale, for each feed. The "fresh" state means that the feed is up to date according to the publisher's schedule. The "stale" state means that the publisher's schedule calls for an update, but the synchronization engine has not yet completed the update. Clients interested in the latest content may request an immediate update and be notified when it is available. With this expectation set, the synchronization engine can introduce arbitrary delays in updating content, rather than following the schedule strictly to the detriment of the user and the server.

Consider the following scenario with respect to how feeds are downloaded. In one embodiment, a task scheduler may be used to start the sync engine process at a predetermined time. After the synchronization engine completes, it updates the task scheduler with the next time at which the synchronization engine should be started again, i.e., the next synchronization engine start time.

When the sync engine starts, it queues all "pending" feeds whose NextUpdateTime is less than or equal to currentTime (current time) and then processes them as follows. For each feed, the following characteristics are tracked: LastUpdateTime (last update time), NextUpdateTime, Interval (specified in minutes), and LastErrorInterval.

After a feed has been synchronized successfully, the feed's LastUpdateTime is set to the current time, and NextUpdateTime is set to LastUpdateTime plus the interval plus a random value of up to 1/10 of the interval. Specifically:

LastUpdateTime = currentTime

NextUpdateTime = currentTime + Interval + Random(Interval * 0.1)

ErrorInterval = 0

Random(argument) is defined as a positive value between 0 and its argument. For example, Random(10) returns a floating-point value between 0 and 10.

If synchronizing a feed fails for one of the following reasons:

an HTTP 4xx response code;

an HTTP 5xx response code;

a Winsock/network error; or

an HTTP 200 response whose body fails to parse (not a recognized feed format)

then an exponential back-off algorithm is applied as follows:

LastUpdateTime = <unchanged>

ErrorInterval = min(max(ErrorInterval * 2, 1min), Interval)

NextUpdateTime = currentTime + ErrorInterval + Random(ErrorInterval * 0.1)
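The success and failure updates above translate almost directly into code. The Python sketch below is a minimal illustration: the FeedState class and the function names are hypothetical, all times are plain floats measured in minutes (an assumption made for simplicity), and Random is taken to be a uniform draw between 0 and its argument.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedState:
    interval: float                      # Interval, in minutes
    last_update: Optional[float] = None  # LastUpdateTime
    next_update: float = 0.0             # NextUpdateTime
    error_interval: float = 0.0          # ErrorInterval, in minutes


def jitter(limit: float) -> float:
    """Random(limit): a float between 0 and limit."""
    return random.uniform(0.0, limit)


def on_success(feed: FeedState, now: float) -> None:
    """Apply the schedule update that follows a successful synchronization."""
    feed.last_update = now
    feed.next_update = now + feed.interval + jitter(feed.interval * 0.1)
    feed.error_interval = 0.0


def on_failure(feed: FeedState, now: float) -> None:
    """Apply exponential back-off; last_update is deliberately left unchanged."""
    # Double the error interval, starting at 1 minute and capped at the interval.
    feed.error_interval = min(max(feed.error_interval * 2, 1.0), feed.interval)
    feed.next_update = now + feed.error_interval + jitter(feed.error_interval * 0.1)
```

With a 60-minute interval, repeated failures retry after roughly 1, 2, 4, 8, 16, 32, and then 60 minutes, so a failing feed is never retried less often than a healthy one is refreshed.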

After completing the synchronization of all "pending" feeds, the sync engine determines whether any feed has since reached its NextUpdateTime (NextUpdateTime <= currentTime). If so, those "pending" feeds are queued and processed as if the sync engine had just been started.

If there are no outstanding "pending" feeds, the sync engine determines whether there are any "soon-to-sync" feeds, i.e., feeds whose NextUpdateTime is within 2 minutes of the current time (currentTime + 2min >= NextUpdateTime). If there are any "soon-to-sync" feeds, the sync engine process continues running and sets a timer to "wake up" at NextUpdateTime and process the "pending" feeds.

If there are no "soon-to-sync" feeds, then NextSyncEngineLaunchTime is set to the NextUpdateTime of the feed with the earliest NextUpdateTime. The task scheduler is then set to NextSyncEngineLaunchTime and the synchronization engine process ends.
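The three-way decision described in the preceding paragraphs (sync now, stay resident and sleep, or exit and reschedule) can be sketched as a single function. The function name and the returned action labels are hypothetical, and times are again assumed to be floats in minutes.

```python
from typing import Iterable, List, Tuple

SOON_WINDOW_MIN = 2.0  # "within 2 minutes" from the text


def plan_next_step(next_update_times: Iterable[float], now: float) -> Tuple[str, float]:
    """Decide what the sync engine does once its current queue is empty."""
    times: List[float] = sorted(next_update_times)
    if not times:
        raise ValueError("no feeds to schedule")
    earliest = times[0]
    if earliest <= now:
        # At least one feed is pending: queue and process it immediately.
        return ("sync_now", now)
    if now + SOON_WINDOW_MIN >= earliest:
        # A feed is due soon: keep the process alive and wake up then.
        return ("sleep_until", earliest)
    # Nothing is due soon: hand the next launch time to the task scheduler.
    return ("exit_and_schedule", earliest)
```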

According to one embodiment, if there are several "pending" feeds in the queue, the synchronization engine may synchronize multiple feeds in parallel. However, the number of parallel synchronizations should be limited, as should the number of synchronizations performed within a given time period, in order to avoid saturating network bandwidth and the processor. According to one embodiment, feed synchronization is shaped via a token bucket. Conceptually, the token bucket works as follows.

● a token is added to the bucket every 1/r seconds;

● the bucket can hold up to b tokens; if a token arrives when the bucket is full, it is discarded;

● when a feed needs to be synchronized, a token is removed from the bucket and the feed is synchronized;

● if no tokens are available, the feed stays in the queue and waits until a token becomes available.

This approach allows bursts of feed synchronization of up to b feeds. Over a long period of operation, however, synchronization is limited to a constant rate r. In an implementation example, the synchronization engine uses the following values for b and r: b is 4 and r is 2.
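A minimal token bucket along these lines might look as follows in Python. The class and method names are hypothetical. Two simplifications are assumed: tokens are refilled continuously in proportion to elapsed time, rather than by a timer firing every 1/r seconds, and the caller passes in the current time in seconds so that the logic stays deterministic. The bucket is also assumed to start full.

```python
class TokenBucket:
    """Token bucket with refill rate r (tokens per second) and capacity b."""

    def __init__(self, rate: float = 2.0, capacity: int = 4) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # assumption: the bucket starts full
        self.last = 0.0                # time of the previous refill, in seconds

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        # Tokens that arrive while the bucket is full are discarded.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_take(self, now: float) -> bool:
        """Remove one token if available; return False if the feed must wait."""
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A scheduler would call try_take before starting each feed synchronization and leave the feed in the queue whenever it returns False.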

Feed content download module 304

According to one embodiment, the feed content download module 304 handles the process of downloading the feed and merging new feed items with existing feed data.

As an example of how the feed content download module is implemented, consider the following. When appropriate, the synchronization engine connects to the server via the feed content download module and downloads the appropriate content.

According to one embodiment, the platform is configured to support different protocols for downloading content. For example, the synchronization engine may support downloading a feed document via HTTP. In addition, the synchronization engine may support secure HTTP URLs (e.g., SSL, HTTPS, etc.). Likewise, the sync engine may also support compression using HTTP gzip support, as well as feed downloads from Universal Naming Convention (UNC) shares.

In addition, the synchronization engine may support various types of authentication via the feed content download module. For example, the synchronization engine may store a username/password for each feed and may use that username/password for HTTP basic authentication to retrieve feed documents.

Regarding updating feeds, consider the following. To determine whether a feed has new content, the synchronization engine maintains the following pieces of information for each feed:

● the time the feed was last updated, as reported by the Last-Modified header of the HTTP response;

● the value of the ETag header in the last HTTP response; and

● the value of the feed's most recent pubDate (i.e., the feed-level publication date and time).

If the site supports ETag or Last-Modified, the sync engine can use these to check for new content. The site may respond with HTTP response code 304 to indicate that there is no new content. Otherwise, the content is downloaded. For example, if a site supports RFC 3229-for-feeds, the site may return only the new content, based on the ETag delivered by the client. Regardless of which approach is used, the client then merges the new content with the stored content.

As a more detailed description of one example implementation of how feed content can be downloaded, consider the following. To determine whether a particular site has been changed, the synchronization engine may submit a request with the following information:

● an If-None-Match header, if the client has a saved ETag;

● an A-IM header with the values feed, gzip (for RFC 3229-for-feeds);

● an If-Modified-Since header, if the client has a stored Last-Modified value.

If the server responds with HTTP response code 304, the content has not been changed and the process ends here. If the server responds with content (i.e., HTTP code 200 or 206), the downloaded content is merged with the local content (note: code 206 means the server supports RFC 3229-for-feeds, while the downloaded content is just new content).
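The request and response handling just described can be sketched as follows. This is a minimal illustration rather than the platform's implementation: the function names and the returned labels are hypothetical, and sending A-IM only alongside a saved ETag is an assumption made here because a delta needs a base version to refer to.

```python
from typing import Dict, Optional


def build_conditional_headers(etag: Optional[str],
                              last_modified: Optional[str]) -> Dict[str, str]:
    """Build request headers for a conditional feed download.

    Hypothetical helper: only the header names and values come from the text.
    """
    headers: Dict[str, str] = {}
    if etag is not None:
        headers["If-None-Match"] = etag
        # Assumption: request a feed delta only when there is an ETag to diff against.
        headers["A-IM"] = "feed, gzip"
    if last_modified is not None:
        headers["If-Modified-Since"] = last_modified
    return headers


def classify_response(status: int) -> str:
    """Map the status codes named in the text to the next step."""
    if status == 304:
        return "unchanged"             # no new content; the process ends here
    if status == 200:
        return "merge-full-document"   # whole feed returned; merge with local content
    if status == 206:
        return "merge-new-items-only"  # per the text: server returned just the new content
    return "error"
```

A caller would pass the returned dictionary as the headers of an ordinary HTTP GET and branch on the classification of the response code.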

If content is available, and if the synchronization engine has a stored pubDate and the downloaded feed document contains a channel-level pubDate element, then the two dates are compared. If the local pubDate is the same as the downloaded pubDate, then the content has not been updated, and the downloaded feed document may be discarded.

Next, the synchronization engine processes the items one at a time. The pubDate of each item is compared to the pubDate stored by the synchronization engine (if any), and older items are discarded. Each remaining item is then compared to the items in the store. The comparison should use either the GUID element (if present) or the link element (if no GUID is present). If a match is found, the content of the new item replaces the content of the old item (if both have a pubDate, it is used to decide which is more recent; otherwise, the most recently downloaded item is treated as the newer one). If no match is found, the new item is prepended to the stored feed content (maintaining a "most recent at the top" semantic). If any items are added or updated in the local feed, the feed is considered updated and the clients of the RSS platform are notified.
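The item-by-item merge can be sketched as below. This is an illustrative sketch under stated assumptions: items are modelled as plain dictionaries with optional "guid", "link", and "pubDate" keys, the function names are hypothetical, and an item identical to its stored copy is treated as unchanged.

```python
from datetime import datetime
from typing import Dict, List, Optional


def item_key(item: Dict) -> Optional[str]:
    """Identify an item by its GUID if present, otherwise by its link."""
    return item.get("guid") or item.get("link")


def merge_items(stored: List[Dict], downloaded: List[Dict],
                last_pub_date: Optional[datetime] = None) -> bool:
    """Merge downloaded items into `stored` in place, newest first.

    Returns True if any item was added or updated, in which case the
    caller would notify the platform's clients.
    """
    index = {item_key(it): pos for pos, it in enumerate(stored)
             if item_key(it) is not None}
    fresh: List[Dict] = []
    changed = False
    for item in downloaded:
        pub = item.get("pubDate")
        if last_pub_date and pub and pub < last_pub_date:
            continue  # older than the stored feed-level pubDate: discard
        key = item_key(item)
        pos = index.get(key) if key is not None else None
        if pos is None:
            fresh.append(item)  # no match: will be prepended below
            changed = True
            continue
        old = stored[pos]
        if item == old:
            continue  # assumption: identical content counts as no update
        old_pub = old.get("pubDate")
        if pub and old_pub and pub < old_pub:
            continue  # both dated and the stored copy is more recent
        stored[pos] = item  # otherwise the downloaded item wins
        changed = True
    stored[:0] = fresh  # keep the most recent items at the top
    return changed
```

Positions recorded in the index stay valid because new items are prepended only after the loop finishes.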

For the error case, consider the following. If the server responds with an error code in the 500 range or with most codes in the 400 range, the synchronization schedule is reset and the server is retried at a later time. However, HTTP error 410 should be taken as an indication to set the update schedule to "no more updates".

An HTTP-level redirect should be followed, but no changes should be made to the client configuration (there are several pathological cases in which redirects are made by accident).

If the server responds with an XML redirect, the feed should be redirected and the stored URL pointing to the feed should be automatically updated. This is the only case where the client automatically updates the feed URL.
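The error and redirect rules above amount to a small decision function, sketched here. The function name and the returned labels are hypothetical, and because the text says only "most" 400-range codes lead to a retry, treating every 4xx code other than 410 that way is an assumption of this sketch.

```python
def next_action(status: int, xml_redirect: bool = False) -> str:
    """Decide what the client does with a response, per the rules above."""
    if xml_redirect:
        # The only case in which the stored feed URL is updated automatically.
        return "follow-and-update-stored-url"
    if 300 <= status < 400 and status != 304:
        # HTTP-level redirect: follow it, but leave the configuration alone.
        return "follow-keep-stored-url"
    if status == 410:
        return "no-more-updates"
    if 400 <= status < 600:
        # Assumption: all remaining 4xx and 5xx codes reset the schedule.
        return "reset-schedule-and-retry-later"
    return "handle-normally"  # 200, 206 and 304 follow the download path
```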

When downloading feeds, the download should not interfere with the ordinary use of the machine (e.g., its bandwidth or CPU) while the user is engaged in other tasks. Furthermore, in interactive applications that depend on the content, the user should be able to obtain the content as quickly as possible.

Attachment download module-306

According to one embodiment, the attachment download module 306 is responsible for downloading attachment files for a feed and applying the appropriate security zones. Upon downloading the feed content, the attachment is also downloaded.

Downloading attachments can be handled in several different ways. First, basic attachments are considered to be RSS 2.0-style enclosures. For basic attachments, the synchronization engine automatically parses the downloaded feed for attachment links via the attachment download module 306. The synchronization engine is configured to support multiple basic attachments. Using an attachment link, the attachment download module can then download the attachment. In at least some embodiments, the default action is not to download basic attachments for any new feed. Using the APIs that expose the above-described object model, a client can change this behavior on a per-feed basis to, for example, always download attachments, or can force the download of a specific attachment of a specific item in a specific feed.

Enhanced attachment handling may be provided by using the common format described above. In particular, in at least one embodiment, the common format defines additional functionality for attachments. Specifically, the common format allows for multiple representations of particular content. This includes, for example, standard definitions of preview content and default content, and the ability to indicate whether an attachment should be downloaded or streamed. In addition, the common format allows for arbitrary metadata on attachments and on content representations. For any new feed, the default action is to download a "preview" version of any attachment that falls within a default size limit of, for example, 10 KB per item.

Using the API, a client can change this behavior on a per-feed basis. For example, the behavior may be changed to always download the "default" version of the items in the feed, or to always download any particular version that has a metadata element with a particular value. This may be provided as a client callback that supplies the "should I download this?" logic for each attachment. Further, using the API, a client can force the immediate download of any particular representation of any particular attachment of any particular item (or of all items) in a particular feed.
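A per-feed download policy with a client-supplied callback might look like the following sketch. All names here (Representation, default_policy, select_downloads) are hypothetical, and the 10 KB figure is simply the example limit mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Representation:
    """One representation of an attachment (hypothetical model)."""
    kind: str                      # e.g. "preview" or "default"
    size_bytes: int
    metadata: Dict[str, str] = field(default_factory=dict)


# A client callback answering "should this representation be downloaded?"
ShouldDownload = Callable[[Representation], bool]

PREVIEW_LIMIT_BYTES = 10 * 1024  # example default limit from the text


def default_policy(rep: Representation) -> bool:
    """Default for a new feed: small "preview" representations only."""
    return rep.kind == "preview" and rep.size_bytes <= PREVIEW_LIMIT_BYTES


def select_downloads(reps: List[Representation],
                     policy: ShouldDownload = default_policy) -> List[Representation]:
    """Return the representations the given policy wants downloaded."""
    return [rep for rep in reps if policy(rep)]
```

A client that always wants the full version of a feed's attachments could pass `lambda rep: rep.kind == "default"` as the policy for that feed.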

Regarding providing security during attachment download, consider the following.

According to one embodiment, downloaded attachments use the Attachment Execution Service (AES) functionality of Windows XP SP2. Such functionality may provide file-type-based and zone-based security. For example, given the file name and zone information (i.e., where the attachment came from), AES may indicate whether to block, allow, or prompt.

For zone persistence, AES may persist the zone information when the file is saved, so that the user may be prompted when the file is subsequently opened.

The following table describes the mapping of AES risk levels and zones to actions:

Risk level	Restricted	Internet	Intranet	Local	Trusted
Dangerous, e.g. EXE	Block	Prompt	Allow	Allow	Allow
Moderate/unknown, e.g. DOC or FOO	Prompt	Prompt	Allow	Allow	Allow
Low, e.g. TXT or JPG	Allow	Allow	Allow	Allow	Allow

In the illustrated and described embodiment, the synchronization engine will invoke a method, e.g., CheckPolicy, for each attachment it downloads. Based on the response, the synchronization engine may perform one of:

● block: the attachment is not saved (it is marked as failed in the feed file);

● allow: the attachment is saved;

● prompt: the attachment is saved, but the zone information is persisted. This means that if the user double-clicks on the file, they get a "run/don't run" prompt.
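The mapping from risk level and zone to an action can be pictured as a lookup table, as in the sketch below. This is a stand-in and not the Windows Attachment Execution Service API: the function check_policy, the extension map, and the lower-case zone and risk labels are hypothetical, and only the example extensions named in the table are mapped.

```python
import os
from enum import Enum


class Action(Enum):
    BLOCK = "block"
    ALLOW = "allow"
    PROMPT = "prompt"


# Example extensions from the table; anything else is "moderate/unknown".
RISK_BY_EXTENSION = {".exe": "dangerous", ".txt": "low", ".jpg": "low"}

ZONES = ("restricted", "internet", "intranet", "local", "trusted")

# One row per risk level, one column per zone, in the order of ZONES.
POLICY = {
    "dangerous": (Action.BLOCK, Action.PROMPT, Action.ALLOW, Action.ALLOW, Action.ALLOW),
    "moderate": (Action.PROMPT, Action.PROMPT, Action.ALLOW, Action.ALLOW, Action.ALLOW),
    "low": (Action.ALLOW, Action.ALLOW, Action.ALLOW, Action.ALLOW, Action.ALLOW),
}


def check_policy(file_name: str, zone: str) -> Action:
    """Look up the action for an attachment from its file name and source zone."""
    extension = os.path.splitext(file_name)[1].lower()
    risk = RISK_BY_EXTENSION.get(extension, "moderate")
    return POLICY[risk][ZONES.index(zone)]
```

For instance, `check_policy("report.doc", "internet")` yields the prompt action, so the file would be saved with its zone information persisted.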

According to one embodiment, the sync engine will first save the attachment to disk and will not download the attachment to memory. Saving to disk triggers filter-based antivirus applications and gives these applications the opportunity to quarantine attachments (if they choose).

Archive Module-308

According to one embodiment, the archive module 308 is responsible for processing old feed data. By default, a feed holds a maximum of 200 items. When a feed exceeds the specified maximum, the archive module deletes the older feed items but does not delete the associated attachments.
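A minimal sketch of this trimming step follows, assuming the items are held newest first in a list. The function name is hypothetical; the point to note is that only item records are removed and attachment files are never touched.

```python
from typing import Dict, List

DEFAULT_MAX_ITEMS = 200  # default maximum stated in the text


def archive_feed(items: List[Dict], max_items: int = DEFAULT_MAX_ITEMS) -> List[Dict]:
    """Trim `items` (newest first) in place and return the removed older items.

    Attachments referenced by the removed items are deliberately left on disk.
    """
    removed = items[max_items:]
    del items[max_items:]
    return removed
```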

Feed storage

According to one embodiment, the feed store 112 (FIG. 1) holds two types of information: a list of feeds 114 and feed data 116. As an example, consider FIG. 4, where the feed list 114 is embodied as a hierarchical tree structure 400 of feed lists. Feed data 116 includes data associated with a particular feed. In this example, the feed data 116 is arranged on a per-feed basis to include a collection 402 of items and attachments.

There are many different ways in which feed storage may be implemented. In this particular embodiment, the feed store comprises part of a file system. One reason for this relates to simplicity. That is, in this embodiment, the feed list is simply represented as a regular directory under which subdirectories and files may be located. The hierarchy is then reflected as a conventional file system hierarchy. Thus, each folder, such as "news" and "weblog", is essentially a regular directory in the file system with subdirectories and files.

In this particular example, there is a particular file type that represents a subscription to a feed. Consider, by way of example only, that this type of file has the following format: "xyz.stg". The .stg file stores all of the data about the feed. Thus, there is a list of feeds, such as the list embodied in the tree structure 400, and within each feed (or file) is the feed data.

In the illustrated and described embodiment, the .stg file is implemented using structured storage technology. Structured storage techniques are well known and understood by those skilled in the art. However, as brief background, consider the following.

Structured storage provides file and data persistence in COM by treating a single file as a structured collection of objects known as storages and streams. The purpose of structured storage is to reduce the performance penalty and overhead of storing separate object parts in different files. Structured storage provides a solution by defining how to treat a single file entity as a structured collection of two types of objects (storages and streams) via a standard implementation called compound files. This allows a user to interact with and manage a compound file as if it were a single file rather than a nested hierarchy of separate objects. Those skilled in the art will appreciate that storage objects and stream objects function as a file system within a file. Structured storage solves the performance problem by eliminating the need to completely rewrite a file to storage each time a new object is added to a compound file or an existing object increases in size. The new data is written to the next available location in permanent storage, and the storage object updates the table of pointers it maintains to track the locations of its storage objects and stream objects.

The API on top of the feed store allows access to the different streams and storages of the .stg files. In this particular example, each RSS item is written to a stream. In addition, a header stream contains information associated with the particular feed, such as a title, subscription, feed URL, and the like. Furthermore, another stream stores index-type metadata, allowing fast and efficient access to the content in the file for purposes that include quickly marking something as read/unread, deleting items, and the like.

File system-attachment

In the illustrated and described embodiment, attachments are not stored in the structured storage or as part of the feed data, as shown in FIG. 1. Rather, attachments are recognized as items, such as a picture or several pictures, that other applications and users may wish to access or manipulate.

Thus, in the illustrated and described embodiment, attachments are written to the user's profile, while a link is maintained between each attachment and its associated feed item.

As an example, consider FIG. 5. Once a user subscribes to a feed, the feed content is stored locally under the user's profile, either in Application Data or in the known folder "feeds".

The list of feeds and feeds are stored in the Application Data to enable better control over the format of the list of feeds and the feeds. The API is exposed (as will be described below) so that the application can access and manage the feed.

The list of feeds is a set of feeds to which the user subscribes. In this example, the file that includes the list of feeds is located at:

C:\Users\<Username>\AppData\Roaming\Microsoft\RSS\

The file contains the attributes of the feed as well as item and attachment attributes (a URL pointing to the file associated with the item). For example, the file for the feed "NYT" is located at:

C:\Users\<Username>\AppData\Roaming\Microsoft\RSS\NYT.stg

In this example, attachments are grouped by feed and stored in a known folder (Knownfolder) "feeds". This allows users and other applications to conveniently access and use the downloaded files.

For example, a user may subscribe to an NPR feed and wish to ensure that their media player application can automatically add those files. Making this a known folder allows the user to browse to it from the media player and set it as a monitored folder. Each attachment carries the appropriate metadata for the feed and the post, so that applications can access the associated post and feed. The attachments are located at:

C:\Users\<Username>\Feeds\<Feedname>\

Each attachment written to the user's hard disk will have a secondary stream (e.g., an NTFS stream) containing metadata about the attachment. By way of example and not limitation, the metadata may include the feed from which the attachment came, an author, a link to the feed item, a description, a title, a publication date, and a download date, as well as other suitable metadata.
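The two example locations above can be expressed as simple path builders. This is an illustrative sketch only: the helper names are hypothetical, the drive letter and directory layout are taken from the example paths in the text, and nothing here reads or writes the NTFS metadata stream.

```python
from pathlib import PureWindowsPath


def feed_file_path(username: str, feed_name: str) -> PureWindowsPath:
    """Location of a feed's .stg file under the user's roaming Application Data."""
    return PureWindowsPath("C:/Users", username, "AppData", "Roaming",
                           "Microsoft", "RSS", feed_name + ".stg")


def attachment_folder(username: str, feed_name: str) -> PureWindowsPath:
    """Per-feed folder under the known folder "Feeds" that holds attachments."""
    return PureWindowsPath("C:/Users", username, "Feeds", feed_name)
```

Keeping the two roots separate mirrors the design described above: the platform controls the format of the feed files, while attachments stay in a location that users and other applications can browse directly.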

Publishing engine/post queue

Typically, when one writes a conventional weblog post, what is being written is in essence an RSS item. This RSS item is typically sent to some type of server that maintains account information, the location of the weblog, and so on. In this case, the publishing engine 110 (FIG. 1) is configured to enable an application to make a post or publish content while abstracting away from the application the communication protocol used to communicate with the server. Thus, the application need only provide the data or content to be posted, and the publishing engine handles the remaining tasks of formatting the content and passing it to the appropriate server.

Since there are several different protocols that can be used, abstracting the protocol away from the application provides great flexibility in enabling many different types of applications to take advantage of the publishing functionality. In the illustrated and described embodiment, the publishing engine functionality is implemented as an API that allows applications to post to weblogs without knowing the protocol used to communicate with the server.

Thus, in this example, the API has a method to create a new post which, when called, creates an RSSItem (RSS item) object. The RSSItem object in turn has a Post method that, when invoked, stores the content (in this case, the weblog post) in a temporary store, i.e., the post queue 122 (FIG. 1). The content is held in the temporary store because the user may not be online when creating the post. When the user next goes online, the publishing engine 110 connects to the appropriate server and uploads the post to the server using a protocol appropriate to that server.
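The queue-then-upload behavior can be sketched as follows. The class and method names are hypothetical, and the protocol-specific upload is modelled as a callable supplied by the publishing engine, so the application that posts never sees it.

```python
from collections import deque
from typing import Callable, Deque, Dict

# Protocol-specific upload step, chosen by the publishing engine (assumption).
Uploader = Callable[[Dict[str, str]], None]


class PostQueue:
    """Temporary store for posts created while the user may be offline."""

    def __init__(self) -> None:
        self._pending: Deque[Dict[str, str]] = deque()

    def post(self, item: Dict[str, str]) -> None:
        """Called on behalf of the application: queue the post, nothing more."""
        self._pending.append(item)

    def flush(self, online: bool, upload: Uploader) -> int:
        """Upload queued posts in order once a connection exists.

        Returns the number of posts sent. A post is removed only after its
        upload call returns, so a failed upload leaves it queued.
        """
        if not online:
            return 0
        sent = 0
        while self._pending:
            upload(self._pending[0])
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

Because the uploader is injected, the same queue can serve servers that speak different publishing protocols.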

Implementation examples

In the following description, an exemplary set of APIs is described as merely one example of how one may implement and structure an API to achieve the above-described functionality. It should be understood that other APIs may also be used without departing from the spirit and scope of the claimed subject matter. The API is typically embodied as computer readable instructions and data that reside on some type of computer readable medium.

The APIs described below may be used to manipulate the set of feeds (system feed list) to which a user subscribes and the attributes of the feeds. In addition, the feed data APIs (i.e., items and attachments) provide access to feeds stored in the feed store and autonomous downloads of the feeds. Using the feed API, applications such as web browsers, media players, digital image library applications, and the like can then expose the feed data within their experiences.

In the example to be described, the API is implemented as COM dual interfaces, such that the API is available to scripting languages, managed code, and native Win32 (C++) code.

FIG. 6 illustrates the top-level IFeeds object or interface and the IFeedFolder object or interface, along with their associated attributes, methods, and events, according to one embodiment.

In this example, IFeeds has a Subscriptions attribute of type IFeedFolder. It is the root folder for all subscriptions. There are a variety of methods on the root object, such as DeleteFeed(), DeleteFeedByGuid(), DeleteFolder(), and the like.

Of interest in this example is the GetFeedByGuid() method. The method may be called by an application to access a particular feed through the GUID of the feed. Thus, the application need not be aware of the hierarchical ordering of the feeds. Instead, the application may use the GUID of the feed to have the platform fetch the feed.

In addition, the ExistFeed() method checks for the presence of a feed by name, while ExistFeedByGuid() checks for the presence of a feed by GUID. The GetFeed() method acquires a feed by name or by GUID. The IsSubscribed() method enables an application or caller to determine whether a particular feed is subscribed to.

In addition, the IFeeds object also has subscriptionNotification events that allow an application to register for notifications of changes to the system feed list.

As noted above, Subscriptions is of type IFeedFolder. The IFeedFolder object or interface essentially provides a directory and has similar types of attributes, such as Name, Parent, Path, etc. Further, the IFeedFolder object has a Feeds attribute of type IFeed and a Folders attribute of type IFeedFolder. The Folders attribute pertains to the set of folders under the instant folder (i.e., it is what provides the hierarchy), while the Feeds attribute pertains to the actual feeds in a particular folder. In addition, IFeedFolder has a LastWriteTime attribute that indicates the time at which anything was last written to the folder. This attribute is useful for applications that have not been running for some time but need to look at the feed platform and determine its state so that they can synchronize if needed.

There are a number of methods on IFeedFolder, some of which pertain to creating feeds (creating a feed that the system does not yet have and adding it to a particular folder), creating subfolders, deleting folders or subfolders, and so forth.
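The folder hierarchy and the lookup by GUID can be modelled with a small Python analogue. This is not the COM interface itself: the class names, field names, and snake_case methods are hypothetical stand-ins for the folder object and its GUID-based lookups described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Feed:
    guid: str
    name: str
    url: str


@dataclass
class FeedFolder:
    """A folder holding feeds and subfolders, like a directory."""
    name: str
    feeds: List[Feed] = field(default_factory=list)
    subfolders: List["FeedFolder"] = field(default_factory=list)

    def get_feed_by_guid(self, guid: str) -> Optional[Feed]:
        """Find a feed anywhere beneath this folder without knowing its path."""
        for feed in self.feeds:
            if feed.guid == guid:
                return feed
        for folder in self.subfolders:
            found = folder.get_feed_by_guid(guid)
            if found is not None:
                return found
        return None

    def exist_feed_by_guid(self, guid: str) -> bool:
        """Report whether a feed with this GUID exists beneath this folder."""
        return self.get_feed_by_guid(guid) is not None
```

Calling the lookup on the root folder is what lets an application ignore the hierarchy entirely, as noted for GetFeedByGuid().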

FIG. 7 illustrates additional objects and their associated methods according to one embodiment. Specifically illustrated are the IFeed, Item and IEnclosure objects.

First, starting with the IFeed object, consider the following. As understood by those skilled in the art, many of the attributes associated with the object come from the RSS feed itself, such as Title, Url, Webmaster, SkipHours, SkipDays, ManagingEditor, Homepage, ImageURL, etc. In addition, there is another set of attributes of interest, namely the Items attribute, which holds the collection of all items that are part of the feed, and the LocalEnclosurePath attribute, which provides the actual directory into which all attachments are written. The latter attribute thus enables an application to access the attachments with great ease.

In addition, the object supports a small set of methods, such as Delete() and Download(), for managing a particular feed. The object also supports the XML() method, which returns the XML of the feed in the common format. The XML data can be used for such things as creating a newspaper view of a feed. Clone() returns a copy of the feed that is not subscribed to.

Proceeding to the Item object, the object has a set of attributes that represent conventional RSS elements, such as Description, Url, Title, Author, etc. Furthermore, there is a Parent attribute that points back to the associated feed, and an Id attribute that enables an application to operate on the Id rather than having to iterate over all items. In addition, there is an Enclosures attribute, which is the item's collection of attachments, of type IEnclosure. Additionally, the IsRead attribute enables an application to indicate whether a particular item has been read.

Proceeding to the Enclosure object, consider the following. The object has attributes including a Type attribute (e.g., mp3) and a Length attribute that describes the length of the particular attachment. There is also a LocalAbsolutePath attribute for the particular attachment. The Download() method allows applications to download and use individual attachments.

Conclusion

The web content syndication platform described above may be used to manage, organize, and make available for consumption content acquired from the Internet. The platform may acquire and organize web content and make such content available for consumption by many different types of applications. These applications may or may not understand the particular syndication format. An application programming interface (API) exposes an object model that allows applications and users to conveniently accomplish many different tasks, such as creating, reading, updating, and deleting feeds, and the like. In addition, the platform can abstract away specific feed formats to provide a common format that facilitates the use of feed data that comes into the platform. In addition, the platform processes and manages attachments received via web feeds in a manner that can make the attachments consumable both by syndication-aware applications and by applications that are not syndication-aware.

Although the invention has been described in language specific to structural features and/or methodological steps, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features or steps described. Rather, the specific features and steps are disclosed as preferred forms of implementing the claimed invention.

Claims

1. A system, comprising:

one or more computer-readable media;

computer-readable instructions on the media which implement a content syndication platform, the content syndication platform comprising:

a feed synchronization engine configured to obtain content from a feed and make the content available to both syndication-aware applications and applications that are not syndication-aware; and

a feed store configured to store a list of feeds and feed data.

2. The system of claim 1, wherein the feed synchronization engine is configured to convert content in different formats to a common format.

3. The system of claim 1, wherein the feed synchronization engine is configured to support a plurality of different types of schedules, at least one class of the schedules comprising a minimum schedule that defines a minimum update time that defines a time period between updates.

4. The system of claim 1, wherein the feed synchronization engine is configured to download feed data and update previously downloaded feed data.

5. The system of claim 1, wherein the feed synchronization engine is configured to download attachments and provide the attachments to a file system, wherein the attachments are accessible by applications that are not syndication-aware.

6. The system of claim 5, wherein the platform is configured to write an attachment to a user profile.

7. The system of claim 6, wherein the platform is configured to maintain links between attachments and associated feed items.

8. The system of claim 1, wherein the feed store is implemented as part of a file system.

9. The system of claim 8, wherein the feed list is represented as a directory that can have subdirectories.

10. The system of claim 8, wherein the list of feeds and feed data are managed within the file system using a structured storage technique.

11. The system of claim 1, wherein the feed synchronization engine is configured to enable a user to publish content in a manner that abstracts communication protocols from between the user application and a publication location.

12. The system of claim 1, wherein the syndication platform is configured to conduct feed synchronization.

13. The system of claim 1, wherein the content syndication platform comprises an RSS platform.

14. A system, comprising:

one or more computer-readable media embodying a set of application program interfaces, comprising:

a first interface that acts as a root folder for subscriptions and has one or more methods that belong to a feed;

a second interface that acts as a folder for a feed, wherein the second interface has feed attributes and subfolder attributes and one or more methods that belong to the folder;

a third interface comprising attributes associated with the single feed and one or more methods belonging to the single feed;

a fourth interface comprising attributes associated with the single item and at least one method pertaining to the single item;

a fifth interface comprising attributes associated with the single attachment and one or more methods associated with the single attachment, wherein at least one method allows the attachment to be downloaded by the application.

15. The system of claim 14, wherein the first interface has a method of downloading feeds without requiring a subscription to the feeds.

16. The system of claim 14, wherein the first interface comprises a notification event that allows an application to register for notifications pertaining to a system feed list.

17. The system of claim 14, wherein the second interface includes an attribute indicating a time at which data was last written to an associated folder.

18. The system of claim 14, wherein the third interface includes item attributes that are a collection of items associated with a particular feed and local attachment path attributes that provide a directory into which individual attachments are written.

19. The system of claim 14, wherein the third interface comprises a method for returning XML for the feed to the caller.

20. The system of claim 14, wherein the fourth interface comprises a parent property that points to an associated feed and an attachment property associated with an attachment to an item.