Some folks have been asking me about alternatives to Feedburner and I'm not really aware of anything... But, I do wonder - how hard would it be to develop a hosted solution to compete with them - i.e. something you can install on your own site to track feed subscriptions and access over time?
Hard. Not so hard it's impossible, but hard enough that you'll need more than just one developer and 3 months free on the calendar. Tracking RSS subscribers takes a lot of time, work and technology. It's really a lot harder than you think.
To develop a hosted solution to compete with them would cost a pretty penny. You'd be serving as the RSS provider for thousands of websites with requests coming all day every day from everywhere on the planet. And on top of that you have to serve web versions of the RSS, have a rather robust admin/backend system and develop an API for pinging and other external requests.
That doesn't sound too hard. It just looks like a complicated web app with abnormally large hardware and bandwidth requirements. Couldn't you basically just throw hardware and developers at the problem and pull it off? Not really.
One big concern that Rand had at the time was that...
"Feedburner doesn't actually track subscribers... it tracks the number of people who accessed particular posts on particular days."
We'd seen evidence of this in the fact that our subscriber numbers were fluctuating by a few thousand a day. One day we'd have 10,000 subscribers, and then the next day we'd have 6,000.
Feedburner has this to say on the subject...
"FeedBurner's subscriber count is based on an approximation of how many times your feed has been requested in a 24-hour period. Subscribers is inferred from an analysis of the many different feed readers and aggregators that retrieve this feed daily. Subscribers is not computed for browsers and bots that access your feed."
If you look at their blog post on serving as the RSS provider for TechCrunch, they add this...
"FeedBurner calculates subscribers by matching IP address and feed reader combinations, and then using our detailed understanding of the polling behavior of a multitude of readers, aggregators, browsers and bots on the market to make additional inferences."
I told Rand this, but he didn't think it was a very accurate way of tracking the actual subscribers...
"I would track how many unique users have actually grabbed the feed at least once in the last month to track subscribers, then have separate stats for access. Feedburner's numbers are just not right, IMO."
Not that bad of an idea. It sounds like a rather simple solution to RSS feed tracking - short, simple and to the point. Oh, I wish it was that easy. Getting accurate numbers of RSS subscribers is hard. Very hard, actually.
In the relatively standardized realm of web browsers, 99% of your users are using IE, Firefox, Opera, Safari or some derivative of one of the four. If a browser wants a web page, it sends a rigidly formatted request to a web server along with a very useful set of data on the user's browser, computer, OS, IP address and a bunch of other factors. The server then sends a standardized response back to the user, whose browser interprets it and displays it for you to see. It might not sound simple, but compared to the world of RSS, it really is.
Tracking "uniques" in RSS is a lot different and a lot more complicated than tracking uniques for just web pages. Remember how for HTML we have four main browsers? Well, in RSS there are literally thousands of RSS readers, all accessing/requesting data from the feed in their own cute unique way.
Some feed readers like NewsGator make one request to the server per hour per user, while MyYahoo, NetVibes and BlogLines may make 3 requests a day in total - but then serve the cached local copy they collected to the hundreds or thousands of their users reading that feed. Some services (around 5% - 10%, BlogLines is an example) send the total number of subscribers in their requests. But most aren't that nice.
What about trying to track conversions? Well, trying to track conversions without a solid subscriber base is pretty much impossible. You could track hits to the feed and where they came from, but that wouldn't be very useful to people. You could make a piece of installable software a user would put on their own web server, which would be better from a hardware end, but you're still faced with developing algorithms for the dizzying array of RSS feed readers in order to generate accurate numbers.
Rand was now impressed by the knowledge of his amazing web developer, but also by how hard a task this was turning out to be. He did have another idea, though, to track conversions...
"What if, and I'm just throwing this out there, you were to have the users embed something in their posts. Then you'd know, almost like analytics, how many people were pulling those posts, right?"
In theory, yes. But even that won't work.
You can't have (java)scripts in RSS. You can have images, which could be a theoretical way to measure impressions, but some feed readers strip out all HTML, so they wouldn't record the impression. And on top of that, some readers cache images on their own server or on the user's computer.
Plus, some people may just mark your entire feed (or just individual posts) as "read" and not view the post, thus not rendering the image. But they're still a subscriber, they just didn't read your post. So do you count them as a subscriber that day? And even if you did do the image thing, what about all the people who don't know how to or can't change their blog templates? How do they do it?
Like I said, it's complicated.
A few days later, Rand told me this nice little bit of info...
"Heard from someone here [...] that Feedburner actually has relationships with the folks at Bloglines/Netvibes/etc so they can get more accurate data from those popular places that cache the feed. I also heard that it's getting more accurate due to the Google acquisition and Google themselves providing more solid data for folks who use Google reader/IG/etc."
That's nice of them, and is one of the many hoops you have to jump through in order to accurately track subscribers. You really need a big, talented and well-funded company to even begin to track RSS feeds for customers. It's just one of those barriers to entry you can't get around.
I think it is not that difficult to improve on FeedBurner's stats. They have already done the hardest part, and it is public knowledge how they do it.
If I were to create a startup to compete with them, this is what I would do to address the problems you mention.
Scalable architecture. If there is a solid business model, I would implement my service on Amazon's EC2, so that I can pay only for use and scale as demand increases. Another alternative is to create a Wordpress plugin that each blogger will install on his or her server to track his or her respective subscribers. The plugin would be licensed in some way that makes it profitable.
Accurate Tracking. The problem with FeedBurner is they are counting RSS accesses per day, and not unique accesses in a specific period (30 days for example). In order to improve on their tracking, I would do something like Rand suggests:
Here is why I think it is possible.
Let's attack the problem by dividing the subscribers in two: the ones using PC/browser-based feed readers and the ones using web readers and feed aggregators. The first group we can track by IPs, and the second we can track by the aggregator's reported numbers or, in the rare cases where none are reported, by monitoring the polling behavior.
For the ones using PC-based feed readers, a simple IP + feed reader combination ID is enough to track unique accesses over a one-month period. This is the way FeedBurner does it, but we want to track a full month instead of 24 hours. If I want to be more accurate, I can use an IP-to-location database to do more detailed matching of users. For example, the user may be on a dynamic IP, but if it is the same feed reader user agent and the same ISP, and we know the ISP uses dynamic IPs, it is very likely to be the same user.
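The IP + feed reader idea above can be sketched in a few lines of Python. This is only an illustration, not FeedBurner's actual method: it assumes the server writes Apache-style combined log lines, that the feed lives at `/feed/`, and that the 30-day window ends at the newest feed request in the log. The names `parse_line` and `count_unique_readers` are made up for this example.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Apache "combined" log format
# ip identity user [timestamp] "request" status size "referer" "user agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return (ip, user_agent, timestamp, request) or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return match.group("ip"), match.group("agent"), timestamp, match.group("request")


def count_unique_readers(lines, feed_path="/feed/", window_days=30, now=None):
    """Count distinct (IP, user agent) pairs that requested the feed in the window."""
    parsed = [entry for entry in map(parse_line, lines) if entry is not None]
    # Keep only requests whose path is exactly the feed path (hypothetical default).
    feed_hits = [e for e in parsed if e[3].split(" ")[1:2] == [feed_path]]
    if not feed_hits:
        return 0
    end = now or max(e[2] for e in feed_hits)
    start = end - timedelta(days=window_days)
    return len({(ip, agent) for ip, agent, ts, _ in feed_hits if start <= ts <= end})
```

Requests from aggregators that poll on behalf of many users would each count as a single reader here, so they need to be handled separately. The dynamic-IP matching described above is not attempted.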
For feed aggregators it is important to remember that Google, Yahoo and Bloglines command the lion's share of the market, and fortunately they report the number of subscribers in the User-Agent HTTP header, like this:
72.14.199.66 - - [30/Jul/2007:15:06:47 -0400] "GET /feed/ HTTP/1.1" 304 - "-" "Feedfetcher-Google; (+https://www.google.com/feedfetcher.html; xx subscribers; feed-id=xxxxxxxxxxxxxxxxxxx)" "-"
209.131.41.49 - - [30/Jul/2007:13:08:43 -0400] "GET /feed/ HTTP/1.0" 200 63041 "-" "YahooFeedSeeker/2.0 (compatible; Mozilla 4.0; MSIE 5.5; https://publisher.yahoo.com/rssguide; users xx; views xxx)) " "-"
This is what they say these numbers mean:
From https://publisher.yahoo.com/rss_guide/faq.php
From https://www.google.com/help/reader/publishers.html
For the minority of feed aggregators that are not as polite, we can study their polling patterns and deduce the number of unique users. Again, the key is to aggregate the stats for a 30-day period.
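For the polite aggregators, pulling the self-reported number out of the user agent string might look roughly like the sketch below. The two patterns are guesses based only on the Feedfetcher-Google and YahooFeedSeeker log lines quoted above, where the real numbers are masked as `xx`. Other aggregators may use different wording, and the function names are hypothetical.

```python
import re

# Patterns inferred from the two sample log lines; real-world formats may differ.
SUBSCRIBER_PATTERNS = [
    re.compile(r"(\d+)\s+subscribers?", re.IGNORECASE),  # "... 120 subscribers; ..."
    re.compile(r"users\s+(\d+)", re.IGNORECASE),         # "... users 45; ..."
]


def reported_subscribers(user_agent):
    """Return the subscriber count an aggregator reports, or None if it reports none."""
    for pattern in SUBSCRIBER_PATTERNS:
        match = pattern.search(user_agent)
        if match:
            return int(match.group(1))
    return None


def aggregator_name(user_agent):
    """Use the first token of the user agent as a rough aggregator identifier."""
    return re.split(r"[/;\s(]", user_agent.strip(), maxsplit=1)[0]


def total_reported(user_agents):
    """Sum the most recently seen count per aggregator, given user agents in log order."""
    latest = {}
    for user_agent in user_agents:
        count = reported_subscribers(user_agent)
        if count is not None:
            latest[aggregator_name(user_agent)] = count
    return sum(latest.values())
```

Keeping only the latest count per aggregator avoids adding the same subscribers again on every poll. It is a simplification: an aggregator that fetches the same feed under several feed IDs would be undercounted.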
Actually, they aren't tracking RSS access per day. They are tracking subscribers. This is more-or-less a calculated fake number that FeedBurner creates based on all the data at hand.
I still see what you're saying. I'm not saying someone else can't do it - 'cause they can - it's just that I don't think people realize how much work it really is. All the data is there, but crunching it accurately takes a team of people.
Fluxx - As you correctly quoted in your post, they are basing their subscriber numbers on the number of RSS accesses per day.
Again, I don't see it as being as difficult as you do. This is actually the kind of problem I like to solve. I think I should be able to do an initial prototype by parsing web server log files in about 2-3 weeks without using my team. A 30-day log of SEOmoz would be ideal. The problem is that the feed is on FeedBurner's domain and the logging is taking place on their servers.
If you subscribe to FeedBurner's MyBrand you get feeds.seomoz.org instead of feeds.feedburner.com and that might help, but you would still need to make all existing subscribers switch. I am not sure how realistic that is.
The count is based on the number of times your feed has been accessed in a 24-hour period, but they also use a lot of other data to infer and massage that access number into an accurate count of feed readers. They track feed hits per day, yes, and you can see that number in your account. But the number over there on the left in the chicklet is the number of subscribers, which isn't just hits.
The key component here isn't getting at the data, it's what you do with it when you do get it. There are literally thousands of feed readers out there, and you need to be able to handle and account for all their intricacies.
And even still, if you wanted to compete with them and offer a similar service, you'd have to do this for every RSS feed for every customer.
This sounds like a cool problem, Hamlet. I've been playing with EC2 lately and I'm looking for a challenge. If you're interested, I'd like to hear more about your idea of parsing web server logs.
Nick - I agree. This is an interesting problem. I noticed one of the A-list bloggers still has the feed on his domain. I sent him an email to see if he would donate a 30-day slice of his log for the experiment.
I accept your offer to help.
If I was going to write RSS tracking (which I have not yet done, so these are just initial thoughts), I think I would consider creating a unique identifying URL for each visitor to access the feed.
That part is simple enough to set up. You can see how many unique URLs have been subscribed to, and you can also see how often those URLs are accessed.
The best-case scenario is that every subscriber has a unique URL. The worst case is that people share the URLs, but from there you can do all of the usual analysis to get a better idea of how many people are using each URL.
Over time if you have a members section you are likely to find out which members are subscribed to which URL. From that information, if you have many people on the same URL you can see which members are more likely to have stronger viral networks.
There will of course be weaknesses to work around, which I am likely to find with a little more thought, but that's my initial brain dump solution.
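As a rough illustration of the unique-URL idea, here is a small Python sketch. Everything in it is assumed for the example: the class name `FeedTokenRegistry`, the URL layout (base feed URL plus a random token), and the in-memory storage, which a real service would replace with a database.

```python
import secrets


class FeedTokenRegistry:
    """Hands out one feed URL per visitor and counts requests per URL."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self.hits = {}  # token -> number of feed requests seen for that token

    def issue_url(self):
        """Create a new, hard-to-guess feed URL for one visitor."""
        token = secrets.token_urlsafe(16)
        self.hits[token] = 0
        return f"{self.base_url}/{token}"

    def record_request(self, path):
        """Count a feed request; return False if the token was never issued."""
        token = path.rstrip("/").rsplit("/", 1)[-1]
        if token not in self.hits:
            return False
        self.hits[token] += 1
        return True

    def active_subscribers(self):
        """Number of issued URLs that have been fetched at least once."""
        return sum(1 for count in self.hits.values() if count > 0)


registry = FeedTokenRegistry("https://example.com/feed")
my_feed_url = registry.issue_url()
registry.record_request(my_feed_url)
```

Shared URLs and aggregators that fetch one URL for many readers are not handled here. As the comment above says, those cases would still need the usual analysis of request patterns.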
Who knew?
Thumbs up for making me laugh, TannerC!
I too find it a little strange that Feedburner subscriber numbers change from day to day... I like the idea of averaging a month of unique users.
Great post, Hamlet. I must give you a thumbs-up even though you're right ahead of me in the rankings!
I just started using Feedburner for my blog, so I'm new to how it works. Now I understand it a little better.
I think it's about time: this is now very possible, and quite valuable for businesses to start tracking. Social media is going to be huge in 2009, with a large number of multinational companies coming to the party with bundles of cash to track RSS feeds and blogs.
I've wondered myself how hard this might be and it does seem like something beyond the lone developer. Though it looks like Hamlet may be up for the challenge.
It took me a while to understand that FeedBurner's subscriber count was an approximation when I saw the counts vary so widely from day to day.
I still have trouble sometimes understanding exactly what is being measured when I check on the stats. Just when I think I have a handle on it, I see something else that confuses me all over again.
And now for some feed music to accompany your feed building.
I have spent some time thinking about this as well and you're right that it's a lot harder than you might initially think. Shame really :(