del.icio.us stats
Statistics on the number of posts per day. A post is a bookmark that a user adds to their collection.
Overall
Monthly
Posts per day with their corresponding 95% confidence intervals.
Weekdays
Hours
Note that the times in the graph are UTC (GMT+0) times. (find out which timezone you're in)
raw xml export
Do you want to draw your own graphs and/or combine the data with data from other services? Just grab the preprocessed xml data from either the daily posts or hourly posts and you should have everything you need to produce decent graphs. Note that some days are missing because the script did not always run, the data was corrupt, or something else went wrong.
Export format explained
Sorry, this is no RDF or microformat or whatever, just a quick and dirty xml export. The export formats and URLs may change, though. If you have suggestions, please drop me an email.
Daily posts
Example:
<stat date="2005-08-01" estimated_posts="34454" std_deviation="10034"
tolerance_upper="50960" tolerance_lower="17948" recorded_posts="4290"
tag_distribution="1681 939 649 382 234 173 94 54 21 17 8 5 5 0 2 4 10 1 1 8 0 0 0 0 0 1 0 0 0 0 1 0 (...)" />
Attributes:
- date: the date in Y-m-d format
- estimated_posts: estimated posts for the given day
- std_deviation, tolerance_upper and tolerance_lower: as explained in the about section
- recorded_posts: the number of items I actually grabbed from the rss feed (always smaller than estimated_posts)
- tag_distribution: a list of numbers separated by spaces. The first number is the number of recorded posts that have one tag, the second the number of posts with two tags, etc. The list has 99 items.
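For anyone who wants to read the daily export from a script, here is a minimal sketch in Python. It assumes each day is a single stat element with the attributes listed above and that all numeric attributes are integers, as in the example. The helper names parse_daily_stat and mean_tags_per_post are made up for illustration, and the sample shortens tag_distribution to its first 16 numbers instead of guessing the elided rest.

```python
import xml.etree.ElementTree as ET

# Sample copied from the example above. The tag_distribution is cut off after
# the first 16 numbers; the real list is longer.
SAMPLE = (
    '<stat date="2005-08-01" estimated_posts="34454" std_deviation="10034" '
    'tolerance_upper="50960" tolerance_lower="17948" recorded_posts="4290" '
    'tag_distribution="1681 939 649 382 234 173 94 54 21 17 8 5 5 0 2 4" />'
)


def parse_daily_stat(xml_text):
    """Turn one daily <stat> element into a plain dict (hypothetical helper)."""
    elem = ET.fromstring(xml_text)
    return {
        "date": elem.get("date"),
        "estimated_posts": int(elem.get("estimated_posts")),
        "std_deviation": int(elem.get("std_deviation")),
        "tolerance_upper": int(elem.get("tolerance_upper")),
        "tolerance_lower": int(elem.get("tolerance_lower")),
        "recorded_posts": int(elem.get("recorded_posts")),
        # Index 0 holds the posts with 1 tag, index 1 those with 2 tags, etc.
        "tag_distribution": [int(n) for n in elem.get("tag_distribution").split()],
    }


def mean_tags_per_post(tag_distribution):
    """Average number of tags over the posts counted in the distribution."""
    posts = sum(tag_distribution)
    if posts == 0:
        raise ValueError("distribution contains no posts")
    tags = sum((i + 1) * count for i, count in enumerate(tag_distribution))
    return tags / posts


stat = parse_daily_stat(SAMPLE)
print(stat["date"], stat["estimated_posts"], stat["tag_distribution"][:3])
```

Since the distribution only covers recorded posts with at least one tag, the mean it yields says nothing about untagged posts.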
Hourly posts
Example:
<stat date="2006-01-07" time_distribution="2641 2403 2356 2231 2218 2116 1686 1717 1648 1689 1785 1688 1882 2673 2389 3298 3630 3535 3796 3034 3095 2816 3386 3055"/>
Attributes:
- date: the date in Y-m-d format
- time_distribution: the estimated number of posts for the date, split up per hour. The first number is the estimated number of posts to del.icio.us from 00:00 to 00:59 of that day, followed by the estimated number of posts between 01:00 and 01:59, etc. (Note: the first number actually belongs to the day before, just in case you're doing graphs per day. Sorry for that.)
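The hourly export can be read in much the same way. The following Python sketch assumes one stat element per day with exactly 24 numbers in time_distribution, as in the example. parse_hourly_stat and busiest_hour are hypothetical names, and the sketch does not correct for the first number belonging to the day before.

```python
import xml.etree.ElementTree as ET

# Sample copied from the example above.
SAMPLE = (
    '<stat date="2006-01-07" time_distribution="2641 2403 2356 2231 2218 2116 '
    '1686 1717 1648 1689 1785 1688 1882 2673 2389 3298 3630 3535 3796 3034 '
    '3095 2816 3386 3055"/>'
)


def parse_hourly_stat(xml_text):
    """Return (date, list of 24 hourly estimates) for one <stat> element."""
    elem = ET.fromstring(xml_text)
    hours = [int(n) for n in elem.get("time_distribution").split()]
    if len(hours) != 24:
        raise ValueError("expected 24 hourly values, got %d" % len(hours))
    return elem.get("date"), hours


def busiest_hour(hours):
    """Index (0 to 23) of the hour with the most estimated posts."""
    return max(range(len(hours)), key=lambda h: hours[h])


date, hours = parse_hourly_stat(SAMPLE)
print(date, busiest_hour(hours))  # 2006-01-07 18
```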
About del.icio.us stats
These stats are not the official charts. They are based on incomplete information, so I cannot claim that any of the data is fully correct. These statistics are approximations of the real values.
In the monthly post statistics you can see the estimated errors of my statistics.
These error intervals basically say that the «real values» lie within the interval with a probability of 95%.
I assume that the number of posts is normally distributed. This assumption isn't true, but I don't know of any better distribution
(maybe the Poisson distribution would be a better choice?).
It would be normally distributed if the number of posts per hour were constant, which isn't true, as people tend to go to sleep at
night and the delicious users aren't distributed evenly around the world (see corresponding graph). I guess this wrong assumption leads to the relatively high error intervals.
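To compute the error interval, I compute the standard deviation of
the data of a day and then compute the confidence limit by lowerLimit = mean - 1.645*stdDeviation
(with the help of the Cumulative Distribution Function chart); the upper limit uses a plus sign accordingly.
Note that a factor of 1.645 cuts off 5% on each side of a normal distribution, so the two-sided interval covers about 90%; a factor of 1.96 would give a two-sided 95% interval.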
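As a rough illustration of the limit computation, here is a small Python sketch. It assumes the upper limit mirrors the lower one, which is consistent with the numbers in the daily export example. The function names are made up, and using the sample standard deviation (statistics.stdev) is also an assumption, since the text does not say which variant is used.

```python
import statistics

Z_FACTOR = 1.645  # factor quoted in the text


def limits_from_mean_std(mean, std_deviation, z=Z_FACTOR):
    """Lower and upper limit as mean -/+ z * standard deviation."""
    return mean - z * std_deviation, mean + z * std_deviation


def limits_from_samples(samples, z=Z_FACTOR):
    """Same limits, computed from the per-day estimates themselves."""
    mean = statistics.mean(samples)
    # Assumption: sample standard deviation; needs at least two samples.
    std_deviation = statistics.stdev(samples)
    return limits_from_mean_std(mean, std_deviation, z)


# Values from the daily export example: estimated_posts and std_deviation.
lower, upper = limits_from_mean_std(34454, 10034)
print(round(lower), round(upper))  # 17948 50960
```

The rounded results match tolerance_lower and tolerance_upper of the daily example.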
Keep in mind that the days are split according to the UTC+1 timezone (Central Europe). That means that when it is Saturday morning, 6 am, in Central Europe, it is midnight on the East Coast and 9 pm on Friday on the West Coast. So when the USA posts to delicious in the evening, it counts towards the next day in my statistics.
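To make the day split concrete, the following Python sketch maps a timestamp to the calendar day it would be counted under. It assumes a fixed UTC+1 offset and ignores daylight saving time. stats_day is a hypothetical helper, not the script behind these statistics.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+1 offset, without daylight saving time.
STATS_TZ = timezone(timedelta(hours=1))


def stats_day(timestamp):
    """Return the calendar date a timestamp is counted under (UTC+1 days)."""
    if timestamp.tzinfo is None:
        # Assumption: naive timestamps are treated as UTC.
        timestamp = timestamp.replace(tzinfo=timezone.utc)
    return timestamp.astimezone(STATS_TZ).date()


# A post made on Friday evening, 18:30, on the US East Coast (UTC-5).
east_coast = timezone(timedelta(hours=-5))
friday_evening = datetime(2006, 1, 6, 18, 30, tzinfo=east_coast)
print(stats_day(friday_evening))  # 2006-01-07, i.e. counted as Saturday
```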
How it's done
Every 10 minutes, I grab the rss-feed of the most recent delicious
posts. The rss-feed of recent bookmark posts holds 30 entries. I compute the time between
the first and the last entry by reading the dc:date field of each.
The estimated number of bookmarks of a given day is then computed by:
3600*24*100 / meanTimeFor100Bookmarks
where meanTimeFor100Bookmarks is in seconds, 3600 is the number of seconds per hour, and each day has 24 hours.
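The formula can be sketched in a few lines of Python. The function names and the dc:date timestamp format (YYYY-MM-DDTHH:MM:SSZ) are assumptions for illustration. How the time span of one 30-entry feed is turned into meanTimeFor100Bookmarks is not spelled out above, so the sketch leaves that step out.

```python
from datetime import datetime

SECONDS_PER_DAY = 3600 * 24

# Assumption: dc:date values look like 2005-08-01T12:00:00Z.
DC_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def feed_span_seconds(first_dc_date, last_dc_date):
    """Seconds between the first and the last entry of one feed snapshot."""
    first = datetime.strptime(first_dc_date, DC_DATE_FORMAT)
    last = datetime.strptime(last_dc_date, DC_DATE_FORMAT)
    return abs((last - first).total_seconds())


def estimate_posts_per_day(mean_time_for_100_bookmarks):
    """3600*24*100 / meanTimeFor100Bookmarks, with the time in seconds."""
    if mean_time_for_100_bookmarks <= 0:
        raise ValueError("mean time must be positive")
    return SECONDS_PER_DAY * 100 / mean_time_for_100_bookmarks


# Made-up input: if 100 bookmarks take 250 seconds on average,
# the estimate is 8,640,000 / 250 posts per day.
print(estimate_posts_per_day(250))  # 34560.0
```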
Historical facts about delicious
- Delicious has been active since February 2002 (according to Joshua's first posts)
- The mailing list started in December 2003. I suppose that is about the date delicious took off. (This is consistent with an interview he gave in January 2005)
- 18.5.2004, Joshua: "There's about 400k posts and 200k links."
Questions and answers
If you have further questions or suggestions, don't hesitate to post a comment on my corresponding blog article.