del.icio.us stats

Statistics on the number of posts per day. A post is a bookmark that a user adds to their collection.

Overall

overall number of posts per day

Monthly

Posts per day with their corresponding 95% confidence intervals.

Weekdays

weekday distribution of posts

Hours

Note that the times in the graph are UTC (GMT+0). (find out which timezone you're in)

hour distribution of posts

Statistics on the number of tags per post. The y-axis shows how many posts have a given number of tags.

Overall

tag distribution

Monthly

raw xml export

Do you want to draw your own graphs or combine the data with data from other services? Just grab the preprocessed xml data from either the daily posts or the hourly posts and you should have everything you need to produce decent graphs. Note that some days are missing because the script did not always run, the data was corrupt, or something else went wrong.

Export format explained

Sorry, this is not RDF, a microformat or anything like that, just a quick and dirty xml export. The export formats and urls may still change. If you have suggestions, please drop me an email.

Daily posts

Example:
          <stat date="2005-08-01" estimated_posts="34454" std_deviation="10034" 
                   tolerance_upper="50960" tolerance_lower="17948" recorded_posts="4290" 
                   tag_distribution="1681 939 649 382 234 173 94 54 21 17 8 5 5 0 2 4 10 1 1 8 0 0 0 0 0 1 0 0 0 0 1 0 (...)" />
        

Attributes:

  • date: the date in Y-m-d format
  • estimated_posts: estimated posts for the given day
  • std_deviation, tolerance_upper and tolerance_lower: as explained in the about section
  • recorded_posts: the number of items I actually grabbed from the rss feed (always smaller than estimated_posts)
  • tag_distribution: a list of numbers separated by spaces. The first number is the number of recorded posts that have 1 tag, the second the number of posts with 2 tags, etc. This list has 99 items.
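
If you want to read a daily record programmatically, the following is a minimal sketch in Python. It assumes the attribute names shown in the example above and that all numeric attributes are integers. The function names parse_daily_stat and mean_tags_per_post are made up for illustration, and the sample tag list is cut short on purpose.

```python
import xml.etree.ElementTree as ET

# Sample element based on the example above; the tag list is shortened
# here for illustration (the real list is longer).
SAMPLE = (
    '<stat date="2005-08-01" estimated_posts="34454" std_deviation="10034" '
    'tolerance_upper="50960" tolerance_lower="17948" recorded_posts="4290" '
    'tag_distribution="1681 939 649 382 234 173 94 54 21 17 8 5 5 0 2 4" />'
)

INT_FIELDS = ("estimated_posts", "std_deviation", "tolerance_upper",
              "tolerance_lower", "recorded_posts")


def parse_daily_stat(xml_text):
    """Turn one <stat> element into a dict with numeric fields."""
    elem = ET.fromstring(xml_text)
    stat = {"date": elem.get("date")}
    for name in INT_FIELDS:
        # Assumption: these attributes always hold plain integers.
        stat[name] = int(elem.get(name))
    counts = [int(n) for n in elem.get("tag_distribution").split()]
    # The first number belongs to posts with 1 tag, the second to 2 tags, ...
    stat["tag_distribution"] = {
        tags: count for tags, count in enumerate(counts, start=1)
    }
    return stat


def mean_tags_per_post(tag_distribution):
    """Average number of tags over the posts covered by the distribution."""
    total_posts = sum(tag_distribution.values())
    total_tags = sum(tags * count for tags, count in tag_distribution.items())
    return total_tags / total_posts if total_posts else 0.0


daily = parse_daily_stat(SAMPLE)
```

The tag distribution is stored as a dict keyed by the number of tags, so daily["tag_distribution"][2] is the number of recorded posts with two tags.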

Hourly posts

Example:
          <stat date="2006-01-07" time_distribution="2641 2403 2356 2231 2218 2116 1686 1717 1648 1689 1785 1688 1882 2673 2389 3298 3630 3535 3796 3034 3095 2816 3386 3055"/>
        

Attributes:

  • date: the date in Y-m-d format
  • time_distribution: estimated number of posts for the date, split up per hour. The first number is the number of estimated posts to del.icio.us from 00:00 to 00:59 of that day, followed by the number of estimated posts between 01:00 and 01:59, etc. (Note: the first number actually belongs to the day before, just in case you're doing graphs per day. Sorry for that.)
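
An hourly record can be read in much the same way. This is only a sketch, assuming the element looks like the example above with 24 space-separated integers. The names parse_hourly_stat and busiest_hour are hypothetical.

```python
import xml.etree.ElementTree as ET

# The hourly example from above.
SAMPLE = (
    '<stat date="2006-01-07" time_distribution="2641 2403 2356 2231 2218 '
    '2116 1686 1717 1648 1689 1785 1688 1882 2673 2389 3298 3630 3535 3796 '
    '3034 3095 2816 3386 3055"/>'
)


def parse_hourly_stat(xml_text):
    """Return (date, counts) where counts[h] is the estimate for hour h."""
    elem = ET.fromstring(xml_text)
    counts = [int(n) for n in elem.get("time_distribution").split()]
    if len(counts) != 24:
        raise ValueError("expected 24 hourly values, got %d" % len(counts))
    return elem.get("date"), counts


def busiest_hour(counts):
    """Index (0-23) of the hour with the highest estimated post count."""
    return max(range(len(counts)), key=lambda hour: counts[hour])


date, hourly = parse_hourly_stat(SAMPLE)
peak = busiest_hour(hourly)
```

For the sample record, peak points at the slot from 18:00 to 18:59, which holds the largest value (3796).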

About del.icio.us stats

These stats are not the official charts. They are based on incomplete information, so I cannot claim that any of the data is fully correct. These statistics are approximations of the real values.

The monthly post statistics show the estimated errors of my statistics. These error intervals basically say that the «real values» lie within the interval with a probability of 95%. I assume that the number of posts is normally distributed. This assumption isn't true, but I don't know of any better distribution (maybe the Poisson distribution would be a better choice?). It would be normally distributed if the number of posts per hour were constant, which it isn't: people tend to go to sleep at night, and the delicious users aren't distributed evenly around the world (see the corresponding graph). I guess this wrong assumption leads to the relatively large error intervals. To compute the error interval, I compute the standard deviation of the data of a day and then compute the confidence limits as lowerLimit = mean - 1.645*stdDeviation and, correspondingly, upperLimit = mean + 1.645*stdDeviation (with the help of the Cumulative Distribution Function chart).
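
The limit computation can be sketched in a few lines of Python. The function name confidence_limits is made up for illustration. The numbers are taken from the daily export example (estimated_posts="34454", std_deviation="10034"), and the result matches the tolerance_lower and tolerance_upper values given there. One thing to be aware of: 1.645 is the 95% quantile of the standard normal distribution, so under the normal assumption a symmetric interval of this width covers about 90%. A two-sided 95% interval would use roughly 1.96 instead.

```python
from statistics import NormalDist

# Factor used on this page for the limits.
Z = 1.645

# Where the factor comes from: the 95% point of the standard normal CDF.
z_from_cdf = NormalDist().inv_cdf(0.95)


def confidence_limits(mean, std_deviation, z=Z):
    """Return (lower, upper) as mean -/+ z * standard deviation."""
    margin = z * std_deviation
    return mean - margin, mean + margin


lower, upper = confidence_limits(34454, 10034)
print(round(lower), round(upper))  # 17948 50960
```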

Keep in mind that the days are split according to the UTC+1 timezone (Central Europe). That means that when it is 6 am on Saturday morning here, it is midnight on the East Coast and 9 pm on the West Coast. So when the USA posts to delicious in the evening, those posts count towards the next day in my statistics.
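
To see the effect of the day split, here is a small Python sketch. It uses fixed offsets (UTC+1, UTC-5 for the East Coast, UTC-8 for the West Coast) and ignores daylight saving time, which is a simplification. The helper stats_day is hypothetical and is not the script used for these stats.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, ignoring daylight saving time (an assumption).
CET = timezone(timedelta(hours=1), "UTC+1")
EAST_COAST = timezone(timedelta(hours=-5), "UTC-5")
WEST_COAST = timezone(timedelta(hours=-8), "UTC-8")


def stats_day(post_time):
    """Date a post counts towards when days are split in UTC+1."""
    return post_time.astimezone(CET).date()


# Saturday, 6 am in Central Europe ...
saturday_6am = datetime(2006, 1, 7, 6, 0, tzinfo=CET)
east = saturday_6am.astimezone(EAST_COAST)  # Saturday 00:00
west = saturday_6am.astimezone(WEST_COAST)  # Friday 21:00

# A Friday evening post from the West Coast lands on Saturday.
friday_evening_post = datetime(2006, 1, 6, 21, 0, tzinfo=WEST_COAST)
counted_on = stats_day(friday_evening_post)
```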

How it's done

Every 10 minutes, I grab the rss-feed of the most recent delicious posts. The rss-feed of recent bookmark posts holds 30 entries. I compute the time between the first and the last entry from the dc:date field.
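
Measuring the time span of a feed could look roughly like the Python sketch below. The sample feed is invented and much shorter than the real one. The sketch assumes dc:date values in the form 2006-01-07T12:00:15Z and takes the difference between the newest and the oldest timestamp, so the order of the entries does not matter. The function name feed_time_span_seconds is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Fully qualified name of the Dublin Core date element.
DC_DATE = "{http://purl.org/dc/elements/1.1/}date"

# Invented three-entry feed, for illustration only.
SAMPLE_FEED = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item><title>a</title><dc:date>2006-01-07T12:01:30Z</dc:date></item>
  <item><title>b</title><dc:date>2006-01-07T12:01:00Z</dc:date></item>
  <item><title>c</title><dc:date>2006-01-07T12:00:15Z</dc:date></item>
</rdf:RDF>
"""


def feed_time_span_seconds(feed_xml):
    """Seconds between the oldest and the newest dc:date in the feed."""
    root = ET.fromstring(feed_xml.strip())
    stamps = [
        # Assumption: timestamps are UTC in this exact format.
        datetime.strptime(elem.text.strip(), "%Y-%m-%dT%H:%M:%SZ")
        for elem in root.iter(DC_DATE)
    ]
    if len(stamps) < 2:
        raise ValueError("need at least two dated entries")
    return (max(stamps) - min(stamps)).total_seconds()


span = feed_time_span_seconds(SAMPLE_FEED)  # 75.0 for the sample feed
```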

The estimated number of bookmarks for a given day is then computed as 3600*24*100 / meanTimeFor100Bookmarks, where meanTimeFor100Bookmarks is in seconds, 3600 is the number of seconds per hour, and each day has 24 hours.
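
As a worked example, the formula translates directly into Python. The function name is made up and the 250 seconds are an invented input. How the measured feed spans are turned into the mean time for 100 bookmarks is not shown here, so the function simply takes that mean as its argument.

```python
SECONDS_PER_DAY = 3600 * 24


def estimated_posts_per_day(mean_time_for_100_bookmarks):
    """Apply 3600*24*100 / meanTimeFor100Bookmarks (time in seconds)."""
    if mean_time_for_100_bookmarks <= 0:
        raise ValueError("mean time must be positive")
    return SECONDS_PER_DAY * 100 / mean_time_for_100_bookmarks


# If 100 bookmarks take 250 seconds on average:
estimate = estimated_posts_per_day(250)  # 34560.0
```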

Historical facts about delicious

Questions and answers

If you have further questions or suggestions, don't hesitate to post a comment on my corresponding blog article.