Google Chrome extension for TFL data

18/05/2010

I have been coding in Java for about a week now (when I’m not coding in Python of course) and I fancied a break from it. I’m not entirely sure why, but I decided to take a look at coding Chrome extensions. These are in fact very, very easy to make. The tube status thing below took me just over an hour to get how I wanted. If you’re running Chrome you can get the installer here. I think I’ll do some more work on it to add customization, since that was my initial motivation for doing this. The first one I installed from the current extension gallery just showed the little panel of statuses from the site, and I wanted to pick a few lines and tweak the output.

In the end it was easier to just code my own. It makes use of the excellent REST API provided by Ben Dodson. Creating such an extension is trivial. You can make your life a bit easier if you include jQuery from the Google AJAX APIs. Then it’s really just a collection of a few HTML files (in this case just the one, popup.html). All my HTML does is make an AJAX request for the JSON data from Ben’s API. When that comes back it simply renders a few divs with the colors set and, based on the status, a Unicode entity to add some fun to it.
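Roughly, the popup logic looks like the sketch below. It is written in Python rather than the JavaScript the extension actually uses, and the JSON shape (`lines`, `name`, `status`), the symbols and the colors are all made up for illustration; Ben’s API may well name things differently.

```python
import html
import json

# Hypothetical status-to-symbol mapping; the extension's real choices are not shown here.
STATUS_SYMBOLS = {"Good Service": "\u263a"}  # smiling face
DEFAULT_SYMBOL = "\u2639"                    # frowning face for anything else

# Placeholder colors keyed by line name, not official values.
LINE_COLORS = {"Central": "#cc0000", "Victoria": "#0099cc"}
DEFAULT_COLOR = "#666666"


def render_lines(payload, wanted):
    """Return one <div> per wanted line, colored and prefixed with a status symbol."""
    lines = json.loads(payload)["lines"]  # assumed response shape
    rows = []
    for line in lines:
        name, status = line["name"], line["status"]
        if name not in wanted:
            continue  # only show the lines the user picked
        symbol = STATUS_SYMBOLS.get(status, DEFAULT_SYMBOL)
        color = LINE_COLORS.get(name, DEFAULT_COLOR)
        rows.append('<div style="background:%s">%s %s: %s</div>'
                    % (color, symbol, html.escape(name), html.escape(status)))
    return "\n".join(rows)


# Made-up sample response standing in for the real API call.
SAMPLE = json.dumps({"lines": [
    {"name": "Central", "status": "Good Service"},
    {"name": "Northern", "status": "Minor Delays"},
    {"name": "Victoria", "status": "Severe Delays"},
]})

print(render_lines(SAMPLE, {"Central", "Victoria"}))
```

Picking lines by name is the customization I was after; the gallery extension had no equivalent of the `wanted` filter.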

Making extensions for Chrome really is a breeze and the documentation is plentiful, so hats off to Google yet again for making me happy to be an open source, Linux-loving developer. The extension is now in the Chrome gallery and available here.


Google’s results plotted for repeated character strings

14/02/2010

Don’t ask why, but out of interest I googled for the string “AAAAAAAA” earlier and, after looking at the millions of pages that came back and thinking “wtf”, I searched again with a much longer string. I was expecting the result count to just keep going down, but at around 20 characters there was a significant jump in returned results. I scratched my beard and proclaimed this interesting (as you can probably tell, I have no distractions on Valentine’s Day). To skip over further bullshit, this is the graph of 3,328 searches: the number of results for every character (A-Z) repeated one to 128 times. Some of the peaks are interesting.

Why 128 and not something higher? Google won’t let you, at least via the query string. The raw data for this graph was generated by a simple Python script. If you aren’t coding in Python already, please do look into it, it’s jolly nice.

#!/usr/bin/python
# Python 2 script; BeautifulSoup here is the BeautifulSoup 3 module.

from urllib import FancyURLopener
from BeautifulSoup import BeautifulSoup
import csv
import string

BASE = "http://www.google.com/search?q="

# Send a browser-like User-Agent instead of urllib's default.
class MozOpener(FancyURLopener):
    version = ('Mozilla/5.0 (Windows; U; Windows NT 5.1;'
               ' it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11')
ff = MozOpener()

out = csv.writer(open('out.csv', 'wb'))  # the csv module wants binary mode on Python 2

# Header row: the repeat count, then one column per letter.
headers = ['count'] + list(string.ascii_uppercase)
out.writerow(headers)

for x in range(1, 129):  # lengths 1 to 128; 128 is the max length of chars allowed
    results = [x]
    for l in string.ascii_uppercase:
        qry = l * x
        raw_data = ff.open(BASE + qry).read()
        soup = BeautifulSoup(raw_data)
        # The total result count is the third <b> inside #resultStats.
        result = soup.find(id='resultStats').findAll('b')[2]
        result = int(result.contents[0].replace(',', ''))
        results.append(result)
        print [result, qry]
    out.writerow(results)
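
The 3,328 figure is just 26 letters times 128 lengths. As a quick sanity check, here is a minimal Python 3 sketch that only builds the query URLs and counts them, without sending anything to Google. `build_queries` is a name I have made up for illustration; the base URL is the one from the script.

```python
import string
from urllib.parse import quote_plus

BASE = "http://www.google.com/search?q="  # same base URL as the script
MAX_LEN = 128                             # longest query string Google accepted


def build_queries(max_len=MAX_LEN):
    """Yield (letter, length, url) for every letter A-Z repeated 1..max_len times."""
    for length in range(1, max_len + 1):
        for letter in string.ascii_uppercase:
            yield letter, length, BASE + quote_plus(letter * length)


queries = list(build_queries())
print(len(queries))  # 26 letters * 128 lengths = 3328
```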

When that had finished I simply loaded it into OpenOffice Calc (3.2 is out by the way) and plotted it with the result count on a logarithmic scale, so the huge counts for the single-character searches don’t just skew the thing into one boring L shape. The first thing I noticed was a very visible spike at length 100.
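
If you would rather do the log transform outside a spreadsheet, something like the following Python 3 sketch would work. It assumes the CSV layout the script writes (a `count` column followed by one column per letter); the function name and the sample numbers are invented purely to show the effect.

```python
import csv
import io
import math


def log_rows(csv_text):
    """Turn each result count into log10, keeping the length column as-is."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = []
    for row in reader:
        length = int(row[0])
        # max(..., 1) guards against a zero count, for which log10 is undefined
        logs = [math.log10(max(int(v), 1)) for v in row[1:]]
        rows.append((length, logs))
    return header, rows


# Invented sample in the same layout as out.csv, two letters only.
SAMPLE = "count,A,B\n1,10000000000,1000000\n2,1000,10\n"

header, rows = log_rows(SAMPLE)
for length, logs in rows:
    print(length, [round(v, 2) for v in logs])
```

On a linear axis the first row would flatten everything after it; after the transform the values sit between 1 and 10 and the later bumps become visible.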

This isn’t so hard to imagine: 100 is a “nice” number, and it’s easy to picture someone using 100 of a character as a test input or just a “long” string. Every character string of length 100 exhibits this spike. The much bigger blue curve is for the letter X. This is used by children and adults alike to mark kisses, and everyone knows the more X’s, the more someone loves you. If you look at many of the results for the 100 or 101 character X searches, it seems people are using it in this context. Could it be that the much bigger spike for the 101 character X string is simply because it’s 100 + 1 kisses? Towards the start of the graph there are a number of interesting spikes; I’ve marked some of them along with the length.

Some of these spikes are easy to explain. The biggest number of results returned for a single repeated character phrase is a by-product of DNS: yep, it’s “WWW”. It returns slightly more results than the simple “A”, with 17,090,000,000 pages versus 12,260,000,000. Another easy one is 6 F’s, the hex code for white. I am totally stumped by F’s later behaviour though: there is a spike at F-31 and F-33 but not F-32. There is a big jump for X-12 and X-34. International mobile numbers have 12 digits, as do UPC codes, but I feel like I’m clutching at straws :) trying to explain that. Down the other end of the graph, between 115 and 128, the characters P, H, A, M and O all have significant spikes for specific counts. For M and O, when you browse the first few pages of Google’s results, many of the pages are using them as part of exaggerated speech. It’s almost like the collective consciousness of the world has decided that 120 characters is just right to describe a particularly tasty dinner.
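
If you’d rather not eyeball the chart, one rough way to pick out spikes is to flag any length whose count beats both of its neighbours by some factor. This is only a Python 3 sketch of that idea, not how I marked the graph; the function name, the factor of 2 and the sample counts are all assumptions for illustration.

```python
def find_spikes(counts, factor=2.0):
    """Return 1-based lengths whose count is at least `factor` times both neighbours."""
    spikes = []
    # The first and last entries have only one neighbour, so they are skipped.
    for i in range(1, len(counts) - 1):
        left, here, right = counts[i - 1], counts[i], counts[i + 1]
        if here >= factor * left and here >= factor * right:
            spikes.append(i + 1)  # index 0 holds the count for length 1
    return spikes


# Invented counts for lengths 1..7 with bumps at 4 and 6 but not 5,
# the same kind of pattern as F-31 and F-33 without F-32.
sample_counts = [1000, 400, 300, 900, 280, 950, 260]
print(find_spikes(sample_counts))  # [4, 6]
```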

A spreadsheet with the data I gathered is available via Google Docs if you’d like to investigate yourself. I should note that the Python script above gets its data from .com; if you use the site itself to try some searches it will more than likely switch to your local domain. Depending on your cookies, Google may also perform extra filtering (e.g. safe search), so your numbers may not be exactly the same as mine. I’d be interested in any theories about some of the more prominent spikes. The OpenOffice spreadsheet with the charts done is also available for download here.

If you are going to play around, save your fingers and sanity and use Python to create your test strings: just drop into a Python shell and use "X"*120 etc., or run perl -e 'print "X"x100' from the command line.
