Follow-up 3rd party footprint

The following post outlines the links, tools and articles I mentioned in my 3rd party footprint talk

Main Slides

Slides (Webdirections, Melbourne) and Slides (Velocity NYC 2014) and Web Rebels in Oslo

Shared Links and Articles

Tools and Tricks

WebPagetest Results

Follow-up on my talk “Embracing Performance in Today’s Multi-Platform Macrocosm”

Hello everyone, this is a follow-up blog post on my talk presented at BDConf in San Diego and WebExpo in Prague.

If you landed here because you’ve typed in the URL after attending my talk, great! Thanks for making it all the way here. I hope you enjoyed my talk.

If you landed here via Google, Twitter or any other sites, I welcome you too, of course! You might want to first have a look at my slides (see link below) before clicking on any of the other links below.

Either way, feel free to leave a comment or contact me via twitter with my handle @bbinto.



Slides available on SlideShare

Links and articles, recommended content

Maven Tools

Continuos Integration Tools (<3)

General Links

Image Credits

The Power of a Private HTTP Archive Instance: Finding a Representative Performance Baseline

(Note: cross-posted at programming.oreilly.com)

Be honest, have you ever wanted to play Steve Souders for a day and pull some revealing stats or trends about some web sites of your choice? Or maybe dig around the HTTP archive? You can do that and more by setting up your own HTTP Archive.

httparchive.org is a fantastic tool to track, monitor, and review how the web is built. You can dig into trends around page size, page load time, content delivery network (CDN) usage, distribution of different mimetypes, and many other stats. With the integration of WebPagetest, it’s a great tool for synthetic testing as well.

You can download an HTTP Archive MySQL dump (warning: it’s quite large) and the source code from the download page and dissect a snapshot of the data yourself.  Once you’ve set up the database, you can easily query anything you want.


You need MySQL, PHP, and your own webserver running. As I mentioned above, HTTP Archive relies on WebPagetest—if you choose to run your own private instance of WebPagetest, you won’t have to request an API key. I decided to ask Patrick Meenan for an API key with limited query access. That was sufficient for me at the time. If I ever wanted to use more than 200 page loads per day, I would probably want to set up a private instance of WebPagetest.

To find more details on how to set up an HTTP Archive instance yourself and any further advice, please check out my blog post.


Going back to the scenario I described above: the real motivation is that often you don’t want to throw your website(s) in a pile of other websites (e.g. not related to your business) to compare or define trends. Our digital property at the Canadian Broadcasting Corporation’s (CBC) spans over dozens of URLs that have different purposes and audiences. For example, CBC Radio covers most of the Canadian radio landscape, CBC News offers the latest breaking news, CBC Hockey Night in Canada offers great insights on anything related to hockey, and CBC Video is the home for any video available on CBC. It’s valuable for us to not only compare cbc.ca to the top 100K Alexa sites but also to verify stats and data against our own pool of web sites.

In this case, we want to use a set of predefined URLs that we can collect HTTP Archive stats for. Hence a private instance can come in handy—we can run tests every day, or every week, or just every month to gather information about the performance of the sites we’ve selected. From there, it’s easy to not only compare trends from httparchive.org to our own instance as a performance baseline, but also have a great amount of data in our local database to run queries against and to do proper performance monitoring and investigation.

Visualizing Data

The beautiful thing about having your own instance is that you can be your own master of data visualization: you can now create more charts in addition to the ones that came out of the box with the default HTTP Archive setup. And if you don’t like Google chart tools, you may even want to check out D3.js or Highcharts instead.

The image below shows all mime types used by CBC web properties that are captured in our HTTP archive database, using D3.js bubble charts for visualization.

Mime types distribution for CBC web properties using D3.js bubble visualization. The data were taken from the requests table of our private HTTP Archive database.

Mime types distribution for CBC web properties using D3.js bubble visualization. The data were taken from the requests table of our private HTTP Archive database.

Querying the Database

Sometimes, you want to get some questions answered without creating a chart. That’s when you can query the MySQL tables directly. Let’s run a simple query on the requests table.

For example, some of the CBC sites use YUI, some use jQuery—but we would really like to avoid having pages serve both. A simple sample query like the one below could help identify those sites:

SELECT req_referer
FROM requests
WHERE url LIKE "%/jquery_.js%" OR url LIKE "%/i/l/yui/%"
GROUP BY req_referer

And More …

We will share more of the queries and insights we’ve gathered from our HTTP Archive instance that helped us identify bottlenecks. In addition, we will also discuss how this setup came in very handy to discover problems with some unnecessary page weight that we thought we didn’t have.

Join our talk at the Velocity conference in Santa Clara in June titled “The Canadian Public Broadcaster on A Diet: Slimming Down for A Whole Nation.” The talk will not only cover the private HTTP Archive instance but furthermore cover many other aspects of how to focus on (mobile) web fitness and how to “slim down.”

Related Posts to our Talk