The Power of a Private HTTP Archive Instance: Finding a Representative Performance Baseline

(Note: cross-posted at programming.oreilly.com)

Be honest, have you ever wanted to play Steve Souders for a day and pull some revealing stats or trends about some web sites of your choice? Or maybe dig around the HTTP Archive? You can do that and more by setting up your own HTTP Archive.

httparchive.org is a fantastic tool to track, monitor, and review how the web is built. You can dig into trends around page size, page load time, content delivery network (CDN) usage, distribution of different mimetypes, and many other stats. With the integration of WebPagetest, it’s a great tool for synthetic testing as well.

You can download an HTTP Archive MySQL dump (warning: it’s quite large) and the source code from the download page and dissect a snapshot of the data yourself.  Once you’ve set up the database, you can easily query anything you want.

Setup

You need MySQL, PHP, and your own webserver running. As I mentioned above, HTTP Archive relies on WebPagetest—if you choose to run your own private instance of WebPagetest, you won’t have to request an API key. I decided to ask Patrick Meenan for an API key with limited query access. That was sufficient for me at the time. If I ever wanted to use more than 200 page loads per day, I would probably want to set up a private instance of WebPagetest.

To find more details on how to set up an HTTP Archive instance yourself and any further advice, please check out my blog post.

Benefits

Going back to the scenario I described above: the real motivation is that often you don’t want to throw your website(s) in a pile of other websites (e.g. not related to your business) to compare or define trends. Our digital property at the Canadian Broadcasting Corporation’s (CBC) spans over dozens of URLs that have different purposes and audiences. For example, CBC Radio covers most of the Canadian radio landscape, CBC News offers the latest breaking news, CBC Hockey Night in Canada offers great insights on anything related to hockey, and CBC Video is the home for any video available on CBC. It’s valuable for us to not only compare cbc.ca to the top 100K Alexa sites but also to verify stats and data against our own pool of web sites.

In this case, we want to use a set of predefined URLs that we can collect HTTP Archive stats for. Hence a private instance can come in handy—we can run tests every day, or every week, or just every month to gather information about the performance of the sites we’ve selected. From there, it’s easy to not only compare trends from httparchive.org to our own instance as a performance baseline, but also have a great amount of data in our local database to run queries against and to do proper performance monitoring and investigation.

Visualizing Data

The beautiful thing about having your own instance is that you can be your own master of data visualization: you can now create more charts in addition to the ones that came out of the box with the default HTTP Archive setup. And if you don’t like Google chart tools, you may even want to check out D3.js or Highcharts instead.

The image below shows all mime types used by CBC web properties that are captured in our HTTP archive database, using D3.js bubble charts for visualization.

Mime types distribution for CBC web properties using D3.js bubble visualization. The data were taken from the requests table of our private HTTP Archive database.


Querying the Database

Sometimes, you want to get some questions answered without creating a chart. That’s when you can query the MySQL tables directly. Let’s run a simple query on the requests table.

For example, some of the CBC sites use YUI, some use jQuery—but we would really like to avoid having pages serve both. A simple sample query like the one below could help identify those sites:

SELECT req_referer
FROM requests
WHERE url LIKE "%/jquery_.js%" OR url LIKE "%/i/l/yui/%"
GROUP BY req_referer
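The query above lists pages that load either library. To narrow it down to pages that load both, one option is to group by referer and check for both patterns. Here is a minimal sketch using Python's sqlite3 on an in-memory table. The simplified table layout, the sample URLs and the LIKE patterns are assumptions for illustration, not the real HTTP Archive schema or CBC data.

```python
import sqlite3


def pages_serving_both(conn):
    """Return referers whose requests include both a jQuery and a YUI URL."""
    sql = """
        SELECT req_referer
        FROM requests
        GROUP BY req_referer
        HAVING SUM(url LIKE '%/jquery%.js%') > 0
           AND SUM(url LIKE '%/yui/%') > 0
        ORDER BY req_referer
    """
    return [row[0] for row in conn.execute(sql)]


# Hypothetical sample data: only the /news page loads both libraries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (url TEXT, req_referer TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [
        ("http://example.com/js/jquery.min.js", "http://example.com/news"),
        ("http://example.com/i/l/yui/yui-min.js", "http://example.com/news"),
        ("http://example.com/js/jquery.min.js", "http://example.com/radio"),
        ("http://example.com/i/l/yui/yui-min.js", "http://example.com/video"),
    ],
)

both = pages_serving_both(conn)
print(both)
```

The same GROUP BY / HAVING idea should carry over to MySQL, though the patterns would need to match how your own sites name those files.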

And More …

We will share more of the queries and insights we’ve gathered from our HTTP Archive instance that helped us identify bottlenecks. In addition, we will also discuss how this setup came in very handy to discover problems with some unnecessary page weight that we thought we didn’t have.

Join our talk at the Velocity conference in Santa Clara in June titled “The Canadian Public Broadcaster on A Diet: Slimming Down for A Whole Nation.” The talk will not only cover the private HTTP Archive instance but furthermore cover many other aspects of how to focus on (mobile) web fitness and how to “slim down.”

Related Posts to our Talk

Warming up for Velocity 2013 in Santa Clara

I have attended several conferences in the last few years. The first one that really changed my “developer life” was the Velocity 2011 conference in Santa Clara. I have always been interested in optimizing and being diligent about the web. However, what I learned during those three days in Santa Clara has influenced my everyday life and the way I see performance.

I truly admire each and every speaker and attendee at the conference because they all share the same passion: Optimizing the web and making performance count. I am honoured to announce that it is my turn this year to give back to that same community of people and share what I have learned over the last few years and what I have been applying at Canada’s public broadcaster, the CBC.

Our talk “The Canadian Public Broadcaster On A Diet: Slimming Down For A Whole Nation” will focus on (mobile) web fitness and how to “slim down”. Tips and tricks will be shared about how to stay in shape when developing (mobile) sites for millions of people.

My talented co-worker Blake and I will be talking about how we apply performance optimization at the CBC, one of Canada’s largest web properties with over 5 million pages. As a publicly funded organization, all Canadian eyes are on us making sure we stay on budget and deliver quality and optimized content to users.

While Blake will be talking more about the backend, server and CDN aspects of performance optimization and tips, I will be sharing information about how we optimize and tweak performance from a frontend development and automated deployment perspective, basically – how to get and stay in shape.

Don’t worry; this definitely will not be your typical boring and horrifying boot-camp experience! Our talk will utilize fun and catchy analogies to explain the weight and performance of pages. I will be your honest CBC “fitness trainer”, telling the audience about the page weight of our sites on multiple platforms, how we measure performance and set budgets. However, putting our content on a scale will tell the truth: a content breakdown of our pages will help the audience understand how content is structured and where we can “slim down”, but also where a fitness routine cannot help.

Keep us company while we share some insights about setting up our own HTTP Archive instance as a tool – or how I would describe it: the BMI of web sites – to compare our own weight to the public HTTP Archive instance. We will share some queries from our HTTP Archive database to help identify bottlenecks, and we will tell you about how we discovered problems with some unnecessary weight that we thought we didn’t have.

Additionally, sweet and dangerous temptations will be placed in front of your eyes, the kinds that we all have to deal with when creating high traffic sites, including 3rd party scripts that could significantly harm the performance of our sites when not properly implemented. We will compare client-side and server-side 3rd party implementations. We will also reveal the amount of improvement we saw in load time once we turned off all ads on our mobile touch site for a weekend.

During our talk, you will also hear about our fitness stack regarding how we monitor our fitness level, and why it is so important to stay on a strict exercise schedule and avoid gaining too much unwanted weight, which can happen without even knowing it. If you want to exercise and stay in shape, there are tons of great tools out there to help you achieve that. We will cover how we organize and optimize our sites, our releases and deployment and how easily you can include tools in your deployment process to automate performance optimization.

If you want to know how we use RUM in combination with synthetic testing, and what our RUM numbers reveal, then you shouldn’t miss out on our talk.

Lastly, we will explain the challenges we face as the national news broadcaster in a world of ever-changing news, where a breaking news story could drive our traffic through the roof at any moment, and how we need to respond to that.

Come join our talk and if you like, wear your favorite running shoes because you never know, you might want to start exercising right after.

We look forward to meeting you all!

More details on our scheduled talk and its location: http://velocityconf.com/velocity2013/public/schedule/detail/27973

Performance check: CBC’s logo as pure CSS, Data URI and simple PNG on the scale

There is always room for improvement. Period.

Think about the 100 meter men’s sprint. I am amazed that human beings still manage to become faster and improve their performance.

I’m not Usain Bolt – I can’t run 100 meters in 9.58 seconds but I might be able to run (mobile) websites under 10 seconds.

Today, I want to focus on a technique I first heard about at the Velocity Conference in 2011 in Santa Clara and how to compare it with other ways to serve images in HTML pages.

A data URI embeds data (e.g. an image) directly in a document as text, typically base64-encoded. It can be used to improve performance.

Data URI as “Performance Enhancer”

Instead of requesting, for example, a PNG image, you could encode it as base64 and serve it inline with your HTML code. That way, you save one HTTP request – right there – 200ms saved. Instead of putting it inline, you could also put the encoded image in an external stylesheet.
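To make the encoding step concrete, here is a minimal Python sketch that turns raw bytes into a data URI string. The helper name `to_data_uri` is made up for this example, and the input is just the 8-byte PNG file signature rather than a real logo.

```python
import base64


def to_data_uri(data, mime_type="image/png"):
    """Base64-encode raw bytes and wrap them in a data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The 8-byte PNG signature stands in for a full image file here.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(png_signature)
print(uri)
```

The resulting string can go into an img src attribute or a CSS url(...) value in place of the image path.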

Watch out for caching limitations though. Data URIs can’t be cached on their own because they don’t have a standalone cache policy; they are part of the file that includes them. So they can only piggyback on other cacheable assets, e.g. CSS or HTML files.

As Nicholas explains, if you put data URI images inline with HTML, they can only be cached according to the cache-control header of the HTML. If the HTML is not cached, the entire HTML markup, including the inline data URI, will be re-downloaded every single time. If the image is big, that approach can slow down and weigh down your page. Hence, one option is to put it in stylesheets, because they can be cached aggressively while the HTML page can have no cache or a very limited cache.

Limitations and Browser Support

Think about the browser audience of the site you want to leverage data URIs for. If you target modern browsers, including new mobile devices (because that’s where you really want to focus on performance the most), you might be able to ignore the following limitations and accept the slightly restricted list (thanks to Fire) of supported browsers.

  • Firefox 2+
  • Opera 7.2+ – data URIs must not be longer than 4100 characters
  • Chrome (all versions)
  • Safari (all versions)
  • Internet Explorer 8+ (data URIs must be smaller than 32KB)
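If you generate data URIs as part of a build, a quick length check against the limits listed above can flag problem cases early. This is only a sketch: the function name is made up, and reading "smaller than 32KB" as "at most 32 * 1024 - 1 characters" is my assumption.

```python
# Maximum data URI length (in characters) per browser, taken from the list above.
# "Smaller than 32KB" is interpreted as at most 32 * 1024 - 1 characters (assumption).
MAX_LENGTH = {
    "Internet Explorer 8+": 32 * 1024 - 1,
    "Opera 7.2+": 4100,
}


def browsers_over_limit(data_uri):
    """Return the browsers from MAX_LENGTH whose limit this data URI exceeds."""
    return sorted(name for name, limit in MAX_LENGTH.items() if len(data_uri) > limit)


# Dummy payloads of different lengths, not real images.
small = "data:image/png;base64," + "A" * 1000
medium = "data:image/png;base64," + "A" * 5000
large = "data:image/png;base64," + "A" * 40000

for label, candidate in (("small", small), ("medium", medium), ("large", large)):
    print(label, browsers_over_limit(candidate))
```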

Motivation for Comparison

I’ve been reading a lot about web performance techniques and for some reason the data URI practice stuck with me. I started off by creating the CBC gem (logo) in CSS to verify whether CSS performs better than serving images. While I was playing around with that, I thought, why not add another dimension to the test and check the performance of the CBC logo as data URI? Voilà, I had my basic scenario for the test:

Check the performance of the CBC logo as

  1. An image in pure css
  2. A plain PNG image as background image
  3. A data URI (in CSS and inline with HTML)

Setting up the Test

The purpose of the test was to figure out what kind of presentation for the CBC gem would be the fastest and slimmest.

Prerequisites and Conditions

  • All HTML and CSS files were minified and use the same markup (except for the logo in pure CSS, which needed a few more div classes to render the circles)
  • Each HTML version was tested with an empty cache
  • Performance tests were run with WebPagetest (10 runs on a simulated 3G connection) to find the median.

1. Logo in pure CSS (30px x 30px)

[Screenshot: logo in pure CSS, 30px x 30px]
Description: Thankfully, the CBC gem consists of circles, 1/2 circles and 1/4 circles, and those shapes can easily be created with CSS. I used this page to help me get started. Instead of setting up a fixed size and color, I decided to use SASS to be more flexible with my settings for the logo. The scss file lets me define the color and size of the gem.
Note: The pure CSS logo has a few issues with some of the 1/4 circles, probably due to some math formulas I didn’t get right in the SASS. I believe this can be ignored for the test, but it means this version cannot be used as the official CBC gem.

2. Plain PNG Image (30px x 30px)

[Screenshot: logo as plain PNG image, 30px x 30px]
Description: Simple PNG file included in the CSS as a background image. CSS included in main HTML.

3. Data URI in CSS (30px x 30px )

[Screenshot: logo as data URI in CSS, 30px x 30px]
Description: I used Nicholas’ tool to create my CSS files including the data URI. However, there are many tools to help you create your own data URI encoded files.

You can see from the browser screenshots above that all logos look pretty much the same to the user.

Test Results

[Screenshot: test results for the 30px x 30px versions]

The results show that the median load times for the logo as pure CSS and as data URI are almost the same, whereas the logo as a background image in CSS took the longest.

I looked at the base64 string and wondered how big it would be if I had used a bigger image. So I googled and found the following: “It’s not worth to use data URIs for large images. A large image has a very long data URI string (base64 encoded string) which can increase the size of CSS file.” (source). I decided to test it myself. So, my next question was: “How would the test above turn out if I used a bigger CBC gem logo?” I picked a width and height of 300px. While I was preparing the 300px x 300px pages, I also decided to create another version of the data URI, not as part of the CSS but inline within the HTML.
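How long the string gets is easy to estimate: padded base64 turns every 3 bytes of input into 4 characters of output, so the encoded image is roughly a third bigger than the file. A small Python sketch checks that formula against the standard library. The byte sizes are hypothetical, not the actual sizes of the CBC gem files.

```python
import base64
import math


def base64_length(n_bytes):
    """Length of the padded base64 encoding of n_bytes bytes of input."""
    return 4 * math.ceil(n_bytes / 3)


# Hypothetical image sizes in bytes; zero-filled buffers stand in for image data.
for size in (500, 5000, 50000):
    actual = len(base64.b64encode(bytes(size)))
    print(size, base64_length(size), actual)
```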

1. Logo in pure CSS (300px x 300px )

There was not much of a difference in terms of markup and setup for the pure CSS and PNG in CSS versions. I updated the SASS in cbcgem.scss to accommodate a logo of 300px x 300px instead of 30px x 30px. The file size didn’t change much because it is all based on math calculations.

2. Plain PNG Image (300px x 300px )

Instead of loading gem-30.png, I created a version gem-300.png and updated the CSS.

3a. Data URI in CSS (300px x 300px)

[Screenshot: data URI in CSS, 300px x 300px]
I noticed that the size of the data URI encoding, as expected, increased dramatically from the 30px x 30px encoded image to the 300px x 300px image (almost 10 times, see the full view of the screenshot on the left).

3b. Data URI inline within HTML (300px x 300px)

[Screenshot: data URI inline within HTML, 300px x 300px]
Instead of pasting the long base64 string into the CSS, I added it as an img src to the HTML page (see the full view of the screenshot on the left).

I used WebPagetest again to run 10 tests to find the Median.

[Screenshot: test results for the 300px x 300px versions]
The links to the WebPagetest results can be found at the bottom of this post.

Observations & Take-Aways

  • Creating simple shapes in CSS (via SASS) is highly scalable because it doesn’t influence the size of the CSS file significantly. The size of the CSS file won’t change much if I choose to produce a 300px x 300px logo or a 10px x 10px logo. For all tests performed this solution seems to be the most efficient and fastest one.  
  • I didn’t find it to be true that if the encoded image is bigger than 1-2kB, it isn’t worth using a data URI to improve performance. Looking at the last test round (300px x 300px), the results show that the page with the encoded image is still faster than the page with the 300px x 300px PNG image.
  • It is interesting to note that the inline data URI version is faster than the data URI CSS version (and almost as fast as the pure CSS version).  Having to serve 2 HTTP requests with a total size of 4kB, the median load time was faster than the one serving the data URI via CSS.

Further Readings and References

WebPagetest Results

CBC Gem 30px x 30px

CBC Gem 300px x 300px

Simulating Frontend SPOF – The day a tiny 3rd party script almost slowed down the entire Internet

How realistic is it really that a script that you didn’t even write could dramatically slow down your site and other major sites as well? Keep reading: scripts can slow down sites, and it hurts to watch!

I watched the Fluent talk by Steve Souders from 2012 about High Performance Snippets (a must-watch for all SPOF fans) and got inspired to test how an “innocent” 3rd party script (btw. I call them 3rd party monsters), not loaded properly, could result in a single point of failure (SPOF) and make a site very slooooooow to load.

Developers are always proud and optimistic about their code. When it comes to including 3rd party scripts, basically code they don’t usually touch, they assume that 3rd party providers like Google never go down, that a non-responsive ad server like DoubleClick won’t hurt, or that Twitter won’t have server failures. 3rd party script developers try their best to make it easy and painless for us to include their high performing scripts in our sites. That’s a fact. True, but also not true. They can only do so much. If you don’t properly include the script on your end and their service goes down, their high performing code won’t be able to help you at all. The rule of thumb is to include those scripts asynchronously. That way you make sure that your content won’t be blocked from rendering in case the 3rd party service is down.

However, scripts that use document.write can’t be loaded asynchronously (unfortunately). Read more about this in the great Krux post and some of Steve Souders’ posts.

It’s kind of like the elephant in the room to me: you pray, e.g., that Twitter doesn’t go down, while you are too afraid to test it out, or over-confident that it won’t break your page, or you simply don’t know how you would test this scenario in the first place. Am I right? Well, what if you could run a quick test on a web site and pretend all of its 3rd party scripts and providers were down? Let’s play the “3rd party scripts game”: would your web site still render? How confident are you?

Simulating SPOF – Slow down your own site until it really hurts

Are you ready for this? First, edit your hosts file to point to a blackhole IP address for simulation (I used the blackhole IP address Steve shared in his talk on slide 9).

sudo vi /etc/hosts

While setting up my test, I don’t want to play the really bad gal (yet) and assume all 3rd party providers were down. I’d like to start with the simplest but yet most used and harmful domain ads.doubleclick.net. A lot of web sites include ads and use DoubleClick.

So let’s use this domain for our blackhole test. By all means, you can add more 3rd party scripts to your hosts file.

# add this line to your hosts file
72.66.115.13 ads.doubleclick.net
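If you want to blackhole more than one provider, the extra lines can be generated instead of typed. A small Python sketch follows. Only ads.doubleclick.net comes from this post; the other two domains are illustrative picks, and the helper name is made up.

```python
BLACKHOLE_IP = "72.66.115.13"  # blackhole address used in this post


def hosts_entries(domains, ip=BLACKHOLE_IP):
    """Return one hosts-file line per domain, all pointing at the blackhole IP."""
    return "\n".join(f"{ip} {domain}" for domain in domains)


# Example domains: only the first one is from the post, the others are assumptions.
third_parties = ["ads.doubleclick.net", "platform.twitter.com", "connect.facebook.net"]
print(hosts_entries(third_parties))
```

The printed lines can be pasted into the hosts file and removed again once the test is done.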

Once you’ve updated your hosts file, remember to flush your DNS cache afterwards.

dscacheutil -flushcache

Now, open your browser (with cache disabled so your browser is not using any DoubleClick scripts from the cache). Type in your site’s URL and be prepared for the worst. How long will it take for the website to load?

That’s a very easy (scary) and quick way of evaluating what is on your critical rendering path and obviously (now) what should not be on it anymore!

I ran this test on our site and let me tell you, it hurt. Period. It took almost 1.5 min for cbc.ca to display useful content with DoubleClick faked as down. The browser finally gave up.

[Screenshot: Aborting]

I wasn’t ready to stop the game. I wondered if it’s just our site that doesn’t properly handle the outage of one single domain such as ads.doubleclick.net. So I continued and tried the following random websites and measured the time it took to see useful content on those.

URL | Time passed to see useful content
www.people.com | ~4.5 mins
www.bbc.com | ~2.5 mins
www.amazon.com | Fine, didn’t seem to use DoubleClick
www.cnn.com | Fine, they seemed to be doing the proper handling
www.facebook.com | Surprise, surprise: Facebook doesn’t use DoubleClick. They use their own, so no real delay here.

 

If you don’t want to edit your hosts file and want more concrete waterfall and timing information as well as video capture, try what Steve Souders suggested in his Fluent talk: use the scripts (now SPOF) box at webpagetest.org to include DNS changes. The results will give you great details on how the website performed, with and without SPOF.

[Screenshot: SPOF doubleclick]

Note: I’ve tried the WebPagetest SPOF feature myself and didn’t notice a big difference between the non-SPOF and SPOF versions; my suspicion is that WebPagetest might not be using an empty cache with the SPOF setting. The tests I ran manually on my local machine showed a more visible negative impact of the SPOFs (I shall confirm this).

3rd party scripts are everywhere

It was verified last month that 18% of the world’s top 300K URLs load jQuery from Google hosted libraries. So in theory, if that service goes down and a web site uses jQuery from ajax.googleapis.com (and doesn’t have a fallback), the site might not work at all. Isn’t that scary? If you develop for a web site that already uses a CDN, don’t use Google’s CDN for scripts like jQuery. Avoid those 3rd party dependencies as much as possible.

I ran two queries on my local HTTP Archive database (dump from March 2013) and followed the same filter that Steve Souders used above. I restricted the query to the 292,297 distinct URLs from the March 1, 2013 crawl (with their respective unique pageids). I wanted to see how many of the top 300K URLs use Twitter widgets and any sort of Facebook scripts (without distinguishing whether they were loaded synchronously or asynchronously).

Twitter


13% of the Top 300K URLs include Twitter scripts somewhere on their page.

Facebook


29% of the Top 300K URLs include Facebook scripts somewhere on their page.

Feel free to extend this exercise to include more 3rd party domains.
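As a starting point for that exercise, here is a rough Python sketch of the kind of calculation behind those percentages: the share of distinct pages with at least one request matching a URL pattern. It runs against a tiny in-memory SQLite table. The two-column layout, the sample rows and the domain patterns are assumptions for illustration, and a real HTTP Archive dump would more likely take the page total from its pages table.

```python
import sqlite3


def share_of_pages(conn, url_pattern):
    """Fraction of distinct pages with at least one request URL matching the pattern."""
    total = conn.execute("SELECT COUNT(DISTINCT pageid) FROM requests").fetchone()[0]
    hits = conn.execute(
        "SELECT COUNT(DISTINCT pageid) FROM requests WHERE url LIKE ?",
        (url_pattern,),
    ).fetchone()[0]
    return hits / total if total else 0.0


# Hypothetical sample: 4 pages, 1 with a Twitter script, 2 with a Facebook script.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (pageid INTEGER, url TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [
        (1, "http://platform.twitter.com/widgets.js"),
        (1, "http://connect.facebook.net/en_US/all.js"),
        (2, "http://connect.facebook.net/en_US/all.js"),
        (3, "http://example.com/js/site.js"),
        (4, "http://example.org/js/site.js"),
    ],
)

twitter_share = share_of_pages(conn, "%platform.twitter.com%")
facebook_share = share_of_pages(conn, "%connect.facebook.net%")
print(twitter_share, facebook_share)
```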

Cached 3rd party scripts

You can’t really rely much on the cache settings of your 3rd party scripts to ignore their outage if it happens for less than a few hours. 3rd party providers tend to set a very low cache time on their scripts to make it flexible for them to change the file frequently.

That setup plays against you if you don’t load 3rd party scripts asynchronously. For example, Twitter’s widget.js has a cache time of only 30 mins. I wonder what change could be so important for Twitter that it can’t wait more than 30 mins to be loaded on sites consuming this widget.js file.

So imagine the following: at 9 AM, you go to a site with the Twitter widget loaded synchronously at the top of the page (bad!), getting the latest, freshest version of widget.js. Twitter goes down at 9:10 AM. You go back to the same site at 9:15 AM and everything is still fine; you won’t see any problems because you are getting the Twitter widget script from the browser cache. What if Twitter is still down at 9:40 AM and you visit the same page again? You are now past the cache expiry time, and your browser will request a new version of the Twitter script, trying to reach the Twitter server that is still down. The request for the Twitter script will eventually time out and, with the setup described above, block the page content from rendering in the meantime. Bottom line, you wouldn’t be able to see any content until the request times out or Twitter is back up. It’s easy to check those cache times yourself, e.g. use Chrome dev tools and check out the response headers of those 3rd party scripts.
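The timeline above boils down to comparing the age of the cached copy with the max-age from the Cache-Control header. A minimal Python sketch of that check follows. The header value and the date are made up for the example, the function names are hypothetical, and real browsers apply more rules than this (revalidation, Expires, heuristics).

```python
import re
from datetime import datetime, timedelta


def max_age_seconds(cache_control):
    """Extract max-age (in seconds) from a Cache-Control header value, or None."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None


def is_stale(fetched_at, now, max_age):
    """True once the cached copy is at least max-age seconds old."""
    return now - fetched_at >= timedelta(seconds=max_age)


# Assumed header value: 1800 seconds matches the 30 mins mentioned above.
max_age = max_age_seconds("public, max-age=1800")
fetched = datetime(2013, 5, 1, 9, 0)  # arbitrary date, first visit at 9 AM

for visit in (datetime(2013, 5, 1, 9, 15), datetime(2013, 5, 1, 9, 40)):
    state = "needs a new request" if is_stale(fetched, visit, max_age) else "served from cache"
    print(visit.time(), state)
```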

The screenshot below shows Twitter and Facebook’s cache-control settings:

[Screenshots: Twitter | Facebook]

Conclusion

In order to really focus on your site’s performance, you need to isolate (potential bad) performance of 3rd party monsters (the ones that you decided to invite to your site). Don’t make your users wait for your own content if a 3rd party provider is down.

References

“One Web” in a Multi-Platform World – Contradiction or Challenge, You Choose!

I read “mobile is not different” and I childishly take it personally – Why? – I dig deeper as if I were my own therapist – Why does it affect me the way it does? – It feels like hopeful idealism and the need to generalize complicated things. Not a bad idea, if only I could take that 2MB desktop site, serve it to all mobile users and think they would be happy…my job would be done! It’s not that easy. When developing web sites, you need to overcome many more obstacles for mobile than for desktop sites that are served on a high-speed, crazy powerful CPU device, a.k.a. a desktop computer, able to present any kind of content (text, image, audio or video). I wish I didn’t have to worry so much about performance, formats or codecs. I wish I didn’t have to think about latency and bandwidth issues, packet loss, data usage or energy consumption when delivering pages to small battery-driven devices, a.k.a. smart phones. I wish the performance budget we set for page size and load times were the same for desktop and mobile, well, on “one web”.

I am a big fan of consolidating so if we want to consolidate “everything web” into “one web” let’s consolidate the “proper” way then, make the “desktop” or call it “one” web follow the same strict rules that mobile or any other platform has to follow. Let’s clean up this mess (but my only wish), let’s do it properly this time please….!

What does “one web” mean to us? Deliver content to all devices, make it accessible and working for all platforms – Check, yes!

In order to get there, can we just pick one platform, develop for it and assume all the other ones will just magically start working? – Check? Unfortunately no.

Technically and realistically, I have a problem wrapping my head around this term of “there is only one web”, “mobile is not that different” or “there is no mobile web”. It is though, unless I am misunderstanding what “one” means.

I think we all perceive the notion of “one web” differently, depending on what issues we want to solve and what job title we hold at present. For example, a performance advocate would say “One web – I wish for one common web to develop for, with no screen real estate, load time or page size issues when developing for mobile devices! Yah!” But that’s not the case. You have way more flexibility and allowance on a browser with a high-speed home internet connection. Let’s continue: if you ask the sponsor of the site or a content strategist, they’d say “yeah, one web, I want e v e r y t h i n g and I want it e v e r y w h e r e” – Is that what people mean by “one web”?

I’m not sure if we can generalize so much. Think about it: how often do we use a 3G connection on our desktop browser? Or, for example, data plans for cell phones (in Canada at least) cost a lot, and going over the usage allowance by e.g. 100MB costs you way more on your mobile data plan than going over by 100MB on your high-speed home internet package. Also, we can’t make use of touch gestures on a desktop computer (yet) the way they come in handy, e.g. when swiping through photos on a mobile device.

Also, what about context-aware strategies and sites? While your big iMac might want to show you all 35000 clothing items served as high-resolution photos, as a data plan payer, I would appreciate it if the smart developers did not send those to my mobile device, because all I want to do at the bus stop is see where the nearest store is and what its opening hours are.

One web?

Stephanie Rieger has talked about the trouble with context before. And how does “Mobile First” fit in here? Does that mean that Luke wrote a whole book for nothing if mobile is not different or should be a platform to value anymore?

It’s about adaptation, no?

Let the experts comment: the W3C Mobile Web Best Practices describe “one web” as follows:

One Web means making, as far as is reasonable, the same information and services available to users irrespective of the device they are using. However, it does not mean that exactly the same information is available in exactly the same representation across all devices. The context of mobile use, device capability variations, bandwidth issues and mobile network capabilities all affect the representation.

Hands down – I totally agree with that.

I would love to not make an iWatch (?) or iPhone or a SmartTV “different” when it comes to designing and developing web experiences for them, but unfortunately they come with different challenges to deliver the same web content. When developing for touch devices you might want to use gestures, but maybe you don’t want to load those gestures for your non-touch SmartTV. Your UX might break because your web site design and screen real estate rely on gestures. Your iMac might not need to know where you physically are right now unless you’re taking it with you (it’s so pretty, I know) while passing a Starbucks store that has free “grande iced half caf triple mocha latte macchiato” today. Do you think you would want to know about this offer while sitting with a glass of red wine, at night, at home?

May I take a stab at this and rephrase it a bit? There is the web, and it is accessible and part of more and more devices (toaster, I look at you, yes I am waitin’ ….), all different in their setups, browsers, their connection limitations, their size, their CPU. Of course you need to adapt and adjust the web to each and every platform because they are different, but sometimes they also share characteristics. The user experience of viewing a website on a 320px wide screen in contrast to a 50-inch screen is different. I believe you still have to meet different requirements, at least from a design perspective.

Or even see it the other way around. Value the differences and benefit from them. Mobile devices are smarter than desktop devices. Use their advantages to your own advantages, why wouldn’t you?

I am not saying that anybody I’ve mentioned or cited in this post is wrong, we all have different perspectives and problems to solve. On that note, I’d appreciate constructive criticism because in my opinion, that only shows that we actually end up caring and worrying about the same issue, don’t you think? It’s a good thing.

In an ideal world, the web just works everywhere and serves everything but I fear we are not there yet. We sure can work towards that by creating solid back-end systems with solid and structured content and media sources that each and every platform can pull from whenever and however they want.

It’s not that “mobile is not different”, it is that “platforms are different” or “mobile is just another platform”.

Sources:

  • http://venturebeat.com/2013/03/27/heroku-mobile-lead-mobile-is-not-different/
  • http://www.w3.org/TR/mobile-bp/#OneWeb
  • http://www.the-haystack.com/2011/01/07/there-is-no-mobile-web/
  • http://www.forbes.com/sites/anthonykosner/2012/05/03/seven-deadly-mobile-myths-josh-clark-debunks-the-desktop-paradigm-and-more/
  • http://www.lukew.com/ff/entry.asp?933
  • http://www.lukew.com/ff/entry.asp?1393
  • http://www.slideshare.net/yiibu/the-trouble-with-context

Set up your own HTTP Archive to track and query your site trends

Be honest, ever wanted to play “Steve Souders” for a day and pull some cool stats or trends about some web sites of your choice? Well, how about setting up your own HTTP Archive then?

Httparchive.org is an excellent tool to track, monitor and review how the web is built. You can dig into trends around page size, page load time, CDN usage, distribution of different mimetypes and many other stats.

You can download an HTTP Archive MySQL dump and the source code from the download page and play around yourself with the current data. For example, do what Stoyan Stefanov did by asking yourself some questions: “Hm, I wonder what are common mime types these days”. Once you’ve set up the database, you can easily query anything you want.

However, what I personally find the most intriguing and fun is applying all of this to sites of your choice. Alright, let’s break this down: if you’re famous and your site is listed under the Top * Alexa sites, then you can use the official dump. If your target site is a wee bit less famous and not part of any of the crawled sites, you might want to start using your own database and local instance of HTTP Archive. That way, you can run this handy tool on any of the web sites you want to test.

Things to consider before you get started

You need MySQL, PHP and your own webserver running. If you choose to run your own private instance of WebPagetest, you won’t have to request an API key. I decided to ask Patrick Meenan (pmeenanATwebpagetestDOTorg) for an API key with limited query access. That’s sufficient for me for now. If I ever wanted to use more WebPagetest runs per day, I’d probably want to set up a private instance of WebPagetest. I’ve done this before but my computer had to be replaced, and I haven’t had the time since to set this up again.

Sample setup

bulktest: That’s the folder you really want to understand and work with when setting up your own little HTTP Archive baby.

  • bulktest/README.txt: This file gives you a general intro on how to use the folder. I recommend reading it.
  • bulktest/bootstrap.inc: In case you choose to use a private API key for WebPagetest, you will need to update this file with the provided key.

To run a nice little batch, you want to execute the following scripts one after another via CLI (the default setup requires this for security reasons):

  • bulktest/batch_start.php: This script takes a pre-defined list of URLs (importurls.php) that you can specify or change. By default it downloads the latest Alexa list (downloadAlexList()) and imports those URLs into the urls table. I’ve changed this so it picks up my own csv file with the URLs I want to crawl, but you can customize this the way you want it (default setup needs to run via CLI).
  • bulktest/batch_process.php: Run this repeatedly until you get confirmation that your runs were successfully recorded (default setup needs to run via CLI).

It always gives you a nice batch summary at the end so you know where you are at (see screenshot).

  • bulktest/statscompute.php:  This is needed for the rendering of the stats under your local URL, e.g. http://localhost/httparchive (default setup needs to run via CLI)

More detailed steps on how to install HTTP Archive can be found in the blog post Setting up HttpArchive private instance. As suggested by the README.txt file and this blog post, it’s probably useful to set up cron jobs in your environment to automate the batch steps.
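If you want one entry point for cron to call, the three batch scripts could be chained from a small wrapper. This is only a rough sketch in Python: the batch_commands and run_batch helpers are hypothetical names of mine, the number of batch_process.php passes is a guess (the README only says to repeat the step until the runs are recorded), and I assume the php CLI is on the path and that the scripts are started from inside the bulktest folder.

```python
import subprocess
from pathlib import Path

# Script names come from the bulktest folder described above.
START = "batch_start.php"
PROCESS = "batch_process.php"
STATS = "statscompute.php"


def batch_commands(bulktest_dir, process_passes=3):
    """Return the php CLI commands for one batch, in the order they should run.

    process_passes is an assumption: adjust it to however many passes
    your own batches need before all runs are recorded.
    """
    root = Path(bulktest_dir)
    scripts = [START] + [PROCESS] * process_passes + [STATS]
    return [["php", str(root / name)] for name in scripts]


def run_batch(bulktest_dir, process_passes=3, runner=subprocess.run):
    """Run each command in turn; check=True stops at the first failing script."""
    for command in batch_commands(bulktest_dir, process_passes):
        # Assumption: the scripts expect bulktest as the working directory.
        runner(command, cwd=str(bulktest_dir), check=True)
```

A cron entry would then only need to call run_batch with the path to your own bulktest folder.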

Front-end piece: Visualizing your trends and stats by filling the charts

Congrats! Assuming you’ve successfully set up your own HTTP Archive instance – wasn’t that fun and the bit of pain worthwhile? Now you can start viewing those charts and investigating trends and stats targeted to your defined URLs. The beautiful thing about having your own instance is that you can be your own master of data visualization: you can now create additional charts besides the ones that come out of the box with the default HTTP Archive setup.

And if you don’t like Google chart tools, you may even want to check out d3 or Highcharts to use instead.

From now on, the sky is the limit. Nobody can stop you now, my friend – you can even run some kick-ass raw database queries if you don’t really care much about the front-end visualization (I do ;))

Back-end piece: Querying the database directly

Sometimes, you want to get some questions answered without creating a pie or a chart. That’s when you can directly query the MySQL tables that have been set up for you (created via the schema sql file and filled by your batches).

Let’s run a simple query on the requests table.

For example, some of our sites use YUI, some use jQuery – but we would really like to avoid having pages serve both.

A simple sample query like the one below could help identify those sites:

select req_referer from requests where url like '%/i/l/yui%' or url like '%jquery-%.js' group by req_referer
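The query above lists every page that requested either library. To narrow it down to pages that requested both, one option is to group by page and require a match for each pattern. Here is a minimal sketch in Python, using an in-memory SQLite table as a stand-in for the MySQL requests table. The two-column schema and the sample rows are made up for illustration; only the requests, req_referer and url names come from the query above.

```python
import sqlite3

# A comparison such as "url LIKE ..." evaluates to 0 or 1, so SUM counts
# how many requests of a page matched each pattern.
BOTH_LIBRARIES = """
    SELECT req_referer
    FROM requests
    GROUP BY req_referer
    HAVING SUM(url LIKE '%/i/l/yui%') > 0
       AND SUM(url LIKE '%jquery-%.js') > 0
    ORDER BY req_referer
"""


def pages_serving_both(rows):
    """rows: iterable of (req_referer, url) pairs standing in for the requests table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE requests (req_referer TEXT, url TEXT)")
    db.executemany("INSERT INTO requests VALUES (?, ?)", rows)
    result = [row[0] for row in db.execute(BOTH_LIBRARIES)]
    db.close()
    return result


# Made-up sample: page /a loads both libraries, page /b only jQuery.
sample = [
    ("http://example.com/a", "http://example.com/i/l/yui/yui-min.js"),
    ("http://example.com/a", "http://example.com/js/jquery-1.8.3.min.js"),
    ("http://example.com/b", "http://example.com/js/jquery-1.8.3.min.js"),
]
print(pages_serving_both(sample))
```

The HAVING clause should carry over to MySQL as written, since MySQL also evaluates comparisons to 0 or 1, but I have only sketched it against SQLite here.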

Be prepared, some setup time is required

I’m not going to lie, it took me some after-work evenings and many PHP debug statements to set everything up so I could run proper batch_start and batch_process commands and fill in those pies and trends. Here are a few things to watch out for:

  • I hadn’t installed pcntl with my PHP version, so I needed to set this up first. You will need this to run batch_process.php. The blog post Installing pcntl for php on osx Lion helped a lot – Thank you Jacob!
  • You might have to adjust some of the table definitions. I got lots of MySQL insert/update errors about default values not being set for certain fields.
  • After I received my API key, I still had to change the WebPagetest URL after Patrick pointed me to the correct URL (thanks again). In settings.inc, the $gWPTUrl variable was set to http://httparchive.webpagetest.org by default instead of http://webpagetest.org.
  • Some of the tables, e.g. requests, urls and stats, also need a temp/dev version. Some of the scripts look for e.g. requestsdev. You might have to copy a few of the original tables during the setup process. You can set up the naming convention in dbapi.inc.

Next Steps

Where am I at? Well, I just finished a few successful batch_process runs on a selected number of URLs. It’s fun to monitor trends based on the URLs I put in. I will be collecting more ideas and use cases over the next few months, possibly also adding some more charts applicable to my needs.

If time permits, I hope to be sharing more of my customization and use cases of HTTP Archive at this year’s Velocity conference in San Jose. And if not there, I’ll try to update this blog post as best as possible. If you have any questions or suggestions, please leave a comment below.

I am always happy to receive feedback or to help out with any roadblocks while setting this all up.

Thanks

to….

My contribution as a female tech speaker – Living the minority

I love talking about, and listening to, the things I am passionate about (who doesn’t?). I love sharing my knowledge and learning from others. I love conferences. They are a great place to learn new things, validate your knowledge, connect with like-minded people and come back home with a bunch of things you want to try out and work on.

So it happens that one of the things on “my list of things to do in life” is/was to present at a conference. I happily and proudly checked this off last weekend.

I had the pleasure to speak with a smart colleague at FITC’s “Web Performance and Optimization” conference in Toronto, this past weekend.

Our topic was similar to the one we submitted (and got accepted) to O’Reilly’s Velocity conference in Santa Clara this summer. I’m beyond excited to be presenting similar things (and more) to all those great and talented web performance enthusiasts in a few months.

Allow me to give a brief recap of the presentation from last Saturday – the way I experienced it.

It was a small conference, around 70 people attending, probably ~7 of those attendees were women – that’s it, not more! Well, not a huge surprise to me, I’m used to that from my time as a Computer Science student 10 years ago. But is the ratio still so drastic? Oh, and in addition, I was the only female speaker that day.

While I was listening first (and later presenting myself), I noticed that most male presenters had a very specific way of selling and promoting themselves – they were all very confident (not jealous, good for you, boys!). A supportive and beautiful person at my side that day, full of great constructive criticism, noted something after I was done presenting. She confirmed something that I had honestly (and secretly) already felt: she said I could have promoted myself more, “..like the guys did”. It’s true, as the only woman speaking that day, I could have represented the female minority better by maybe emphasizing my successful web performance results to those 63 men and 7 women that day. Well, it’s not that I wasn’t passionate about my topic – maybe it’s just that women share their success in a different, less self-selling way and/or are less confident.

I’d like to quote something from Geek Feminism now:

So! Getting women to submit content: easy? Um. When I’d talk to men about the conference and ask if they felt like they had an idea to submit for a talk, they’d *always* start brainstorming on the spot. I’m not generalizing — every guy I talked to about speaking was able to come up with an idea, or multiple ideas, right away…and yet, overwhelmingly the women I talked to with the same pitch deferred with a, “well, but I’m not an expert on anything,” or “I wouldn’t know what to submit,” or “yes but I’m not a *lead* [title], so you should talk to my boss and see if he’d want to present.”

Ok! So I guess I am not imagining all of this. It really seems to be true that men are generally more confident than women when it comes to work related areas where they can promote themselves.

The beautiful thing about life is that you (can) always learn and get better.

And to be honest, my observation at FITC’s conference has encouraged me even more to submit call-for-speakers forms! I enjoyed presenting! Like a lot – You ain’t stoppin’ me now.

Below are the slides from our talk on Saturday.

Additional resources in regards to women in tech and female speakers

  • https://plus.google.com/communities/101818001236662563704/stream/02ee47c3-6a09-4925-8467-e503c684c4ce
  • https://twitter.com/callbackwomen
  • http://www.facebook.com/ShePlusPlus
  • http://2012.jsconf.eu/2012/09/17/beating-the-odds-how-we-got-25-percent-women-speakers.html
  • http://geekfeminism.org/2012/05/21/how-i-got-50-women-speakers-at-my-tech-conference/

Applying weight loss paradigms to web performance – the page weight points system

I like to compare web performance and page weight challenges to real world scenarios. I continue to enjoy John Allspaw and Steve Souders’ performance about a stronger and faster web  at the Velocity Conference 2012. Those things stick, don’t you think? It’s fun to apply web performance considerations to real world weight and fitness rules.

I agree with John & Steve; I strongly feel that body weight loss and page weight loss have many similarities. A non-fatty diet complements a healthy life. Non-heavy pages complement fast websites and more visitors. Exercising makes people stay healthy and happy. Practicing web performance rules results in faster websites. Keep watching your weight, don’t slip, keep up with the training, and then working out becomes easier with time.

Lately, I’ve been thinking about how to engage people in the organization I work at to get excited about web performance and to make it part of the product development cycle.

Hands down, this is not new stuff that I am proposing here, many smart people in the web performance community have been talking about this for a while. Some great must-read articles below.

All of those articles share the same thought: Come up with concepts and ideas to engage people to buy into web performance.

The idea here is that you make performance part of the process instead of something that may or may not get tacked on at the end. (Tim Kadlec)

You need  to get the people who make the decisions about your products excited about performance and the value of making things fast(er) and perform well. This needs to happen early on in the process. Everybody needs to be on board, including the IAs and designers.

I’ve noticed that if you show people numbers, graphs, percentages and A/B testing results to emphasize web performance, they start to pay attention and show interest. Sometimes they are even so taken by the results that they share them with other departments and colleagues. That’s what you want, that’s great. However, I keep wondering what else they’d like to see to keep them motivated and engaged in the value and importance of web performance.

While Mark talks about “budget” and Christian about the “Vanilla Web Diet”, and tools like YSlow or PageSpeed aim to give marks, I’d like to suggest another possible engaging way to keep performance on a constant radar during product development and not only after (when it’s often too late).

The Weight Watchers Points System

Background

Everybody knows about Weight Watchers, right? This thing seems to work. Ask Jennifer Hudson or ask people you know who’ve tried it. It really seems to work. So, why is that?

  • You don’t have to starve yourself while dieting
  • Social support & support groups help you to not give up
  • The points system: Calculated based on body size, activity level and desired weight loss, you are assigned a certain number of points for each day, e.g. vegetables and fruits tend to have few or no points whereas – you guessed it – sweets, bread and potatoes are assigned more points.

Page Weight Points System

How about we promote a page weight points system to the team we work with (and the client who asks for a “heavy” product) to emphasize the necessity of web performance? Wouldn’t it be fun if you came up with a points system for elements on a page or within an app that outlines what they are and how they impact performance? Each of those elements can then be assessed to decide whether it is needed and, if so, whether it could undergo a diet.

The points system could help to make performance more tangible.

May I present a rather light (and not so serious) analogy on how you could get started with assigning points to the elements of your product.

  • Ads: That in my world could be compared to that chocolate cake that only tricks you and teases you, takes away needed space but doesn’t really make you look better but brings in money.
  • Tracking: It’s like essential oils and fats that one needs to continue the business in order to grow and make decisions.
  • Social plugins: That could be the beer or cocktail you’d want to drink at the bar so you feel more comfortable chatting up that cute girl next to you.
  • Images: Depending on the purpose of the page/app, images could be beneficial or even required to engage visitors with your product, so you could compare them to vitamin C from an orange, for example.
  • Text: Text is very light but is needed to represent the product, that could be compared to water. I doubt the points for that would be high.

Once the product is broken down into elements, you can assign points to them.
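To make that a little more concrete, here is a light (and equally not-so-serious) sketch of how such a calculation could look in Python. Everything in it is an assumption of mine: the scale of one point per 50 KB plus one point per request, the Element type, the points and over_budget helpers, and the sample numbers, which are invented and not measurements of any real page.

```python
from dataclasses import dataclass
import math

# Hypothetical scale: tune these to your own product and audience.
KB_PER_POINT = 50
POINTS_PER_REQUEST = 1


@dataclass
class Element:
    name: str
    kilobytes: float
    requests: int


def points(element):
    """Points grow with transferred weight and with the number of requests."""
    weight_points = math.ceil(element.kilobytes / KB_PER_POINT)
    return weight_points + element.requests * POINTS_PER_REQUEST


def over_budget(elements, allowance):
    """Return by how many points the page exceeds its allowance (0 if within)."""
    total = sum(points(e) for e in elements)
    return max(0, total - allowance)


# Invented numbers, purely for illustration.
page = [
    Element("ads", kilobytes=300, requests=12),
    Element("tracking", kilobytes=40, requests=4),
    Element("social plugins", kilobytes=150, requests=6),
    Element("images", kilobytes=600, requests=20),
    Element("text", kilobytes=30, requests=1),
]
for element in sorted(page, key=points, reverse=True):
    print(f"{element.name}: {points(element)} points")
```

The heaviest items end up at the top of the list, which is where the diet conversation with your stakeholders could start.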

The points system could be accompanied by blacklisted temptations that stakeholders might want to throw in the product.

If you were to decide to introduce a points system like Weight Watchers does, please remember the danger with any diet or weight loss program: avoid the yo-yo effect once the product is out.

Advanced Web Performance Techniques – Part 1

Happy New Year, Everyone!

Web performance is something dear to my heart and is something that can make me browse the web for hours.

My favorite holiday treat is @perfplanet‘s performance calendar, every year!

  • Who doesn’t like to improve things?
  • Who doesn’t like to make things more efficient and faster?
  • Who doesn’t want Steve, Stoyan or Ilya to give us that one last great webperf thought before the year ends?

Today, I’d like to write about a few techniques I’ve read about in the last few months that I consider advanced and experimental in regards to web performance.

That being said, today’s post is not about the common principles on how to make websites faster, posted by Yahoo! several years ago. While those are still valid and should be followed by every web developer and organization on this planet, I want to focus on those not so commonly known principles and techniques on how to enhance the performance of your mobile web app/site (at least the ones that still give me the “awwww – I didn’t know, wow!”). Mostly principles that can be applied to your server or CDN.

Techniques pushed by Google

I love Google’s “Make the Web Faster” site.

I only recently discovered the (experimental) features of mod_pagespeed, the open source Apache HTTP server module that automatically applies web performance best practices. Instead of making web devs do the performance work, let the web server do that job by applying filters that enhance performance.

mod_pagespeed filter: Canonicalize JavaScript

When developing (mobile) websites, developers tend to include commonly used libraries, jQuery being one of the top ones. So, imagine you browse from website A using jQuery to website B also using jQuery. The browser automatically fetches jQuery again from domain B, although it was just requested from domain A. A waste of bandwidth, something seems inefficient here, doesn’t it? So, the great minds from Google came up with a solution to reduce the obvious inefficiency by introducing the canonicalized JavaScript filter: The idea is that you re-use commonly used libraries (e.g. jQuery) on the web by first replacing the library with the equivalent one hosted on Google’s CDN and then, when browsing e.g. website B, using the copy of jQuery that was previously fetched from the shared library (Google’s CDN).

More about that in this blog post or the direct link.

mod_pagespeed filter: Combine JavaScript (Experimental)

Everyone who cares about web performance knows that reducing HTTP requests is one method to enhance performance. How about your web server doing this job for you? It reduces round-trips to the server and also reduces latency issues. The combine_javascript filter concatenates the JavaScript files used on a page and replaces them with a single one.
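To illustrate the idea of concatenation only (this is not how mod_pagespeed implements the filter, which has to be far more careful), here is a tiny Python sketch. The combine_javascript function below is my own hypothetical helper that happens to share the filter’s name.

```python
def combine_javascript(sources):
    """Join several scripts into one response body, preserving load order.

    sources: list of (url, text) pairs in the order the page loads them.
    """
    parts = []
    for url, text in sources:
        # The leading comment keeps the origin of each chunk visible. The ";"
        # on its own line guards against a file that ends without a statement
        # terminator (or with a line comment) running into the next file.
        parts.append(f"/* {url} */\n{text.rstrip()}\n;\n")
    return "".join(parts)


combined = combine_javascript([
    ("/js/a.js", "var a = 1"),
    ("/js/b.js", "var b = a + 1; // depends on a.js"),
])
print(combined)
```

One request instead of two, with the scripts still executing in their original order.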

Check out their demo page.

More fun mod_pagespeed filters here.

Delta Delivery

This is a nice one too. I found the idea and presentation at one of the W3C Performance Working Group meetings (excellent presentations).

How often does a web developer change the entire core of their JavaScript framework or libraries? Not that often, right? So on average, one might only add another feature to the JavaScript core files that visitors need to fetch when revisiting the website. The proposed delta technique aims to reduce the size of JavaScript and stylesheets by sending only the difference “between what the client has (in cache or local storage) and the latest version”. The benefit here is that the delta will, most of the time, be smaller than the original source. “Server computes and returns encoded delta between version X and Y which is much smaller than Y itself”.
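The principle can be tried out with Python’s difflib. This is a sketch of the general idea under my own assumptions, not the encoding proposed in the presentation: the copy/insert operation format and the make_delta and apply_delta helpers are invented for illustration, and the comparison works line by line to keep it simple.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Server side: encode `new` as copy/insert operations against `old`."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The client already has these lines; send only their positions.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Lines the client does not have yet.
            delta.append(("insert", "".join(new_lines[j1:j2])))
        # "delete" needs no operation: those lines are simply not copied.
    return delta


def apply_delta(old, delta):
    """Client side: rebuild the new version from the cached copy."""
    old_lines = old.splitlines(keepends=True)
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append("".join(old_lines[op[1]:op[2]]))
        else:
            parts.append(op[1])
    return "".join(parts)


# Version X: 200 small functions. Version Y: the same plus one new feature.
version_x = "".join(f"function f{i}() {{ return {i}; }}\n" for i in range(200))
version_y = version_x + "function added() { return 'new'; }\n"

delta = make_delta(version_x, version_y)
assert apply_delta(version_x, delta) == version_y
print(len(str(delta)), "characters of delta vs", len(version_y), "for the full file")
```

For a small addition to a large file, the delta boils down to “copy everything you have, then append this”, which is where the savings come from.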

More information: “Browser Enhancements to Help Improve Page Load Performance Using Delta Delivery”

Others & General

It’s not only Google that tries to be smart about predicting user behavior and enhancing speed based on situational conditions.

  • Akamai has always been on the forefront of optimization and delivering content. Their situational performance techniques are fun and easy to follow. Aqua ION focuses on front-end optimization.
  • Also Cloudflare suggested their own technique called Rocket Loader to combine JavaScript and Stylesheets more efficiently.
  • Guy’s situational performance post on perfplanet’s calendar 2012.

And some client-side techniques that I don’t want to forget to mention:

This is by no means everything innovative and great that is out there to make your site faster. I appended “Part 1” to the title of this post because I know I will soon find more great techniques that will be worth mentioning. So please stay tuned.

And of course, feel free to share any other exceptional tricks worth mentioning.

Web Performance & Responsive Web Design: Disconnected or Compliant?

We’ve all been there: people throwing around the term Responsive Web Design (RWD) in web project meetings, stakeholders who can’t stop talking about it, and even the non-technical Project Manager who might have tried to pitch this idea to you on your elevator ride to your desk.

We have to give those optimists all credit because RWD stems from a great idea: Simple multi-screen Web Development. In general, it is indeed a great approach but not everything that is great in theory works great in practice (duh!), especially when rushed to follow a new trend.

Slide 32: http://www.slideshare.net/guypod/performance-implications-of-mobile-design

Putting my mobile web performance hat on, I have to be honest, there is something that doesn’t sound quite right when my ear hears the buzz word “Responsive Web Design”. I don’t think I am far off with that perception. Please check out the screenshot I took from @guypod’s presentation. Guy is the Chief Product Architect at Akamai, specializing in Mobile Web Performance. I encourage everyone who is interested in Mobile Web Performance to follow his talks and tweets. His slide on the right shows that most of the websites built with responsive design in mind do not optimize for different screen sizes. This poses a problem, particularly for mobile devices.

Performance on Mobile

Performance is key for mobile websites. The latest research shows that responsive websites as of today don’t quite (yet) focus on web performance. The stats above reveal that 86% of the sites using responsive design don’t optimize for mobile. While being viewed on small-screen devices, those pages have the same page weight as the ones being viewed on a large-screen device.

Stop! Is that a negative side effect of RWD or just something that was not paid attention to? Do RWD and Web Performance go hand-in-hand or are they disconnected?

It is highly recommended to optimize your site for battery-powered mobile and small-screen devices. But what does “optimizing for mobile” mean? People might think that optimizing means they can re-arrange/optimize content and use fluid grids to be responsive for mobile devices and small screens. What I feel they often forget is to also optimize performance for different devices and screens.

The key is to make your mobile presentation load fast on small devices.

While today’s desktop sites don’t have to be too strict on page performance (sadly, yet), sites viewed on a mobile device need to be “performance-optimized” to load content fast, especially on a cellular network.

74% of mobile visitors will abandon a website if it takes more than 5 seconds to load. In other words, you have 5 seconds to get someone’s attention. Make it count. (Brad Frost)

Furthermore, users appreciate pages that don’t drain battery power or add a significant amount of data usage to their data plans by using desktop-sized images or non-optimized scripts and stylesheets.

So how do you develop a responsive website that does not belong to the 86% of sites that Guy mentioned in the slide above?

Here are a few risks, hints and recommendations while developing a responsive site.

Potential Risks (and Problems) with RWD

  • Performance might suffer for the sake of making a site responsive. If you make your site responsive, think about making it performance-optimized for different devices as well. Sometimes it makes sense to still serve different sites to different devices, e.g. you could have a responsive site for desktop and tablet but your mobile site uses a different implementation.
  • There is a risk of overloading-downloads (hiding content != reducing page size): If you choose to hide content based on the screen size, remember you still download the content if you do it with media queries. The page weight will stay the same. Media queries don’t prevent CSS downloads.
  • Review integrated 3rd party scripts/products: Check if the 3rd party product offers a mobile web friendly version because the desktop version might be too heavy (file size and processing). Also, make sure to identify if you need mobile sensitive logic included when using them (e.g. sometimes ads need different implementation of code for mobile vs. desktop).

Recommendations

  • Don’t be lazy and only focus on the presentation of your content being responsive, take responsibility for optimizing the performance for mobile.
  • Identify heavy and CPU intensive elements such as big images, scripts that maybe need to do things on your desktop-viewed site but maybe not on your mobile-accessed site. For those elements, you need to find a solution to optimize them for mobile. Otherwise performance will suffer and users will be upset with your pages being slow.
  • Avoid extensive client-side processing (JS scripts, non-optimized 3rd party scripts) and try to move the logic from the front-end to the back-end. Use server-side technologies to detect platform and device (capabilities) on the backend to load mobile-friendly scripts and implementations for mobile, e.g ad code, tracking, any 3rd party tools.
  • Presentation of different images sizes per platform should ideally be handled on the server side.
  • Content with only a little script/logic can be displayed in a responsive manner, e.g. a box with single-column content on mobile could be displayed as two-column content on desktop.
  • Think about Mobile First (progressive enhancement) vs. Desktop First (graceful degradation).

My suggestion is to create a nice mix of server side detection and responsive design elements. And to be fair, this is not something that is new or a paradigm I created. It’s called Responsive Web Design with Server Side Components.

Go with RESS

Luke W & Dave Olson have been talking about this approach for some time now.

RESS stands for Responsive Web Design with Server Side Components and describes the combination of responsive design approaches with server side components for optimization. It helps to avoid what Guy described in his slide at the top. With server side techniques you will be able to offload some of the heavy page weight upfront before serving the page to the client, while still applying media queries to accomplish a responsive design approach.
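As a rough illustration of the server side half, here is a small Python sketch that picks an image variant per device class before the page is sent. All of it is assumption on my part: the User-Agent patterns are deliberately crude (a real setup would use a maintained device-detection library or database), and the device_class and image_url helpers, the widths and the URL scheme are hypothetical.

```python
import re

# Very rough patterns, for illustration only.
TABLET_UA = re.compile(r"iPad|Tablet", re.IGNORECASE)
MOBILE_UA = re.compile(r"Mobi|Android|iPhone|iPod", re.IGNORECASE)

# Hypothetical image widths (in pixels) per device class.
IMAGE_WIDTHS = {"mobile": 320, "tablet": 768, "desktop": 1280}


def device_class(user_agent):
    # Tablets are checked first because their User-Agent strings often
    # contain "Mobile" as well.
    if TABLET_UA.search(user_agent):
        return "tablet"
    if MOBILE_UA.search(user_agent):
        return "mobile"
    return "desktop"


def image_url(base_name, user_agent):
    """Choose the variant on the server so small screens never download the large one."""
    width = IMAGE_WIDTHS[device_class(user_agent)]
    return f"/images/{base_name}-{width}.jpg"


print(image_url("hero", "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) Mobile"))
print(image_url("hero", "Mozilla/5.0 (Windows NT 6.1) Chrome/24.0"))
```

The layout itself can still be handled by media queries on the client; the server only makes sure the heavy assets match the device.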

And before I use the word RWD one more time, I want to end this post with a quote from Brad Frost’s presentation at the BDConf 2012 in Dallas (slide 159)

Users don’t give a s**t if your site is responsive

Responsive Design can work if you also focus on performance to make the site better and faster.

Here are some links that are worth checking out: