I normally spend a few (3? 4? 8? 9? 12?) hours every day just browsing the Internet. Most of the time I'm visiting the same pages: HN, Netvibes, Facebook, Quora, Wikipedia, etc.

Let's study what kind of information my browser downloads from these pages [1]:

  • HackerNews: 1 CSS file, no JS files, 3 images (total 5 requests)
  • Netvibes: 2 CSS files, 4 JS files, 66 images (total 78 requests)
  • Facebook: 7 CSS files, 17 JS files, 67 images (total 98 requests)
  • Quora: 2 CSS files, 9 JS files, 30 images (total 51 requests)
  • Wikipedia: 2 CSS files, 13 JS files, 6 images (total 23 requests)

You must have noticed that I'm only looking at the static files: CSS, JavaScript and images. Why these files? Because every time I refresh the page, the web browser tries to download them again.

But you will say: Come on, man! The web browser downloads the CSS files on the first access to the page, and the second time you request the same file it uses the ETag or Last-Modified HTTP headers, so the server just sends back an HTTP 304 Not Modified, right? We are not really downloading the content twice! We are just checking that we have the latest version!
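
Just to make that mechanism concrete, here is a minimal sketch in Python (standard library only) of what the browser does. The host and path are placeholders, and it assumes the server actually sends an ETag:

```python
import http.client

HOST, PATH = "www.example.com", "/style.css"   # placeholders, not a real test target

conn = http.client.HTTPSConnection(HOST)

# First access: download the whole file and remember its ETag.
conn.request("GET", PATH)
resp = conn.getresponse()
body = resp.read()
etag = resp.getheader("ETag")
print(resp.status, len(body), etag)            # e.g. 200, the full size, the ETag value

# Second access: revalidate. If the file has not changed, the server
# answers "304 Not Modified" and sends no body at all.
headers = {"If-None-Match": etag} if etag else {}
conn.request("GET", PATH, headers=headers)
resp = conn.getresponse()
resp.read()
print(resp.status)                             # 304 if unchanged, 200 otherwise

conn.close()
```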

Ummm... yep. You are right. But I'm asking myself: what is the cost of making an HTTP request just to know that the file has not changed?

So, more or less, what is the cost of these ~~not very useful~~ requests? [2]

  • Open a TCP connection to the server: 3 TCP packets (SYN, SYN/ACK, ACK): 196 ms
  • Make the HTTP request and receive the response: 2 TCP packets: 201 ms
  • Close the connection: 4 TCP packets (FIN/ACK, ACK, FIN/ACK, ACK): 192 ms

OK. Opening a TCP connection takes 196 ms and a single HTTP request/response takes 201 ms. Closing the TCP connection is not really important, since by then you have already received the information you needed and the browser can render it, so we can ignore that value.
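
If you want to reproduce these two numbers without a packet sniffer, a rough sketch like this gives a similar measurement (the exact values obviously depend on your network and on the target server; terra.com is just the server I used in my own test):

```python
import socket
import time

HOST = "terra.com"          # a nearby server, same as in my test

t0 = time.time()
sock = socket.create_connection((HOST, 80))        # 3-way handshake: SYN, SYN/ACK, ACK
t1 = time.time()

sock.sendall(b"HEAD / HTTP/1.1\r\nHost: " + HOST.encode() + b"\r\nConnection: close\r\n\r\n")
sock.recv(4096)                                    # wait for the first bytes of the response
t2 = time.time()
sock.close()

print("open TCP connection:   %4.0f ms" % ((t1 - t0) * 1000))
print("HTTP request/response: %4.0f ms" % ((t2 - t1) * 1000))
```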

The formula could look like:

Cost = ( time(OpenTCPConnection) + time(HTTP Request/Response) ) * NumberOfRequests

But this does not make much sense: with this formula, Facebook would need 38 seconds for its 96 requests. We are also supposing that we don't start a new connection before the active one has finished, and that supposition is false.

Nowadays browsers use the "Connection: Keep-Alive" HTTP header, which avoids opening a lot of connections to the server by reusing the same connection for more than one request. So the formula could be something like:

Cost = time(OpenTCPConnection) + time(HTTP Request/Response) * NumberOfRequests

Cost = 196 ms + 201 ms * NumberOfRequests
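
Or, written as a tiny bit of Python (this is just my back-of-the-envelope model, nothing the browser actually computes), using the static file counts from the Firebug numbers above:

```python
TCP_CONNECT_MS = 196    # opening the TCP connection (measured above)
HTTP_RTT_MS = 201       # one HTTP request/response round trip (measured above)

def expected_ms(num_requests):
    """One reused Keep-Alive connection, requests made strictly one after another."""
    return TCP_CONNECT_MS + HTTP_RTT_MS * num_requests

# Facebook: 7 CSS + 17 JS + 67 images = 91 static requests
print(expected_ms(91) / 1000, "s")    # ~18.5 s
```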

Let's apply the formula to the sites we are playing with. The first value is the EXPECTED time (applying my formula above) ONLY for the static content (CSS/JS/images). The second value is the ON LOAD value provided by Firebug (the time required to download ALL the information on the page).

  • HackerNews: 1.0 s -- 1.2 s
  • Netvibes: 14 s -- 6.1 s
  • Facebook: 18.5 s -- 6.4 s
  • Quora: 8.4 s -- 5.1 s
  • Wikipedia: 4.4 s -- 2.3 s

Umm... this is a little bit odd. My formula says that I need 18.5 seconds to download the static content of Facebook.com, but Firebug (which doesn't lie) says that the whole page really only needs 6.4 seconds. I'm missing something.

Oh! Yep! The browser doesn't just use Keep-Alive on its HTTP connections, it also opens about 6 concurrent TCP connections [3] per domain. That means the browser is issuing HTTP requests concurrently through different TCP connections AND using Keep-Alive at the same time, so it can reuse those TCP connections for the queued HTTP requests. A little bit confusing, but it makes sense.
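
So a slightly less wrong model spreads the requests over those parallel connections. A rough sketch, assuming a single domain, all 6 connections opened at once, and ignoring DNS, server time and the actual transfer of the bytes:

```python
import math

def expected_parallel_ms(num_requests, connections=6,
                         connect_ms=196, rtt_ms=201):
    """Requests spread evenly over several persistent connections to one host."""
    return connect_ms + rtt_ms * math.ceil(num_requests / connections)

# Facebook's 91 static requests over 6 connections:
print(expected_parallel_ms(91) / 1000, "s")    # ~3.4 s, same ballpark as Firebug's 6.4 s for the whole page
```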

Well, now I have a problem trying to come up with a formula that calculates the time needed to make some number of requests, because pages like Facebook or Wikipedia usually use subdomains for different data, like static.facebook.com or photos.facebook.com. That means Firefox could open 6 TCP connections to facebook.com, another 6 to static.facebook.com, and so on, because they count as different domains.

But well. With this small study we can see that it is not costless to make HTTP requests just to check whether I have the latest version of a file. Solutions? The easiest one is to install a caching HTTP proxy on the client side and cache all the CSS/JS/image files. The HTTP requests will be made exactly the same way, but the local proxy will return the 304 almost immediately. This low latency will make the pages load waaaay more quickly.
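
To show the idea, here is a toy sketch of such a proxy (not something you should actually run: no expiry, no HTTPS, no error handling, no Content-Type passthrough; the port and URL handling are just placeholders). The browser would be configured to use localhost:3128, and when it revalidates a file the proxy already has, the proxy answers the 304 locally instead of going back to the origin server:

```python
import http.client
import http.server
from urllib.parse import urlsplit

cache = {}   # url -> (etag, body)

class CachingProxy(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        url = self.path                          # a proxy receives absolute URLs
        wanted = self.headers.get("If-None-Match")

        # The browser is just revalidating and we already hold that version.
        if wanted and url in cache and cache[url][0] == wanted:
            self.send_response(304)
            self.end_headers()
            return

        # Otherwise fetch it from the origin server and remember the ETag.
        parts = urlsplit(url)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn = http.client.HTTPConnection(parts.netloc)
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        etag = resp.getheader("ETag")
        conn.close()
        if etag and resp.status == 200:
            cache[url] = (etag, body)

        self.send_response(resp.status)
        self.send_header("Content-Length", str(len(body)))
        if etag:
            self.send_header("ETag", etag)
        self.end_headers()
        self.wfile.write(body)

http.server.HTTPServer(("127.0.0.1", 3128), CachingProxy).serve_forever()
```

A real caching proxy like Squid does the same thing properly, with disk storage, expiry rules and so on.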

And this is what I'm going to play with ;) Stay tuned!

[1] I used Firebug to extract this information.

[2] I did this test using wget to make the requests and Wireshark to intercept the packets. The target was "terra.com", a site in my own country (low latency; the results could be worse if the target server is in the States or Asia).

[3] Well, it depends on the browser. Internet Explorer opens just 2 concurrent connections. Firefox seems to open a maximum of 6 concurrent connections, but it depends on some unknown behaviour (maybe on the number of queued requests, who knows?).