Cut down intro as much as possible.

> This is episode 6, out of a total of 12, in the [A Node.JS Holiday Season series](https://hacks.mozilla.org/category/a-node-js-holiday-season/) from Mozilla's Identity team. It's the second post about achieving better front-end performance.
You've implemented the [Three C's of client-side performance](https://hacks.mozilla.org/2012/12/fantastic-front-end-performance-part-1-concatenate-compress-cache-a-node-js-holiday-season-part-4/) (concatenate, compress, and cache), but how do you know they're working? This post introduces simple tools for measuring changes, and simple statistics for checking that the measured improvements are real, using HTTP caching as the example optimization.
## Measuring Performance: HARs, Heuristics, and Statistics
### HTTP Archives
While there are many open-source and commercial tools available to measure website performance, practically all of them record data in the same format: HTTP Archive (or "HAR") files. HAR files are standard JSON documents containing performance data like HTTP headers, network and DNS lookup times, data transfer times, and other details. Here's an excerpt:
"pageref": "page_10536",
"startedDateTime": "2013-02-12T02:10:02.677-08:00",
"time": 371,
"request": {
"method": "GET",
"url": "https://login.persona.org/",
"httpVersion": "HTTP/1.1",
"cookies": [],
"headers": [
"name": "Host",
"value": "login.persona.org"
},
{
"name": "User-Agent",
"value": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:18.0) Gecko/20100101 Firefox/18.0"
},
...
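
Because a HAR file is just JSON, it's easy to dig into programmatically. Here's a minimal Node.js sketch (assuming a HAR 1.2 export saved as `persona.har`; the filename is just an example) that prints each request's URL and total time:

```js
// print-har.js -- list every request recorded in a HAR file.
// Assumes a HAR 1.2 export saved as "persona.har" in the current directory.
var fs = require('fs');

var har = JSON.parse(fs.readFileSync('persona.har', 'utf8'));

har.log.entries.forEach(function (entry) {
  // entry.time is the total elapsed time for the request, in milliseconds.
  console.log(entry.time + ' ms\t' + entry.request.method + ' ' + entry.request.url);
});
```
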
Most tools display the performance data in the same way, too: using a 'waterfall' visualization. Here's an example; click through for more detail:
insert waterfall screenshot here
linked to http://www.webpagetest.org/result/130212_K7_ARC/1/details/
The waterfall represents the sequence of events from the initial request through all of the page's components being requested, downloaded, and processed. Scrolling downward through the list of files moves forward in time. Similarly, each file's entry flows from left to right through time, showing how much time passed between the request and the completed response.
Most browsers have tools built in, or available as extensions, for generating and visualizing HAR files, such as the [NetExport Firebug extension](http://www.softwareishard.com/blog/netexport/) for Firefox. There are also standalone and web-based tools, like webpagetest.org, that can generate, analyze, and visualize HAR files.
### Heuristics
To help make sense of the information, tools like [YSlow](http://developer.yahoo.com/yslow/) apply heuristics to grade a page's performance and suggest potential optimizations.
screenshot of YSlow on login.persona.org
While the number of possible optimizations can seem overwhelming, the golden rule is to minimize the number of HTTP requests made on a given page; that has the greatest impact on overall page load time. Thus, one simple way to measure performance is to count the number of HTTP requests and the overall page load time. Depending on the nature of your app, you might care about more nuanced measurements, like the time until content above the fold is visible, but page load time is a fine starting point.
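
As a rough sketch of how you might pull those two numbers out of a HAR file (again assuming an export named `persona.har`; in HAR 1.2, the onload time lives in each page's `pageTimings`):

```js
// summarize-har.js -- headline metrics per page: request count and onload time.
var fs = require('fs');

var har = JSON.parse(fs.readFileSync('persona.har', 'utf8'));

har.log.pages.forEach(function (page) {
  // Count the entries (HTTP requests) that belong to this page.
  var requests = har.log.entries.filter(function (entry) {
    return entry.pageref === page.id;
  }).length;

  console.log(page.id + ': ' + requests + ' requests, onload at ' +
              page.pageTimings.onLoad + ' ms');
});
```
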
### Simple Statistics
We want to compare measurements taken before and after an optimization, but page load times fluctuate with network congestion and server load. In the face of that uncertainty, how can we be sure that we've made a meaningful change? The answer: apply simple statistics over many measurements. A handful of samples and a stats library (like @bluesmoon's node-faststats) are enough to quantify the uncertainty and decide whether a change is likely real.
Suppose you take 10 measurements of a page's load time. A standard way to summarize the 10 data points is to assume they're distributed in a bell curve around their mean. Roughly speaking, the standard deviation measures how wide that curve is, and we can be about 95% confident that the true mean lies within two standard errors of the measured mean, where the standard error is the standard deviation divided by the square root of the number of samples.
If you've taken measurements of load time before and after applying an optimization, you'll have two means, each with its own standard deviation. The improvement is the difference between the two means, and the standard error of that difference is sqrt(a^2/n_a + b^2/n_b), where a and b are the two standard deviations and n_a and n_b are the two sample sizes.
There's one final tool we can use to decide whether the difference between two means is statistically significant: the t-test. The t-test estimates the odds that the observed improvement was caused by random chance; if its output, the p-value, is at or below 0.05, the improvement is conventionally considered statistically significant.
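
To make the formulas concrete, here's a small sketch of those calculations in plain JavaScript. The function names are ours, purely illustrative; a stats library like the node-faststats mentioned above can do this sort of number crunching for you.

```js
// stats.js -- just enough statistics for before/after comparisons.

function mean(xs) {
  return xs.reduce(function (sum, x) { return sum + x; }, 0) / xs.length;
}

// Sample standard deviation (divide by n - 1).
function stddev(xs) {
  var m = mean(xs);
  var sumOfSquares = xs.reduce(function (sum, x) {
    return sum + (x - m) * (x - m);
  }, 0);
  return Math.sqrt(sumOfSquares / (xs.length - 1));
}

// Standard error of the difference between two sample means:
// sqrt(a^2/n_a + b^2/n_b), where a and b are the standard deviations.
function seOfDifference(a, b) {
  var sa = stddev(a), sb = stddev(b);
  return Math.sqrt((sa * sa) / a.length + (sb * sb) / b.length);
}

// Welch's t statistic: difference in means over its standard error.
function welchT(a, b) {
  return (mean(a) - mean(b)) / seOfDifference(a, b);
}

module.exports = { mean: mean, stddev: stddev,
                   seOfDifference: seOfDifference, welchT: welchT };
```
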
OK, great. Now that we've touched on how to measure front-end performance, and how to ensure our measurements are statistically significant, let's go back to HTTP caching and measure its impact in a simple sample case.
## Example: Measuring the impact of a primed vs. unprimed cache
Let's take a super simple example. I loaded login.persona.org in my browser 5 times in a row and recorded the onload time reported by Firebug. Then I surfed to a different page and back to login.persona.org, again 5 times, so that the requests could take advantage of the browser cache. Even with a sample size as small as 5 runs, the load time improvement might be so large that it's statistically significant.
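
If you'd rather not read the onload time off of Firebug's display, browsers that support the Navigation Timing API can report it directly. A rough snippet to paste into the console once the page has finished loading:

```js
// Rough page load time in seconds, via the Navigation Timing API.
var t = window.performance.timing;
console.log(((t.loadEventEnd - t.navigationStart) / 1000).toFixed(2) + ' s');
```
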
- insert screenshot w/waterfall
- insert close-up screenshot with the load time
Calculations:
Page load time results (onload fired): 4.51, 3.07, 2.89, 3.37, 3.18 seconds.
Mean: (4.51 + 3.07 + 2.89 + 3.37 + 3.18) / 5 = 3.40
Std dev: the square root of the sum of squared deviations from the mean, divided by (number of samples - 1):
= sqrt( [ (3.40 - 4.51)^2 + (3.40 - 3.07)^2 + (3.40 - 2.89)^2 + (3.40 - 3.37)^2 + (3.40 - 3.18)^2 ] / (5 - 1) )
= 0.64
The standard error is 0.64 / sqrt(5) = 0.29, so the true mean is 95% likely to fall between 3.40 - 0.57 and 3.40 + 0.57, or in (2.83, 3.97). That's still an awfully wide range, but it's good enough for this example.
- insert screenshot waterfall showing BFCache
Calculations:
Page load time results (onload fired): 1.72, 1.74, 1.49, 1.64, 1.43 seconds.
Mean: (1.72 + 1.74 + 1.49 + 1.64 + 1.43) / 5 = 1.60
Std dev = sqrt( [ (1.60 - 1.72)^2 + (1.60 - 1.74)^2 + (1.60 - 1.49)^2 + (1.60 - 1.64)^2 + (1.60 - 1.43)^2 ] / (5 - 1) )
= 0.14
The standard error is 0.14 / sqrt(5) = 0.06, so the 95% interval is (1.60 - 0.12, 1.60 + 0.12) = (1.48, 1.72).
What's the savings? The difference between the two means: 3.40 - 1.60 = 1.8 seconds, a 53% improvement.
The standard error of the difference is the square root of the sum of the squared standard deviations, each divided by its sample size:
= sqrt( 0.64^2 / 5 + 0.14^2 / 5 )
= 0.29
So we expect (with 95% certainty) the improvement to be between roughly 1.2 and 2.4 seconds.
Finally, the t-test. We're not going to calculate the p-value by hand; instead we'll throw the two sets of measurements into a stats library (or a short script, sketched below). The result is p = 0.01, comfortably below the 0.05 cutoff.
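
For illustration, here's what running the numbers looks like with the hypothetical helpers sketched earlier (saved as `stats.js`); the exact p-value you get will depend on which test your library implements.

```js
// compare.js -- run this post's before/after measurements through stats.js.
var stats = require('./stats');

var unprimed = [4.51, 3.07, 2.89, 3.37, 3.18];  // cold cache, seconds
var primed   = [1.72, 1.74, 1.49, 1.64, 1.43];  // primed cache, seconds

var diff = stats.mean(unprimed) - stats.mean(primed);
var se   = stats.seOfDifference(unprimed, primed);
var t    = stats.welchT(unprimed, primed);

console.log('improvement: ' + diff.toFixed(2) + ' s, +/- ' + (2 * se).toFixed(2));
// t comes out around 6, far beyond the roughly 2.8 needed for p < 0.05
// with samples this small, so the improvement is significant.
console.log('t = ' + t.toFixed(1));
```
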
We can conclude that there's a statistically significant difference between the primed and unprimed cache, one so large that 5 samples before and 5 samples after are enough to reach 95%+ certainty. For smaller improvements, say around 5% or 10%, you'd need many more samples to be confident you were really having an effect.
Let's recap. To quantify an optimization, we take a series of measurements before and after the change, calculate the mean and standard deviation of each data set, and compare the difference in means to quantify the improvement. Lastly, a t-test tells us whether the improvement is statistically significant or potentially due to random chance. Next, let's take a deeper look at HTTP caching and explore how we can improve performance for dynamic content.
## HTTP caching review