Minify and Gzip (2022) (wesleyac.com)
125 points by Tomte on July 17, 2023 | 80 comments


I (thought) I saw where the article was going before it got there, but, the notion that the 'optimization' actually __increases__ gzipped sizes surprised me (I thought the conclusion was going to drive at 'it makes no difference', or 'only a byte or two saved').

I don't have a site that sees a ton of traffic, and I know what some of my direct competitors spend on IAAS (and how long their pages take to load). I'm orders of magnitude below it - hence, I do not spend much time investigating optimizations. I can name a list as long as my leg as to where I bet I can win quite a bit on optimizing for IAAS cost or UX speed. Business-wise I decided it's not (currently!) worth investigating.

One of the more intriguing ones on my list is that I __do not minify anything__. My reasoning is that post-gzip, who cares. It's such a tiny difference. Yes, turning e.g. class names or whatnot in the CSS, or local identifiers in javascript, into short letter sequences means even post-gzip it'll be smaller, but by so little. The reduction in file size when e.g. eliminating all unnecessary whitespace is really, really tiny.

For any site that isn't trying to optimize for 'millions of hits a day!' levels of traffic, why _do_ you minimize all that stuff? It makes the developer experience considerably shittier and the gain seems a few orders of magnitude too small to even bother with it.


Part of the reasoning behind minification is that it theoretically makes JS parsing (and compilation) faster thanks to reduced whitespace and shorter identifiers. At this point you won't notice the impact of that on developer hardware, but it probably still matters on cheap phones and old PCs. The file size improvement really doesn't matter compared to that - you only download the file once* but it gets parsed and compiled on every page load**

* assuming your website doesn't get instantly evicted from cache.

** modern browsers have mechanisms for caching the parsed/compiled code for a given JS file hash, I think, but I don't know how it works


I doubt that parsing code with longer identifiers really makes any noticeable difference, even on old hardware. Parsing the code will most likely not be the bottleneck. Running the JS code is most likely going to take more time than parsing it. It is such a micro-optimization that it is not even worth the development overhead of setting up a minifier.


If you shrink a JS file by 50kb or more via minification (some JS libraries are a meg or more, so this is a realistic scenario - if not understating how much a minifier can do) that's a meaningful reduction in the # of bytes being processed. With how optimized and parallelized modern JS runtimes are it's probably only a few ms in total savings, but on older devices it could save more time. Historically it's been proven that small improvements to load time can have a measurable positive impact on a website.

Identifiers are often turned into atoms (which makes table lookup performance not really depend on how long they are) but for cases where identifiers are being looked up in dictionaries as raw strings, making them shorter could also improve runtime performance for your JS. I'm not sure how much this actually matters in practice though - I've seen dictionary key lookups show up in profiles here and there but it's usually not the main bottleneck.


CSS and JavaScript are partially ordered file formats, and there are some tricks one can do with that to boost compression ratios, with or without minification.

In a Java minifier ages ago, I modified it to suffix sort the constants pool and saw a little over 1% further size reduction. Given that the pool is half of the file, that’s >2% on the pool.

The reason for suffix sorting was that entities in bytecode, JS and CSS often have end of statement markers, so suffix sorting gets you slightly longer runs than prefix sorting.
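
As a rough illustration of the suffix-sorting idea (a sketch in JavaScript, not the Java minifier's actual code; `suffixSort` and the sample pool are made up for the example), sorting by the reversed string puts entries that share an ending next to each other:

```javascript
// Sort strings by suffix, i.e. compare them from the last character backwards.
// Hypothetical helper for illustration only.
function suffixSort(entries) {
  const reversed = (s) => Array.from(s).reverse().join("");
  return entries
    .map((s) => ({ s, key: reversed(s) }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0))
    .map((e) => e.s);
}

const pool = ["foo;", "bar}", "baz;", "qux}"];
// Entries ending in ";" become adjacent, as do those ending in "}".
console.log(suffixSort(pool)); // [ 'foo;', 'baz;', 'bar}', 'qux}' ]
```

Whether this ordering actually helps a given compressor is something to measure on the real file.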


You make decent points, but for this one:

> It makes the developer experience considerably shittier and the gain seems a few orders of magnitude too small to even bother with it.

Does it? It's very rare that I interact with the minimised code


Wouldn't that have to be due to your setup not minifying code during development? That's typical of a lot of web frameworks, if not nearly all of them.

I think the OP may be referring to situations where production needs to be debugged, in which case navigating production code usually becomes a massive pain in the rear.


Source maps generally solve most of the pain automatically. They aren't perfect, but production bugs don't mean you need to look at minified code often.

That said, if one can get away with not needing to do more work and still have a performing site then how little the difference is remains irrelevant.


> They aren't perfect,

That's a problem of its own, though. While I agree that source maps can and often do solve the minification problem/confusion, they can also introduce another "vector" for something to go wrong and make production debugging a pain. Like everything, it's about tradeoffs. I've worked on some apps where, yes, minification really did matter, but minification has also become a default even when it's questionable whether it's even called for. Whereas just shipping the code as-is is less likely to introduce a pain point when debugging, source maps are more likely to introduce one when the source map somehow doesn't work or, in rare circumstances, can't accurately represent the runtime code because transpilation is emulating the original code by doing something almost completely different somewhere down the chain.

Although this is an "it depends" thing, like everything else, I lean on the side of not minifying things in 2023 when both gzip is adequate and the time it takes to parse and run the main code is not the source of performance problems. But the last time I brought a similar topic, a bunch of HNers got mad because "starving people in Africa with flip phones and 3G tho", so maybe I'm still objectively wrong years later.


Extra step to worry about, need to always use things that support it, and it is solving self-inflicted problems


esp. if you upload your source-maps to your logging platform of choice. All the major ones support it that I've interacted with so far.


A few years ago I did some measurements of popular projects (Wikipedia, WordPress, default React, jQuery, a few things like that) comparing minify, minify+gzip, and just gzip, and found that most minification beyond "removing comments" gives very little benefit even for poor 2G connections.

Especially things like removing as much whitespace as possible or replacing true/false with !0/!1 just don't really save a lot of space. The only two I found to make a significant difference were removing comments, and shortening identifiers (and usually with most of the savings being in "removing comments", depending on the amount of comments and code style).


> Especially things like removing as much whitespace as possible or replacing true/false with !0/!1 just don't really save a lot of space.

At the scale of Wikipedia (~10 billion pageviews a month), even a single byte saving can mean terabytes worth of bandwidth saved.


I see this type of reasoning a lot and it feels like it should have its own named fallacy. A tiny percent savings on a huge cost is a moderately big number, yes.

The problem here is that you've switched from proportional thinking to arithmetic thinking in the middle of the thought.

In most cases, if you started with proportional thinking, you should stick to it.

Example of the fallacy: 0.1% of my salary goes to avocado toast, which might not seem that much, but over ten years, that's a thousand dollars!

The fallacy here lies in the fact that a thousand dollars is insignificant over 10 years, but the statement makes it feel significant by switching from proportional to absolute.

"The avocado toast fallacy"?


Yes, but sometimes switching to arithmetic/absolute thinking is the right thing to do.

For example, suppose an engineer who is paid $200k per year can spend a week to implement an optimization that reduces costs by 0.1%. If that is 0.1% of something that costs $50 million per year, then that's not a bad use of time.
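
Spelled out as a quick back-of-the-envelope calculation (figures taken from the example above; dividing the salary by 52 weeks is my simplification and ignores overhead):

```javascript
// Figures from the example; 52 paid weeks per year is an assumption.
const engineerSalaryPerYear = 200_000;
const weeklyCost = engineerSalaryPerYear / 52;
const annualSpend = 50_000_000;
const savingsPerYear = annualSpend / 1000; // 0.1% of the spend

console.log(weeklyCost.toFixed(0));                    // "3846"
console.log(savingsPerYear);                           // 50000
console.log((savingsPerYear / weeklyCost).toFixed(1)); // "13.0"
```

So the first year of savings is roughly thirteen times the cost of the week spent.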


> For example, suppose an engineer who is paid $200k per year can spend a week to implement an optimization that reduces costs by 0.1%. If that is 0.1% of something that costs $50 million per year, then that's not a bad use of time.

Is it not? I don't know. I don't run a multi-million dollar business. So this is an honest question. 0.1% of $50 million is a meagre $50000. If someone is running a business that is spending $50 million per year on something, are they really going to care about a small $50k saving per year?


Well, it's not worth it to bring this optimization to the attention of the CEO. But it definitely exceeds the engineer's weekly cost by a lot, so if it only takes a week, it's potentially worth it (modulo opportunity costs for other things to work on).

Spin it around: is it worth it for a sales person who makes $200k per year to spend a week to land a $50,000 per year contract? I think it potentially is, for the same reason, even if the organization as whole makes hundreds of millions of dollars a year.


google search engineering has entered the chat! https://www.google.com/search?q=google+41+shades+of+blue


Your example is starting with absolute arithmetic, and sticking with it. There is no change.


> A tiny percent savings on a huge cost is a moderately big number, yes.

The thing is, say it takes me an hour to shave off ten bytes in the final build. Value my time at $200 per hour, and the saving of those ten bytes over a year at Wikipedia scale at a thousand dollars, and even investing five times that amount into yak shaving would still be net profitable.


It happens every single time someone uses a big time unit for shock effect.


> At the scale of Wikipedia (~10 billion pageviews a month), even a single byte saving can mean terabytes worth of bandwidth saved.

But does it really move the needle anywhere where it matters? If you are serving a million bytes in every response but you manage to trim a single byte from it, I don't think it is going to move the needle anywhere.

Sure it will save a terabyte when you have served a trillion responses. But in a trillion responses you are saving only one terabyte out of an exabyte of responses. That's not much!
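
The proportion, worked through with the numbers from this comment (one byte trimmed from a million-byte response, a trillion responses):

```javascript
const bytesPerResponse = 1e6; // "a million bytes in every response"
const responses = 1e12;       // a trillion responses
const totalBytes = bytesPerResponse * responses; // 1e18 bytes, one exabyte
const savedBytes = 1 * responses;                // 1e12 bytes, one terabyte
const savedFraction = savedBytes / totalBytes;

console.log(savedFraction); // 0.000001, i.e. one part in a million
```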


Also keep in mind that for big web properties bandwidth is sold very differently than it is to normal consumers.

In many cases Wikipedia probably isn't even paying for bandwidth due to peering agreements.

As far as performance goes, I suspect whether a single byte matters depends a lot more on whether the extra byte changes the number of packets.
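
A simplified sketch of the packet-count point. The 1460-byte payload per packet is an assumption (a common figure for a 1500-byte Ethernet MTU), and this ignores TLS and HTTP framing entirely:

```javascript
// Assumed payload bytes per TCP segment; real connections vary.
const ASSUMED_MSS = 1460;
const packetsFor = (bytes) => Math.ceil(bytes / ASSUMED_MSS);

console.log(packetsFor(14600)); // 10
console.log(packetsFor(14601)); // 11: one extra byte spills into a new packet
console.log(packetsFor(14000)); // 10
console.log(packetsFor(14001)); // 10: one extra byte, same packet count
```

Most of the time a byte more or less lands in the same packet; only at a boundary does it cost anything extra.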


You're right, at some scales it does add up in a meaningful way, and making an informed decision based on that is entirely reasonable. However, most people operate nowhere near that kind of scale, and don't make an informed decision about this.


A terabyte per month is around 3 Mbit/s.

Remember kids, if someone tries to use an unusual time interval (like months for bandwidth-related stuff), they just want to make a small number look big.
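
The conversion, assuming a decimal terabyte (10^12 bytes) and a 30-day month:

```javascript
const bytesPerMonth = 1e12;                // 1 TB, decimal
const secondsPerMonth = 30 * 24 * 60 * 60; // 2,592,000 seconds in a 30-day month
const mbitPerSecond = (bytesPerMonth * 8) / secondsPerMonth / 1e6;

console.log(mbitPerSecond.toFixed(2)); // "3.09"
```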


Is that correct? I get a larger number.

In either case, they said a single byte saved. Any amount of minification is going to save more than just a byte.

It’s also incredibly low effort to do so.


Are absolute numbers important in such a case? Percentages seem more appropriate.


So, why use two bytes for '!0' when just '1' works?


It's not strictly the same. '1' is a number whereas '!0' is a boolean.


I hate that language a bit more now


Both 1 and true are truthy, but they are not the same. Substituting one for the other would only work if the variable was only ever used in if(x), conditions, etc. Stringifying won't produce the same value either when the value is passed along.
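
A few lines that show the difference (plain JavaScript, nothing assumed beyond a standard runtime):

```javascript
console.log(!0 === true);           // true: !0 is the boolean true
console.log(1 === true);            // false: 1 is a number
console.log(1 == true);             // true: loose equality coerces
console.log(typeof !0, typeof 1);   // boolean number
console.log(String(!0), String(1)); // true 1
console.log(JSON.stringify({ a: !0, b: 1 })); // {"a":true,"b":1}
```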


Because 1 !== true and 0 !== false in JavaScript:

  const index = "Hello HN".indexOf("H");
  console.log(index);
  >> 0
  index == false
  >> true
  index === false
  >> false


A lot of it reminds me of the olden days of Sinclair ZX81s with 1kB of memory, of which roughly half was free once you drew on the screen and defined a few variables.

We used to write things like "LET A=COS PI" instead of "LET A=1", or "SIN PI" instead of 0, because keywords were represented as a single byte and numeric constants as five bytes so two keywords represented a useful saving of three bytes.

Back then it made a huge difference but now that modern computers have hundreds or even thousands of kilobytes of RAM free it's a bit pointless.


> Back then it made a huge difference but now that modern computers have hundreds or even thousands of kilobytes of RAM free it's a bit pointless.

Joke's on you, now I'm running IntelliJ, DataGrip, Firefox, and Slack and thousands of kilobytes of RAM free is a rare sight to see.


You should try running visual studio 2022 and teams. On a medium sized project vs2022 will happily gobble 10+ gb of ram, and teams will take 1-2 more without supporting any meaningful scrollback. Even with ~500 tabs I can't hit 10gb ram in ff (thanks to tree style tabs to make it feasible to have so many tabs open).


they are really just doing us all a service, to remind us of the roots of computing, where every byte free was precious


You know what EMACS stands for?

Eight Meg And Constantly Swapping.


That joke was stale in 1994. You need to think of some new material.


Ooops, triggered one of the last dozen or so folk still using EMACS.


I'm not. It's just a trite and boring old "joke", made even more trite and boring on account of this thread not even being about Emacs.


how does minify affect the times to decompress and load the css I wonder?


Unless the client combines decompression and CSS parsing, it would at least have to allocate memory for the whitespace before parsing and then skipping it.

It might be better to have slightly bigger payload on the wire to save time after decompress.


My money is on much less than the time it would take to send the extra bytes over the wire. But I suppose measurement is key :D


I remember finding a similar thing while working on minifying JavaScript. If you have code like

func("hello world");

func("hello world");

Then naively it seems that can be optimized to this:

let s = "hello world";

func(s);

func(s);

However once compressed, the second result is usually larger! It's basically because the second example adds the `let s =` part which is new unique content it has to store.

So if you want to minify JavaScript to a shorter version uncompressed, it's good to transform the first to the second. However if you want to minify JavaScript to the smallest compressed size, it's actually better to do the reverse transform and turn the second case into the first! But then you get into tradeoffs with parse time with a longer input, especially with very long strings - so I'm not sure any of today's minifiers actually do that. (Edit - turns out Closure Compiler does: https://github.com/google/closure-compiler/wiki/FAQ#closure-...)
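
If you want to check this yourself, here is a minimal sketch using Node's built-in zlib. The two snippets are the ones from this comment; `gzipSize` is just a helper name I made up. Inputs this tiny are dominated by gzip's fixed overhead, so try it on your real bundle before drawing conclusions:

```javascript
const zlib = require("node:zlib");

// Size in bytes of the gzip-compressed form of a source string.
const gzipSize = (source) => zlib.gzipSync(Buffer.from(source, "utf8")).length;

const repeated = 'func("hello world");\nfunc("hello world");\n';
const hoisted = 'let s = "hello world";\nfunc(s);\nfunc(s);\n';

for (const [label, source] of [["repeated", repeated], ["hoisted", hoisted]]) {
  console.log(label, "raw:", Buffer.byteLength(source), "gzip:", gzipSize(source));
}
```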


Programming and compression are related, see Kolmogorov complexity [1]. It's then no surprise that when you do the "compression" yourself, it is almost like you are compressing twice, which often results in inflation because of the compression dictionary.

[1] https://en.wikipedia.org/wiki/Kolmogorov_complexity


It would be interesting to benchmark the CPU performance of a full page load and paint for each version. At the end of the day the language parser will see the decompressed text, so the minified version should reduce the time spent in that stage. However AFAIK a basic minify, that only compresses whitespace and identifiers, would only speed up the lexer, which I expect is the fastest part anyway. But reducing the number of actual CSS rules (as the author experiments with manually) may make a measurable difference

fwiw I don't bother minifying anything on my personal site. I realise that complex sites can have large enough payloads to make it worth it, especially for Javascript, but for the typical developer blog it's likely overengineering in the extreme. If I were to apply any postprocessor it would probably be a prettifier, for the HTML, since the template substitution results in mismatched indents. But then again anyone looking at the DOM is likely to be doing it in the dev tools, which already render a projectional view, rather than the original source


Compression aware minifiers are definitely a thing.

Instead of trying to make the code as small as possible, they will try to make it as redundant as possible. For example by reusing names. For example "message="hello";print(message)" can be minified into "hello="hello";print(hello)", which may compress better than "a="hello";print(a)".


I like the idea, but just tested your specific example with https://facia.dev/tools/compress-decompress/gzip-compress/

and the second version compresses better, as my intuition already thought, without knowing how gzip works, because you still need a kind of identifier for repeated strings, and how much shorter than 1 byte can they be.


I ran some tests, and with zstd, brotli, and gzip, single-byte identifiers win on even slightly more complicated examples. If you have more than N identifiers (where N is the valid number of single-byte identifiers) then using short strings from the brotli dictionary seems to be at worst break-even for brotli.

Also, there seems to be a 40-byte minimum size on zstd; Making the original example significantly smaller (or larger with repeated patterns) all yielded 40 byte files. I had to change it to e.g. print(hello);foo(hello); bar(hello) to get any difference between using "x" and "hello" as an identifier.
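
For anyone who wants to repeat this kind of test, a rough sketch with Node's zlib, covering gzip and brotli only (I left zstd out because it is not available in every Node release). `program` and `compressedSizes` are names invented for the example, and I'm not quoting output since it depends on the zlib and brotli versions:

```javascript
const zlib = require("node:zlib");

const codecs = {
  gzip: (buf) => zlib.gzipSync(buf),
  brotli: (buf) => zlib.brotliCompressSync(buf),
};

// Same program text, differing only in the identifier name.
const program = (name) => `${name}="hello";print(${name});foo(${name});bar(${name})`;

// Compressed size in bytes per codec for one source string.
function compressedSizes(source) {
  const buf = Buffer.from(source, "utf8");
  return Object.fromEntries(
    Object.entries(codecs).map(([codec, compress]) => [codec, compress(buf).length])
  );
}

console.log("x:    ", compressedSizes(program("x")));
console.log("hello:", compressedSizes(program("hello")));
```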


OP's suggestion may work better on real life examples of source code (which would be longer and include a lot more different kinds of repeated strings)


For my "fun" projects I never minify the CSS or JS because I miss the days when you could View Source on a website and easily dig into how it works. If you're a NoScript user, go ahead and read the source and decide for yourself if you want to allow it. Much easier to maintain without all the moving parts of the JS build ecosystem as well.


I'll be starting a completely serious side project in a few days where I'm going to not minify anything (especially JS) so that users will be able to hit C-u and check for themselves that nothing nefarious is going on privacy-wise. Also, no cookies at all, and the whole site will be just a bunch of static html files, two fonts, one css file, one small vanilla JS file and some images where appropriate. I hope to get load times well below one second (especially once the assets are cached).


FWIW if you ship source maps, then you can have the best of both worlds.


Do those work in View Source? Don't you have to load the site and inspect it in a debugger for source maps to kick in, which doesn't happen in the noscript scenario?


I don't think view source will load source maps but you can still open the f12 debugger and load source maps (even automatically) with noscript enabled. The debugger isn't treated as a normal webpage, rather part of the browser itself.


It's mostly just kind of funny that "View Source" diverged so much from the F12 debugger. For most of its history Spartan Edge had caught and fixed that mistake and View Source just dropped you into the F12 debugger. It's almost sad that Chromium Edge again shows you a different unrelated view.

Firefox has also seemed to wander back and forth between View Source just reuses the full Dev Tools and View Source is a disconnected view from Dev Tools for whatever reason. At the moment the big difference flag seems to be if you install/use Firefox Developer or not.

(Allegedly, one of the reasons is that some non-technical users have strange reasons that they View Source from time to time and they get upset if it opens Dev Tools because Dev Tools is just too confusing. Mac Safari has that interesting dance to ask it to opt in to showing Dev Tools. I seem to recall IE11 did something similar: if you ever opened Dev Tools for more than a few minutes it then took over for View Source, but otherwise used the older and simpler View Source from ancient IE history. I'm surprised that if the reason is "it confuses non-technical users" there isn't an opt-in flag in Chrome or Chromium Edge. That "Firefox Developer Edition" is a separate SKU is also an interesting way to approach that.)


But there aren’t any guarantees that the file the source map points to is actually the source, right?


Closure Compiler follows the same line of thinking:

https://github.com/google/closure-compiler/wiki/FAQ#closure-...


There is also an online tool / API for this, which can be quite helpful for non npm using vanilla.js workflows.

https://closure-compiler.appspot.com/

Should be set to:

  // @language_in UNSTABLE
  // @language_out ECMASCRIPT_NEXT
for it to work as expected.


You’re right, however for offline but non npm workflow, you can also use the Closure Compiler JAR, grabbed for instance from Maven: https://mvnrepository.com/artifact/com.google.javascript/clo...

   $ wget https://repo1.maven.org/maven2/com/google/javascript/closure-compiler/v20230502/closure-compiler-v20230502.jar
   […]
   $ java -jar closure-compiler-v20230502.jar 
   The compiler is waiting for input via stdin.
The Closure Compiler still is a Java project. It just gets primarily distributed via NPM these days.


If you control the deploying process and the web server (at least with Nginx), you can disable gzip for the small files (like the article has), and avoid on-the-fly compressing for the larger files.

    location /js/small {
      gzip off;
    }
    location /js/ {
      gzip on;
      # Use the .gz file if it exists
      gzip_static on;
    }
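
For `gzip_static` to find anything, the `.gz` files have to be produced at deploy time. A minimal sketch of that step with Node's built-in modules; `precompress` is a hypothetical helper, it only looks at one directory level, and the extension list is my assumption:

```javascript
const fs = require("node:fs");
const path = require("node:path");
const zlib = require("node:zlib");

// Write a maximum-level .gz sibling next to each matching file in a directory
// (non-recursive). Returns the paths of the .gz files written.
function precompress(dir, extensions = [".js", ".css"]) {
  const written = [];
  for (const name of fs.readdirSync(dir)) {
    if (!extensions.includes(path.extname(name))) continue;
    const file = path.join(dir, name);
    if (!fs.statSync(file).isFile()) continue;
    const gz = zlib.gzipSync(fs.readFileSync(file), {
      level: zlib.constants.Z_BEST_COMPRESSION,
    });
    fs.writeFileSync(`${file}.gz`, gz);
    written.push(`${file}.gz`);
  }
  return written;
}
```

You would call it on whatever directory the `/js/` location maps to. Since the compression cost is paid once at build time, the highest level is affordable here.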


NB: there is also nginx `gzip_min_length` directive to avoid compressing small files https://nginx.org/en/docs/http/ngx_http_gzip_module.html#gzi...


Cloudflare used to only perform compression at level 4 (level 9 is the most compressed), because they found that while the file they served was bigger, the time to compress/decompress was longer for level 9, resulting in perceived slower loading for the end user.

https://blog.cloudflare.com/this-is-brotli-from-origin/#:~:t....


You're linking to an article about brotli. I don't think gzip is affected by it. If anything, its decompression should be faster given fewer bytes to process.

The usual reason to stick to gzip level 4 is that compression gets extremely slow at marginal savings.
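
You can see the size/time tradeoff per level for yourself with something like the sketch below. The input is synthetic CSS-like text I made up as a stand-in for a real asset, and timings depend entirely on the machine, so treat the numbers as indicative only:

```javascript
const zlib = require("node:zlib");

// Synthetic, repetitive input standing in for a real asset.
const sample = Buffer.from(
  Array.from({ length: 2000 }, (_, i) => {
    const color = ((i * 7919) % 4096).toString(16).padStart(3, "0");
    return `.item-${i % 50} { margin: ${i % 7}px; color: #${color}; }`;
  }).join("\n")
);

// Compress the input once per gzip level and record output size and elapsed time.
function sizeAndTimeByLevel(input) {
  const rows = [];
  for (let level = 1; level <= 9; level++) {
    const start = process.hrtime.bigint();
    const size = zlib.gzipSync(input, { level }).length;
    const micros = Number(process.hrtime.bigint() - start) / 1000;
    rows.push({ level, size, micros });
  }
  return rows;
}

console.table(sizeAndTimeByLevel(sample));
```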

I would expect that cloudflare caches compressed data so they pay the compression cost only once.


At cloudflare's scale I can imagine that for many websites, compressed data falls out of cache frequently and then the file has to get compressed again on next request. For example if you're hosting a website that contains a ton of different news articles or user-submitted stories that's probably tens of thousands of separate blobs of text, and each one might only get read a few dozen times a day. The odds are much lower that the individual content is going to be in cache on the particular edge node that handles the request.

The CDN is still useful in this case of course, both for ddos mitigation and because it can cache the common stuff like your website's stylesheets.


My naive guess as to what's going on is that gzip here is set to tame settings, such that when it sees the same code block twice it shrinks it in a more space-efficient, less CPU-efficient way. When it sees it just once, it doesn't.

However it happens, I think if you're being real, it would take A LOT of these micro-optimizations multiplied by A LOT of cache misses to add up to anything important.

What I got out of this is, unless you're looking at the specific bytes in the network tab every time you edit your code, just minify/gzip and forget about it.


This minify/gzip size effect is a well known quirk to developers of javascript minifiers. The minifier's symbol mangling algorithm often has a more pronounced effect than does advanced AST optimization.

This website has real life data on the matter for popular libraries:

* https://github.com/privatenumber/minification-benchmarks

Compare the trophies indicating smallest size for Minified versus Minzipped (gzip). Generally the smallest minified size yields the smallest minified+gzip size, but there are some notable anomalies outside the range of statistical noise. It is not practical for a javascript minifier to take a compression algorithm into account - it would blow up the minify timings exponentially.


I made a tiny jQuery library replacement a while back and I found similar things to the author; you should not be measuring the raw text representation, but the final gzip size, every time you make an optimization.

I go through a list of examples that I found where it does or it does not matter whether you "optimize" it for size or not. I also theorize (I do not have enough knowledge to verify my claims though) why this is happening:

https://medium.com/@fpresencia/understanding-gzip-size-836c7...


Now consider all the time, developer nerves, compute (installation of dependencies and processing files), and missed learning opportunities (people wanting to know how things work and looking at the source of a website) that have been wasted in setting up "minifiers" unnecessarily in frontend projects, "because it will be smaller" or "because I've seen someone else recommending it".


There are better ways to use media queries.

In your CSS use a variable for the property that changes, and put the fallback in it too.

In that way the next person to work on the code can see what changes.

In your media queries only have CSS variables, as locally scoped as possible, and name them after the property with -- at the front.

In this way it is self-evident what the media queries are about.

For example:

    p { color: var(--color, black); }

    @media (min-width: 40em) { /* example condition */
        p { --color: chocolate; }
    }
And have the queries after the CSS for legibility.

By writing CSS this way, your source will be compact and easy to maintain.

Because of this you won't need to mangle it with CSS mangling tools or bother with CSS minifiers. Your production time will speed up immensely.

Also, style the elements, then add classes and IDs if you have to. Stay away from div's, use all the semantic elements with CSS grid and you will get much better results.


I’m very late to the party, but I wrote about this back in early 2016! https://csswizardry.com/2016/02/mixins-better-for-performanc...


I wonder if we're at a point yet where gzip's slowness is becoming a bottleneck. I wish zstd support in the browser was prioritized a bit higher; brotli does have similar or better compression due to the dictionary, but speed is also important.


Slowness on compression or decompression? The compression end, which is slower, can plausibly be mitigated with analogues of gzip_static.

And as the brotli people will insist on telling you, it's good even without that dictionary: https://news.ycombinator.com/item?id=27163981


Zstandard Compression and the application/zstd Media Type https://datatracker.ietf.org/doc/html/rfc8478


To really exploit zstd you'd want all major browser vendors to agree on some standard, updateable dictionaries for HTML/CSS/etc.


Good point. Has anyone got a simple explanation of how zip programs work, what types of text they optimise best, and how to optimise for this?


This is a reasonable explanation of DEFLATE, which is the underlying thing being talked about in the article: https://www.zlib.net/feldspar.html
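
The short version is that DEFLATE combines LZ77-style back-references to earlier text with Huffman coding, so a second copy of something it has already seen costs very little. A small sketch that shows this with Node's zlib (the CSS rule is an arbitrary made-up string):

```javascript
const zlib = require("node:zlib");

const gzipSize = (text) => zlib.gzipSync(Buffer.from(text, "utf8")).length;

const rule = ".card > .title:hover { border-bottom: 2px dotted rebeccapurple; }\n";
const once = gzipSize(rule);
const twice = gzipSize(rule + rule);

// The second copy is encoded mostly as a reference back to the first,
// so "twice" should come out only a few bytes larger than "once".
console.log({ raw: rule.length, once, twice });
```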


Ah, the end-to-end principle in action. :)

https://en.wikipedia.org/wiki/End-to-end_principle


Is anyone compressing with brotli these days? I switched from gzip to brotli quite easily in my Rails app.


I use it in a symfony project, and I compress assets beforehand in my deployment pipeline into static br suffixed assets. Then Apache is configured to serve these files if present instead of compressing on the fly.



