Training is making the model (or rather going from something random and useless to something well calibrated and useful). Inference is using it to make a prediction.

This is saying that you don’t need the entire model to make good predictions for specific subsets of tasks. You can literally remove a large part of the model and it will do fine, which is not very controversial. The model, after being trained, is a large collection of interacting nodes. When this talks about dropping chunks of the model, it means removing nodes after training and then making predictions with what’s left. The main advantage is that smaller models are cheaper and faster to run, or to modify with further training.
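If it helps to see "dropping nodes after training" concretely, here is a rough sketch. Everything in it is made up for illustration: the weights are hand-picked stand-ins for a trained net, and the "drop nodes with a tiny output weight" rule is just one simple heuristic I'm assuming, not necessarily what any particular method does.

```python
def relu(x):
    return x if x > 0 else 0.0

# Hypothetical "trained" weights: 2 inputs, 4 hidden nodes, 1 output.
W1 = [[0.9, -0.4], [0.5, 0.8], [0.3, -0.2], [-0.7, 0.6]]  # one row per hidden node
b1 = [0.1, -0.2, 0.05, 0.0]
W2 = [1.2, -0.9, 0.001, 0.002]  # output weight for each hidden node
b2 = 0.3

def forward(x, keep):
    """Run the net using only the hidden nodes whose index is in keep."""
    total = b2
    for i in keep:
        h = relu(sum(w * xi for w, xi in zip(W1[i], x)) + b1[i])
        total += W2[i] * h
    return total

all_nodes = [0, 1, 2, 3]
# Assumed pruning rule: keep nodes whose output weight is not tiny.
# The 0.01 threshold is arbitrary.
kept_nodes = [i for i in all_nodes if abs(W2[i]) > 0.01]

inputs = [[1.0, 2.0], [-0.5, 0.3], [2.0, -1.0]]
gaps = [abs(forward(x, all_nodes) - forward(x, kept_nodes)) for x in inputs]
print("kept nodes:", kept_nodes)
print("largest change in output:", max(gaps))
```

Half the hidden nodes are gone and the outputs barely move, because those nodes were contributing almost nothing to begin with. Real nets are much messier than this, but that is the basic idea.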

You know that meme about how you only use 10% of your brain at a time? Well, yeah, but the idiot movies that suggest using 100% of your brain would make you impossibly smarter are not correct. 90% of your brain just isn’t relevant. More brain / model is not better than the relevant subset alone.

The important question to be asking is whether you can remove large chunks of the model without hurting its ability to do well in general on whatever you ask it.

As a very crude example, imagine you trained a simple model to predict rainfall using a weather monitor and the number of farts you did last week. The model will probably learn that the monitor is useful and the farts are irrelevant. If this were as simple as a linear regression, you could just remove the farts coefficient from the equation and the model would give essentially the same predictions. Neural nets are not so easily inspected, but it’s still just dropping the bits that are irrelevant to whatever you’re trying to do.
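Here is the rainfall example as a quick sketch, in case seeing the numbers helps. The data is synthetic and entirely made up (I'm assuming rainfall is roughly 2 × monitor reading + 1 plus a little noise, with farts being pure noise), and the fit is plain least squares written out by hand so it runs without any libraries.

```python
import random

random.seed(0)

# Made-up data: rainfall depends only on the monitor reading; farts are noise.
n = 200
monitor = [random.uniform(0, 10) for _ in range(n)]
farts = [random.randint(0, 50) for _ in range(n)]
rain = [2.0 * m + 1.0 + random.gauss(0, 0.1) for m in monitor]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

def fit(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Columns: constant term, monitor reading, fart count.
X = [[1.0, m, f] for m, f in zip(monitor, farts)]
intercept, w_monitor, w_farts = fit(X, rain)

def predict_full(m, f):
    return intercept + w_monitor * m + w_farts * f

def predict_pruned(m):
    # Same model with the farts term simply deleted (no refitting).
    return intercept + w_monitor * m

max_gap = max(abs(predict_full(m, f) - predict_pruned(m))
              for m, f in zip(monitor, farts))
print("monitor coefficient:", w_monitor)
print("farts coefficient:", w_farts)
print("largest change in prediction after dropping farts:", max_gap)
```

In practice the farts coefficient comes out very close to zero rather than exactly zero, so the pruned model's predictions are almost, not perfectly, identical to the full model's.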