I don't think so. Pruning a large model and training a smaller model isn't the s...

cubefox · on Dec 16, 2023

Do you expect a model which was overtrained (relative to the Chinchilla law) to be no more affected by pruning than a model of the same size that wasn't overtrained?

danielmarkbruce · on Dec 16, 2023

Can you reformulate this question? It's hard to know what you mean by "no more affected". How are you defining "more"?

cubefox · on Dec 16, 2023

I mean a stronger impact on loss or benchmark results.

danielmarkbruce · on Dec 16, 2023

I mean, by some relative amount (like 10%) or by some absolute amount? I think I'd expect the "more trained" model to drop in performance by less as a percentage (which is hard to define here) but by more in an absolute sense. That's basically impossible to measure, and even if it were measurable, I don't feel confident about that prediction. It's speculation.
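
To make the relative vs. absolute distinction concrete, here is a toy sketch. The numbers are invented purely for illustration (they are not measurements of any real model), and `drops` is a hypothetical helper, assuming a higher-is-better benchmark score:

```python
def drops(before, after):
    """Return (absolute drop, relative drop) for a higher-is-better score."""
    absolute = before - after
    relative = absolute / before
    return absolute, relative


# Made-up benchmark scores before and after pruning (assumption, not data).
overtrained = drops(before=80.0, after=72.0)  # the "more trained" model
baseline = drops(before=40.0, after=35.0)     # same size, less training

# The overtrained model loses more points in absolute terms...
print(overtrained[0] > baseline[0])
# ...while losing a smaller fraction of its starting score.
print(overtrained[1] < baseline[1])
```

Both comparisons can hold at once only because the two models start from different scores, which is also why "as a %" is hard to pin down for something like loss.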