The problem with 'equivalence' thinking is that any discussion of alternative ways to reach a final result (be it print size or whatever) gets masked by the muddle of confused terminology. Will, for example, increasing the sheer number of pixels in the first step mitigate some of the problems caused later by large magnification? This introduces yet another constraint: the print must then have not only a fixed size but also a fixed dpi setting (said tongue in cheek, just to show how confused any such line of thought ends up). Will downscaling an FX sensor result in "increased DR" if the downscaling is done by binning photosites? And so on. There is a maze of likely unusable roads for those with plenty of spare time on their hands. The wiser ones would pick up the nearest camera and go shooting instead.
Yes, I think these are valid concerns, but unsurprisingly they all concern second-order (smaller) effects. I hope I mentioned that we are dealing with an approximation, i.e., as someone pointed out, 'a model that is wrong but useful'.
For instance, increasing the number of pixels of the sensor has a rather small effect. If the lenses used are sufficiently high-resolving, the higher number of pixels can reveal some additional fine detail, but only up to the limits imposed by diffraction. Regarding noise, on a pixel-by-pixel level there will be more noise because each pixel records a smaller number of photons, but when viewed on the final print, unless we are dealing with very low resolution, the additional noise will be largely invisible (printing is also a noisy process).
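To put rough numbers on the noise remark, here is a minimal sketch that assumes pure photon shot noise (Poisson statistics, so SNR = sqrt(N)) and ignores read noise entirely. The photon count and the function names are made up for illustration; they are not measurements from any real sensor.

```python
import math

def per_pixel_snr(photons_on_patch, pixels_in_patch):
    """Shot-noise-limited SNR of one pixel when a fixed patch of sensor
    area is divided into `pixels_in_patch` equal pixels."""
    photons_per_pixel = photons_on_patch / pixels_in_patch
    return math.sqrt(photons_per_pixel)

def patch_snr(photons_on_patch, pixels_in_patch):
    """SNR after summing all pixels in the patch: the signals add
    linearly, the (independent) shot-noise variances add."""
    photons_per_pixel = photons_on_patch / pixels_in_patch
    total_signal = photons_per_pixel * pixels_in_patch
    total_noise = math.sqrt(photons_per_pixel * pixels_in_patch)
    return total_signal / total_noise

# Hypothetical: 40,000 photons land on the same patch of sensor area,
# recorded either by 1 large pixel or by 4 pixels of a quarter the area.
for n in (1, 4):
    print(n, per_pixel_snr(40_000, n), patch_snr(40_000, n))
```

Under these assumptions the individual small pixels are noisier, but the patch as a whole is not, which is why the difference mostly disappears at a fixed print size. With read noise included the small pixels would lose a little, which is the second-order effect discussed here.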
As said, the limitation of the model is that the number of pixels and other details of the sensor design are not included, because they are way too difficult to capture in such a simple framework. However, the differences we see between formats with vastly different sensor areas are well captured by the model, e.g. going between m4/3 and FX, or between a smartphone camera and FX.
EDIT: To clarify the remark about photons: smaller pixels require a lower FWC (full well capacity) if the base ISO is maintained, because they will be struck by fewer photons for a given exposure. If the read noise does not change, the DR of each pixel is reduced (lower FWC, same read noise ---> lower DR).
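As a rough illustration of that last point, the sketch below uses the common engineering definition of per-pixel DR as the ratio of FWC to read noise, expressed in stops (log2). The electron counts are hypothetical round numbers, not data for any particular sensor, and the function name is my own.

```python
import math

def pixel_dr_stops(full_well_e, read_noise_e):
    """Per-pixel dynamic range in stops, taken as log2(FWC / read noise).
    Both arguments are in electrons."""
    return math.log2(full_well_e / read_noise_e)

# Hypothetical values: a large pixel, and a pixel with a quarter of the
# area (so a quarter of the FWC at the same base ISO), same read noise.
large = pixel_dr_stops(full_well_e=60_000, read_noise_e=3.0)
small = pixel_dr_stops(full_well_e=15_000, read_noise_e=3.0)

print(f"large pixel: {large:.2f} stops")
print(f"small pixel: {small:.2f} stops")
print(f"difference:  {large - small:.2f} stops")
```

With these made-up numbers, quartering the FWC at constant read noise costs two stops of DR per pixel. This is a per-pixel figure only; it says nothing yet about the DR seen at a fixed print size.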