This definition of "sensor-wide" dynamic range is the part that makes no sense to me. Imagine for a moment that we are talking about film (chemical rather than electrical, but the same idea). Cut a piece of Tri-X in half and expose both halves to a step chart. Each piece will record the same gradient from black to white. You don't magically get more dynamic range by increasing the size of the film. If that were the case, 8x10 would have a tremendous dynamic range advantage over 35mm. Sadly, it doesn't. The same is true of a silicon-based sensor.
It appears that this definition is just part of a circular exercise in which a smaller sensor has less dynamic range because the definition of dynamic range for sensors is based on their size, not on the range of values they can capture (which, if you select a DX-sized area of an FX sensor, is identical).
I don't understand this idea of a 'circular exercise'. The definition of dynamic range at the sensor level that John and I gave, like the PDR definition by Bill Claff, is one that lends itself to comparing images at a standardized output size. I think that is the only way to meaningfully compare images of the same scene, and I think the definition is useful for that. Others prefer to think in terms of secondary magnification.
It is not about 'better' or 'worse' definitions. Each definition has a specific, precisely delineated purpose.
If you use the per-pixel or per-unit-area DR, you will find that it does not correspond to what you see, in terms of noise, at a standardized output size.
That is why sensor-level DR makes sense as a concept, and why Bill Claff uses PDR as the y-axis, as opposed to engineering dynamic range.
The definition is what it is. The non-trivial part is that if you accept the definition, you can do the little calculation that John and I did in order to estimate sensor-level DR. You get certain predictions. Then you can test those by looking at actual images and running statistical analyses on them. This part is definitely non-circular.
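To make that calculation concrete, here is a minimal sketch of the kind of normalization I have in mind. The full-well and read-noise numbers, the 8 MP reference output and the function names are purely my own illustrative assumptions, not Bill Claff's actual PDR procedure; the point is only to show how a per-pixel (engineering) DR turns into a sensor-level figure once you fix the output size.

```python
import math

def engineering_dr_stops(full_well_e, read_noise_e):
    """Per-pixel (engineering) DR: full-well capacity over read noise, in stops."""
    return math.log2(full_well_e / read_noise_e)

def sensor_level_dr_stops(full_well_e, read_noise_e, n_pixels, n_ref=8e6):
    """Per-pixel DR plus a normalization term for resampling to a reference
    output size (8 MP here, an arbitrary choice). Averaging k pixels lowers
    the noise floor by sqrt(k), which adds 0.5 * log2(k) stops of DR."""
    return engineering_dr_stops(full_well_e, read_noise_e) + 0.5 * math.log2(n_pixels / n_ref)

# Two hypothetical sensors with identical per-pixel behaviour but different pixel counts:
small = sensor_level_dr_stops(full_well_e=30000, read_noise_e=3, n_pixels=16e6)  # "DX-like"
large = sensor_level_dr_stops(full_well_e=30000, read_noise_e=3, n_pixels=36e6)  # "FX-like"
print(f"difference at the standardized output: {large - small:.2f} stops")  # ~0.58 stops
```

The per-pixel DR of the two is identical; the difference only appears once both captures are brought to the same output size, which is exactly what the sensor-level definition is meant to express.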
With a different definition, you might end up with the same predictions, just arrived at by a slightly different calculation.
I said before that Bill Claff's graphs don't make sense if you don't understand the definitions. Perhaps too many people draw conclusions from the graphs without understanding the definitions, but that is their own fault, not Bill Claff's, because he provides all the required information.
The definition ensures that an iPhone 7 ends up lower on the scale than a D800, even though the individual photosites of the iPhone are likely as efficient as the ones in the D800, or even more efficient. And we can probably agree that, at a given output size, an iPhone image does not look as good as, let alone better than, one from a D800.
It's all about how to present the data.
If the axis were engineering DR, all sensors of the same generation would sit very close together. One would then have to calculate the effect on a standardized output separately, by invoking secondary magnification, instead of reading the difference in stops directly off the graph. Both approaches lead to the same conclusions.
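As an illustration of 'both lead to the same conclusions', here is another small sketch, again with made-up numbers and my own arithmetic rather than anything taken from Bill Claff's site. Route one compares the cameras on a normalized, sensor-level axis and reads the gap directly; route two keeps engineering DR and applies a secondary-magnification correction afterwards. Under the same assumptions as above, they give the same number of stops.

```python
import math

# Hypothetical same-generation sensors: similar per-pixel figures, different pixel counts.
eng_dr_a, n_a = 11.3, 36e6   # "D800-like"
eng_dr_b, n_b = 11.3, 12e6   # "small-sensor-like"
n_ref = 8e6                  # arbitrary reference output size

# Route 1: put normalized (sensor-level) DR on the axis and read the gap off the graph.
norm_a = eng_dr_a + 0.5 * math.log2(n_a / n_ref)
norm_b = eng_dr_b + 0.5 * math.log2(n_b / n_ref)
print(f"route 1: {norm_a - norm_b:.2f} stops")

# Route 2: keep engineering DR on the axis, then correct for secondary magnification,
# i.e. the extra enlargement the smaller capture needs to reach the same output size.
secondary_magnification_term = 0.5 * math.log2(n_a / n_b)
print(f"route 2: {(eng_dr_a - eng_dr_b) + secondary_magnification_term:.2f} stops")
# Both print ~0.79 stops.
```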
About your film example: if you print the gradient from the smaller piece of film, the grain will be magnified more strongly, giving you less certainty about the dark tones of the gradient, and about what is truly black, than if you print from a bigger piece of film, again looking at a standardized output size. Bjørn has described this as 'different secondary magnification', but the two concepts are 'equivalent' in terms of predictive power, if you allow me the pun.
The analogy with film is not strict, because a piece of film, unlike a digital sensor, does not have a full-well capacity, so its dynamic range is not as precisely defined a number as it is for a sensor. I do not know the technical definition of dynamic range for film, but I guess the upper limit involves some kind of cutoff where density no longer changes meaningfully with additional exposure.