Contrary to what you say, Bill seems to find his own investigation more trustworthy than the DxO data!
DxO gives engineering DR, plus SNR for 18% gray. To get from these to the "photographic DR" (PDR, using a threshold SNR of 20), I believe some kind of modeling or interpolation has to be used, so a PDR derived from DxO data is an approximation. That doesn't invalidate the data DxO present themselves; the two are simply different measures.
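To show why the threshold matters, here is a minimal sketch of how a DR figure can be computed for different SNR thresholds. It assumes a simple shot-noise plus read-noise model, and the full-well and read-noise numbers are made up for illustration; they are not data for any real camera. It also ignores any normalization for sensor or output size that published figures may apply, so it is not the actual procedure used by either site.

```python
import math

def snr(signal_e, read_noise_e):
    """SNR under an assumed shot-noise + read-noise model (units: electrons)."""
    return signal_e / math.sqrt(signal_e + read_noise_e ** 2)

def signal_at_snr(target_snr, read_noise_e):
    """Signal level at which the model reaches target_snr.

    Solves S / sqrt(S + r^2) = t, i.e. S^2 - t^2*S - t^2*r^2 = 0, for S > 0.
    """
    t2 = target_snr ** 2
    return (t2 + math.sqrt(t2 ** 2 + 4.0 * t2 * read_noise_e ** 2)) / 2.0

def dynamic_range_stops(full_well_e, read_noise_e, threshold_snr):
    """Stops between saturation and the signal where SNR equals the threshold."""
    return math.log2(full_well_e / signal_at_snr(threshold_snr, read_noise_e))

# Hypothetical sensor values, chosen only for illustration.
FULL_WELL_E = 60000.0
READ_NOISE_E = 3.0

dr_snr1 = dynamic_range_stops(FULL_WELL_E, READ_NOISE_E, 1.0)    # low threshold
dr_snr20 = dynamic_range_stops(FULL_WELL_E, READ_NOISE_E, 20.0)  # stricter threshold
print(f"DR at SNR=1:  {dr_snr1:.2f} stops")
print(f"DR at SNR=20: {dr_snr20:.2f} stops")
```

With only a few published points on the SNR curve rather than a full model, the signal level at the threshold would have to be interpolated or fitted, which is where the approximation comes in.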
I believe Bill Claff's site marks cameras whose figures are estimated from images that were not shot in a controlled way with a symbol such as (e) or (p). Initially the D850 had one of those letters after it in the legend, but that data was later replaced by new data, apparently measured using a controlled procedure, and the symbol indicating estimated or preliminary data was dropped. However, I don't know how controlled the procedure is; I would think a company such as DxO, which does all the measurements and analysis in house, can keep the process more precisely the same for all cameras. But any measurement procedure run over the long term can be subject to human error and variability in conditions. I do believe each site makes its best effort to provide valuable data.
It is also good to remember that there are differences which these procedures do not measure. For example, the D800 strongly clips blacks at high ISO, which causes problems in astrophotography (where averaging is commonly used to image faint objects). In the D810 this issue was greatly alleviated. This is discussed in depth on Jim Kasson's blog, which is worth reading. I also remember seeing comparisons showing that the D750 handles long exposures at high ISO really well, but I don't remember which web site it was. Some other cameras are reported to produce increased noise after being used for longer exposures.
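As a rough illustration of why black clipping hurts averaging, here is a toy simulation of a single pixel with a faint signal buried in noise. The Gaussian noise model and all the numbers are assumptions for illustration only; this is not a model of what any particular camera does internally.

```python
import random

def mean_of_frames(signal, noise_sigma, n_frames, clip_blacks, seed=0):
    """Average n_frames noisy samples of one pixel, optionally clipping at zero."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_frames):
        value = rng.gauss(signal, noise_sigma)
        if clip_blacks:
            value = max(value, 0.0)  # negative noise excursions are discarded
        total += value
    return total / n_frames

# Hypothetical faint signal, well below the noise level.
FAINT_SIGNAL = 0.1
unclipped = mean_of_frames(FAINT_SIGNAL, 1.0, 100_000, clip_blacks=False)
clipped = mean_of_frames(FAINT_SIGNAL, 1.0, 100_000, clip_blacks=True)
print(f"true {FAINT_SIGNAL}, unclipped mean {unclipped:.3f}, clipped mean {clipped:.3f}")
```

Without clipping, the positive and negative noise excursions cancel and the average converges toward the true signal. With clipping, the negative half is thrown away, so the average is biased upward and stacking more frames does not recover the faint signal.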
I would not recommend using a single source of sensor data to make camera-buying decisions. Study multiple sources of information, and consider the feature set as well, not just numerical data. For most people the feature set is probably more important than small differences in sensor image quality. And for those special applications where the differences between sensors do matter, one would be best off testing the cameras in the application itself to be sure.