A lot depends on how spatial frequencies are distributed in the image. Natural scenes have a spectral density that is very close to a power law, i.e. they are quasi-fractal: if you zoom in, you again see an image with a similar distribution of spatial frequencies.
By contrast, the facade of a building might contain a few very low frequencies and one very high frequency, and a vast expanse of the frame might be occupied by this very regular pattern. Applying too much sharpening at that scale before downsizing boosts that high frequency and therefore increases the likelihood of getting a weird interference (moiré) pattern.
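To make the interference argument concrete, here is a minimal sketch of where a regular pattern ends up if the image is downsized by simply dropping pixels, with no low-pass filtering at all. The function name `aliased_frequency` and the example numbers are my own assumptions for illustration; real resizers do filter first, so this is the worst case rather than what any particular program does.

```python
def aliased_frequency(cycles_per_px: float, factor: int) -> float:
    """Apparent frequency (cycles per OUTPUT pixel) of a pure periodic
    pattern after keeping every `factor`-th pixel without any filtering.

    Assumption: ideal point sampling, i.e. plain decimation.
    """
    # Frequency expressed in cycles per output pixel.
    f_out = cycles_per_px * factor
    # Anything above 0.5 cycles/pixel (Nyquist) folds back into 0..0.5.
    return abs(f_out - round(f_out))


# Hypothetical facade pattern: 0.45 cycles per pixel, downsized by 2x.
alias = aliased_frequency(0.45, 2)
print(f"alias at {alias:.2f} cycles per output pixel")
```

In this example the fine pattern reappears at about 0.1 cycles per output pixel, i.e. as broad bands roughly ten output pixels wide. Sharpening beforehand raises the amplitude of the original pattern, so whatever fraction of it leaks through the resizing filter shows up as correspondingly stronger bands.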
Another thought: the anti-aliasing filters in some cameras are weaker than they probably should be. With a very highly resolving lens, the image appears sharper than one from a comparable sensor with a strong AA filter, not because there is more information, but because the contrast at very high spatial frequencies is high. In effect, you are just baking false (aliased) detail into the capture. This might explain some of the crunchiness. Moderately blurring the image prior to down-sampling might improve the visual appearance.
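A rough one-dimensional sketch of the blur-before-downsampling idea: a fine pattern is reduced by 2x once directly and once after a mild blur. The `[1, 2, 1] / 4` kernel, the 0.45 cycles-per-pixel test pattern, and all function names are assumptions chosen for illustration, not a recommendation for a specific filter or radius.

```python
import math


def make_pattern(n_samples: int, cycles_per_px: float) -> list[float]:
    """Pure sinusoid standing in for a very regular fine texture."""
    return [math.sin(2 * math.pi * cycles_per_px * n) for n in range(n_samples)]


def binomial_blur(signal: list[float]) -> list[float]:
    """Mild low-pass with a [1, 2, 1] / 4 kernel.

    Only positions where the kernel fits entirely are kept, so the
    result is two samples shorter than the input.
    """
    return [
        (signal[i] + 2 * signal[i + 1] + signal[i + 2]) / 4
        for i in range(len(signal) - 2)
    ]


def decimate(signal: list[float], factor: int) -> list[float]:
    """Keep every `factor`-th sample, with no filtering of its own."""
    return signal[::factor]


def peak(signal: list[float]) -> float:
    """Largest absolute value, used here as a crude amplitude measure."""
    return max(abs(v) for v in signal)


pattern = make_pattern(200, 0.45)  # close to the Nyquist limit of 0.5
peak_plain = peak(decimate(pattern, 2))
peak_blurred = peak(decimate(binomial_blur(pattern), 2))

print(f"alias amplitude without blur: {peak_plain:.3f}")
print(f"alias amplitude with blur:    {peak_blurred:.3f}")
```

Without the blur, the pattern survives at nearly full amplitude but at the wrong (aliased) frequency; with the blur, it is attenuated to a few percent. The trade-off is that the same blur also softens genuine detail just below the new resolution limit, which is why "moderately" matters.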