Here are some possible reasons. Theories and opinions differ; apparently, some people honestly cannot hear the difference. But some can.
The purpose of semiconductor development was always primarily cheapness (to make and to operate) and high-frequency operation; sound quality was never a goal.
At a microscopic level, the semiconductor atomic lattice arguably imparts a 'digitizing' quality, with discrete steps of valence charge (theoretically). The charges do not move smoothly through analog transitions but rather step, or lurch, into and out of comparatively rigid positions.
The narrow, shallow channels formed by diffused dopant deposition inherently favor high-frequency operation.
What is the chief complaint about “transistorized” sound? A harsh, unnatural-sounding high end. Is that match merely a coincidence?
Moreover, any collision with the supply rails is almost always made audibly worse by the extremely heavy negative feedback, with its insurmountable time/phase lag, applied to tame semiconductor amplification, which is notorious for its proclivity to collapse into uncontrolled oscillation.
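The feedback-plus-lag point can be illustrated numerically. This is a minimal sketch, not a model of any real amplifier: it assumes a one-sample delay standing in for the amplifier's phase lag, and shows that once the loop gain around the delayed feedback path exceeds unity, the "corrective" signal arrives too late and the loop rings instead of settling.

```python
def feedback_step_response(gain, beta, steps=20):
    """Simulate y[n] = gain * (x - beta * y[n-1]) for a unit step input x = 1.

    The one-sample delay on y is a toy stand-in for real phase lag.
    """
    y, out = 0.0, []
    for _ in range(steps):
        y = gain * (1.0 - beta * y)  # feedback arrives one step late
        out.append(y)
    return out

# Loop gain (gain * beta) below 1: the output settles to a steady value.
stable = feedback_step_response(gain=5.0, beta=0.1)

# Loop gain above 1: the late correction overshoots, and the output
# oscillates with growing amplitude instead of converging.
unstable = feedback_step_response(gain=5.0, beta=0.3)
```

With loop gain 0.5 the deviation from the settled value halves each step; with loop gain 1.5 it alternates sign and grows by half each step, the discrete analogue of a feedback loop breaking into oscillation.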
From 21:30 into the video:
“Calvin Fuller … pioneered the use of diffusion…to impregnate a very narrow layer [of dopant] on the surface of silicon and germanium wafers … By making very narrow layers you [can] get the very high frequencies. The frequency [attainable] is roughly inversely proportional to the thickness of the base layer…a micron or two [1 micron = 10^-6 m] thick…[was already possible] in 1955.” Atomic diameter is generally taken to be on the order of 10^-10 m, four orders of magnitude less, and the above-all-else drive toward ever narrower, shallower layers has never abated.
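The "four orders of magnitude" remark follows from simple arithmetic on the two length scales quoted. A quick check, using assumed round numbers (a 2-micron base layer per the quote, and a 10^-10 m atomic diameter):

```python
import math

base_1955 = 2e-6       # ~2 micron base layer thickness, per the quoted video
atom_diameter = 1e-10  # rough atomic diameter in meters

# With attainable frequency roughly proportional to 1 / thickness,
# this ratio is the remaining thinning headroom toward atomic scale.
headroom = base_1955 / atom_diameter
orders_of_magnitude = math.log10(headroom)
```

The ratio comes out at 2 x 10^4, i.e. just over four decades of further thinning between the 1955 layers and the atomic limit.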