I think part of the solution may depend on the crossover frequency you use.
TO ME (YMMV), the most important factor in imaging is having male and female vocals sound as if they originate from the same point. I don't like the higher-octave female voice sounding detached from the rest of the voices. Therefore, I tend to favor mounting the tweeter and midrange close together.
But if your tweeter is crossed over at a high frequency, leaving the midrange responsible for ALL the vocal frequencies, then I don't see why you couldn't group the midrange and midbass together.
Well, imaging is all about path length. Bass is less directional than midrange, and much less directional than treble, so I would put the mids and tweeters in the kick panels to keep the left and right path lengths as close to equal as possible (time alignment helps here). Mids in the doors can sound OK, but generally you want them facing you, because you want to be on-axis with them. That's why a good home audio setup is worlds better than a good car setup.
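To put numbers on the time alignment idea: the nearer speaker gets delayed by the path length difference divided by the speed of sound, so its sound arrives at your head at the same moment as the farther speaker's. Here's a rough sketch of that arithmetic, assuming ~343 m/s for the speed of sound and using made-up driver-seat distances (the speaker names and measurements are hypothetical, not from any particular install or head unit).

```python
SPEED_OF_SOUND_M_S = 343.0  # approx. speed of sound in air at ~20 degrees C

def alignment_delays_ms(distances_m):
    """Return the delay (ms) to apply to each speaker so sound from every
    speaker arrives at the listening position at the same time.

    distances_m: mapping of speaker name -> path length in metres from that
    speaker to the listener's head (hypothetical measurements).
    The farthest speaker gets 0 ms; nearer speakers are delayed to match it.
    """
    farthest = max(distances_m.values())
    return {
        name: (farthest - d) / SPEED_OF_SOUND_M_S * 1000.0
        for name, d in distances_m.items()
    }

if __name__ == "__main__":
    # Hypothetical driver-seat measurements: the left-side speakers are closer.
    distances = {
        "left_kick": 0.80,
        "right_kick": 1.25,
        "left_tweeter": 0.75,
        "right_tweeter": 1.20,
    }
    for name, delay in alignment_delays_ms(distances).items():
        print(f"{name}: {delay:.2f} ms")
```

With those numbers the left kick ends up delayed about 1.3 ms relative to the right kick. Real DSPs usually take the distance or delay per channel directly, but the math behind it is the same.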
I would put the midrange up top in the pillars if possible, to give a higher ambient stage. In terms of image focus I don't think it matters, although all those reflections up top may diffuse your imaging a bit.
Remember, an integral part of stereo imaging is that the frequency response and amplitude be equal from left to right... a very often overlooked fact.
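On the amplitude side, the closer speaker is also louder at your ear. A rough starting point for a level trim, assuming free-field inverse-distance loss (20*log10 of the distance ratio), is sketched below. That assumption is loose in a car cabin full of reflections, so treat the result as a first guess to fine-tune by ear or with a meter. It also says nothing about matching frequency response, which still takes EQ. The distances and speaker names are hypothetical.

```python
import math

def distance_level_trims_db(distances_m):
    """Estimate per-speaker gain trims (dB, <= 0) that equalise arrival level
    at the listener, ASSUMING free-field inverse-distance loss (20*log10).
    The farthest speaker is left at 0 dB; nearer ones are attenuated by how
    much louder they would be at the listening position."""
    farthest = max(distances_m.values())
    return {
        name: -20.0 * math.log10(farthest / d)
        for name, d in distances_m.items()
    }

if __name__ == "__main__":
    # Hypothetical driver-seat distances in metres.
    trims = distance_level_trims_db({"left_kick": 0.80, "right_kick": 1.25})
    for name, trim in trims.items():
        print(f"{name}: {trim:.1f} dB")
```

Halving the distance works out to about a 6 dB trim under this model, which is why the near-side speaker tends to dominate the image if you leave the levels flat.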