I think part of the solution depends on the crossover frequency you use.

TO ME (YMMV), the most important factor in imaging is having the male and female vocals sound as if they originate from the same point. I do not like the higher-octave female voices sounding disjointed from the rest of the voices. Therefore, I tend to favor keeping the tweeter and mid close together.

But if your tweeter is crossed over at a high enough frequency that your mid is responsible for ALL the vocal frequencies, then I do not see why you couldn't group the midrange and midbass.

That's just my inexperienced $0.02
