The time Vox spends on this goes entirely into analysing the payload XML files, creating all the possible sub-phrases, and binding them to the data needed to trigger the command with the correct payload. It does not matter where the songs are or whether the disk is fragmented. In fact, as long as the XML file can be read by VC, the audio files don't even need to exist.
22 thousand songs is beginning to push the practical limits, and you are using two commands that each build a giant command from all these songs. The "play song X by artist Y" command is particularly taxing because we can end up with some extremely long phrases, and then hundreds of possible sub-phrases for each song.
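To give a rough feel for why long phrases hurt, here is a small Python sketch. It assumes a sub-phrase is any contiguous run of words in the phrase. That is only my simplified model for illustration, not VC's actual algorithm, and the function name is made up.

```python
def sub_phrases(phrase):
    """Return every contiguous run of words in the phrase.

    Assumed model for illustration only, not VC's real algorithm.
    """
    words = phrase.split()
    found = set()
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            found.add(" ".join(words[start:end]))
    return found


short_title = "Love Me Do"
long_title = "I Still Haven't Found What I'm Looking For"

print(len(sub_phrases(short_title)))  # 3 words
print(len(sub_phrases(long_title)))   # 8 words
```

Under this model a phrase of n distinct words gives n(n+1)/2 sub-phrases, so a 20 word "song by artist" phrase would already produce 210 of them, and that is for one song.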
My advice to you is to get rid of the "play song by artist" command completely. For my own purposes I rarely ask for a song by name to begin with. I mostly ask for playlists, artists, albums, or genres. If you ask for a song and it plays the version by the wrong artist, you can always just advance to the next song, and then the next, until you get the right one.
That said, there are other ways to optimize. First of all, if you want, you can get rid of the sub-phrases completely, but then you will need to ask for a song by its whole name, which can often be difficult to remember.
Or, if you want to keep sub-phrases, you can go into advanced settings (VC options) and increase the minimum sub-phrase length from 4 to something like 10. This is the minimum length of a sub-phrase in characters, so even if you only raise it to 5 you won't be able to ask for a song by just saying "love" (4 characters) unless that is the complete name of the song.
You can also reduce the maximum sub-phrase length.
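Here is a hedged Python sketch of how the minimum and maximum length settings cut the list down. Again it assumes sub-phrases are contiguous runs of words, that length is counted in characters including spaces, and that the complete name is always kept. The function and parameter names are hypothetical and are not VC option names.

```python
def allowed_sub_phrases(title, min_chars, max_chars):
    """List the sub-phrases of title that pass the length limits.

    Assumed model: contiguous runs of words, length counted in
    characters, and the complete title always allowed.
    """
    words = title.split()
    kept = []
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            phrase = " ".join(words[start:end])
            is_full_title = (start == 0 and end == len(words))
            if is_full_title or min_chars <= len(phrase) <= max_chars:
                kept.append(phrase)
    return kept


print(allowed_sub_phrases("Love Me Tender", 4, 50))
print(allowed_sub_phrases("Love Me Tender", 5, 50))
print(allowed_sub_phrases("Love Me Tender", 10, 50))
```

With a minimum of 4 you can still say just "Love". At 5 that goes away, and at 10 only the full name is left for a short title like this one.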
You should probably also make sure that your songs are cleanly tagged. Most people, especially the ones with 22 thousand songs, actually have very messy libraries with poor tagging and sometimes very long song and artist tags. The longer the tag, the more strain it puts on VC.
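If you want to hunt down the worst offenders, something like the following Python sketch could help. It assumes you have already exported your library as a list of (artist, title) pairs by whatever means you like. The 40 character limit and the function name are arbitrary choices of mine, not anything VC defines.

```python
def longest_tags(tracks, limit=40):
    """Return (length, artist, title) for tracks whose artist or title
    is longer than limit characters, longest first."""
    flagged = []
    for artist, title in tracks:
        longest = max(len(artist), len(title))
        if longest > limit:
            flagged.append((longest, artist, title))
    return sorted(flagged, reverse=True)


# Hypothetical sample data, just to show the shape of the input.
library = [
    ("The Beatles", "Help!"),
    ("Berliner Philharmoniker",
     "Symphony No. 5 in C minor, Op. 67: I. Allegro con brio (Remastered)"),
]

for length, artist, title in longest_tags(library):
    print(length, artist, "/", title)
```

Anything this flags is a good candidate for shortening or retagging before you regenerate the XML.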
And finally, if you can somehow filter out all the extra stuff that you know you are never going to ask for by name (classical music, perhaps), this can also help. There are tools to do this with XBMC and MediaMonkey, but not with JRiver.
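The idea of the filtering is simple enough to sketch in Python. This assumes each track is a dictionary with a "genre" key, which is a made-up structure for illustration and not the layout of VC's payload XML or of any of those tools.

```python
def drop_genres(tracks, unwanted):
    """Return only the tracks whose genre is not in the unwanted list
    (case-insensitive comparison)."""
    unwanted_lower = {genre.lower() for genre in unwanted}
    return [t for t in tracks if t["genre"].lower() not in unwanted_lower]


# Hypothetical sample data.
library = [
    {"title": "Help!", "genre": "Rock"},
    {"title": "Symphony No. 5", "genre": "Classical"},
    {"title": "So What", "genre": "Jazz"},
]

kept = drop_genres(library, ["classical"])
print([t["title"] for t in kept])
```

Every track removed this way is one less phrase, and one less set of sub-phrases, for VC to build.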
Another option is to use dictation payloads instead of payloadXML. This will load almost instantly, but since it isn't aware of your library it will not be able to match intelligently to what you've got.