Which of the examples below are more likely to ease the speech recognition in VC?
The most important thing is that if two different commands have all other words equal, then the words that differ must not sound similar to each other. And of course the part of the phrase that is unique must not be optional.
I'm a bit confused because you gave us three commands to compare, but one was for an album and two were for movies. So I will ignore the album and just look at the two methods for the movie command:
It seems you are really asking if it is better to put complete phrases all into one phrase in the tree, or to break them up. The answer is that the second method (break them up) is usually better for a number of reasons.
1) Easier to edit
2) Less likely to run out of space in the tree field
3) Probably uses less memory
4) Assuming the total possible phrases you can say comes out identically for both commands, then the recognition engine will probably get equally good results using either method.
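To make point 4 concrete, here is a rough Python sketch (not VC's actual tree format) that expands both methods into the full set of sentences you could say. The phrase lists and the trailing "is playing" are made up for illustration; the only point is that if both expansions come out identical, the engine is being asked to listen for the same things either way.

```python
from itertools import product

def expand(groups):
    """Return every sentence formed by picking one alternative per group, in order."""
    return {" ".join(words) for words in product(*groups)}

# Method 1: complete phrases all in a single phrase element (hypothetical content).
single_phrase = [[
    "what movie is playing",
    "what film is playing",
    "what video is playing",
]]

# Method 2: the same command broken up into three consecutive phrase elements.
split_phrases = [
    ["what"],
    ["movie", "film", "video"],
    ["is playing"],
]

# If the two sets match, both layouts allow exactly the same spoken sentences.
same_sentences = expand(single_phrase) == expand(split_phrases)
print(same_sentences)
```

The broken-up version stays short as you add alternatives, while the single-phrase version grows by a full line for every combination.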
Sometimes you can't break phrases up this way because you would need to change your sentence structure a lot to maintain proper grammar. In this case you can use the single phrase method with lots of aliases or you can create two commands. Just don't create multiple commands if you are using large payloads.
Now let's compare the first and last commands with each other: ("What album" / "What movie #2")
Using these two commands together makes sense. They are identical to each other except for the second phrase. The most important thing here is that none of the words (album, C D, disc, recording) should sound too similar to any of the words (movie, film, video).
And of course this rule applies to any other commands in your tree as well.
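The rule about the unique part not being optional can be checked with a small Python sketch. This is only a text-level check under my own simplified model of a command (a list of phrases, each flagged optional or not); it says nothing about how similar the words sound, and it is not how VC stores commands. It shows that as soon as the distinguishing phrase is optional, both commands can be triggered by the same sentence.

```python
from itertools import product

def expand(groups):
    """groups is a list of (alternatives, optional) pairs; return all sayable sentences."""
    choices = [alts + [""] if optional else alts for alts, optional in groups]
    return {" ".join(w for w in combo if w) for combo in product(*choices)}

def collisions(cmd_a, cmd_b):
    """Sentences that would match both commands."""
    return expand(cmd_a) & expand(cmd_b)

album_words = ["album", "C D", "disc", "recording"]
movie_words = ["movie", "film", "video"]

# Unique phrase required: no sentence is shared between the two commands.
what_album = [(["what"], False), (album_words, False)]
what_movie = [(["what"], False), (movie_words, False)]
required_overlap = collisions(what_album, what_movie)

# Unique phrase optional: saying just "what" now fits both commands.
what_album_opt = [(["what"], False), (album_words, True)]
what_movie_opt = [(["what"], False), (movie_words, True)]
optional_overlap = collisions(what_album_opt, what_movie_opt)

print(sorted(required_overlap))  # []
print(sorted(optional_overlap))  # ['what']
```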
Sometimes you will have words that the engine cannot differentiate very well, depending on your accent and the quality of your recording. For example, sometimes the words "on" and "off" are too similar. If that is the case then "turn the TV on" and "turn the TV off" may not work for you. It won't matter how your tree looks: if your end phrases allow you to say either "turn the TV on" or "turn the TV off", then all that matters is whether the engine can differentiate between the sound of "on" and the sound of "off".
One solution is to use different words, like "enable/disable", "activate/deactivate", "fire up/kill", etc., instead of "on/off".
Or you can use completely different phrasing for each command: "Turn the TV on" / "Turn off the TV". This will be easy for the engine to differentiate, but it will be hard for the user to remember, and if they say "Turn the TV off" then the computer is almost certain to think they said "Turn the TV on".
One note about the order of phrases and payloads. I understand that sentence structure will be different from one language to another but there may be a situation where you will want to break your language rules a bit.
For example (the best example, really), take the command "Play song X", where X is a payloadXML with a very large number of items. In this case, I think it is really best if you don't reverse this. It is OK to say "song play X", but I think it would cause problems to say "X song play" or "X play song". I think it is easier for the engine if there are phrases before the payloadXML that give it an idea of what the general command will be. This will be especially true if you have a lot of songs, and if you are using subset matching in your payloadXML. I could be wrong about this, but even if it doesn't affect the end accuracy, it will cause your "guessed text" to be all over the place while you are speaking.