Topic: Reuters World News

nime5ter

  • Administrator
  • Hero Member
  • *****
  • Posts: 1999
  • Karma: 61
    • View Profile
    • Getting Started with VoxCommando
Reuters World News
« on: May 20, 2014, 12:39:34 PM »
In this command group, I scrape the Reuters World News feed for all the headlines.

"What's the news from Reuters" reads the headlines for the first few stories aloud, but it stores the links for all the current world news headlines in the reuters.xml payload xml file.

"Tell me more about headline number {x}" opens the web page for the article of your choosing, and reads that story aloud. It currently reads the whole article, but I've included instructions in the command for how you can easily change how much it reads to you.

Code: [Select]
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.2.2.2-->
<commandGroup open="True" name="reuters" enabled="True" prefix="" priority="0" requiredProcess="" description="">
  <command id="1162" name="Reuters news" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="{Match.{i}.1} is the headline (where {i} is 1st, 2nd, 3rd ... headlines)&#xD;&#xA;{Match.{i}.2} is the URL linking to each full story.&#xD;&#xA;&#xD;&#xA;There will be more matches stored in the payload xml file than the 4 announced in the command.">
    <action>
      <cmdType>Scrape</cmdType>
      <params>
        <param>http://feeds.reuters.com/reuters/worldNews</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Results.RegExSingle</cmdType>
      <params>
        <param>&lt;item&gt;.*?&lt;title&gt;(.*?)&lt;.*?description.*?&lt;link&gt;(.*?)&lt;</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>PayloadXML.Clear</cmdType>
      <params>
        <param>reuters.xml</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>PayloadXML.AddPair</cmdType>
      <params>
        <param>reuters.xml</param>
        <param>{Match.{i}.2}</param>
        <param>{i}</param>
      </params>
      <cmdRepeat>{#M}</cmdRepeat>
    </action>
    <action>
      <cmdType>OSD.ShowText</cmdType>
      <params>
        <param>Today's Reuters headlines ({#M} total):</param>
        <param>10000</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>OSD.AddText</cmdType>
      <params>
        <param>{i}. {Match.{i}.1}.</param>
      </params>
      <cmdRepeat>4</cmdRepeat>
    </action>
    <action>
      <cmdType>TTS.SpeakSync</cmdType>
      <params>
        <param>Here are the top 4 of {#M} Reuters headlines</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>TTS.Speak</cmdType>
      <params>
        <param>{i}. {Match.{i}.1}.</param>
      </params>
      <cmdRepeat>4</cmdRepeat>
    </action>
    <phrase>What's the news from Reuters</phrase>
  </command>
  <command id="1168" name="More from Reuters" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="Shows the first line of the story. Reads out the whole story.&#xD;&#xA;&#xD;&#xA;If you want it to only read the first few sentences, you can change {#M} to a specific number. So, if you use 2 instead of {#M} then it will read the first 3 lines, because the first sentence is read in the separate, TTS.Speak - {Match.1} action. (Or, you can create a separate &quot;shut up&quot; command with a TTS.Stop action to stop the TTS in mid-read. This would allow you to hear as much or as little as you wanted.)">
    <action>
      <cmdType>Launch.OpenURL</cmdType>
      <params>
        <param>{1}</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Scrape</cmdType>
      <params>
        <param>{1}</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Results.RegExSingle</cmdType>
      <params>
        <param>="description"\scontent="(.*?)"</param>
        <param><![CDATA[ ]]></param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>TTS.Speak</cmdType>
      <params>
        <param>{Match.1}</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>OSD.ShowText</cmdType>
      <params>
        <param>{Match.1}</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>Results.RegExSingle</cmdType>
      <params>
        <param>&lt;.*?midArticle_\d"&gt;&lt;/span&gt;&lt;p&gt;(.*?)&lt;</param>
      </params>
      <cmdRepeat>1</cmdRepeat>
    </action>
    <action>
      <cmdType>TTS.Speak</cmdType>
      <params>
        <param>{Match.{i}}</param>
      </params>
      <cmdRepeat>{#M}</cmdRepeat>
    </action>
    <phrase>Tell me more about headline number</phrase>
    <payloadFromXML phraseOnly="False" use2partPhrase="False" phraseConnector="by" Phrase2wildcard="anyone" optional="False">reuters.xml</payloadFromXML>
  </command>
</commandGroup>

Note: when you copy and paste this command group into your tree, the reuters.xml file will be highlighted in red. Issuing the "What's the news from Reuters" command once should generate the file for you.
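
For readers who want to try the headline pattern outside VoxCommando, here is a rough Python sketch of what the Scrape + Results.RegExSingle pair in the first command does. It is only an illustration: the sample feed text and the extract_headlines helper are made up, and it assumes RegExSingle behaves like single-line mode (re.DOTALL in Python).

```python
import re

# Hypothetical, trimmed-down stand-in for the RSS text returned by Scrape.
SAMPLE_FEED = """<channel>
<item>
<title>First headline</title>
<description>Summary one</description>
<link>http://example.com/story-1</link>
</item>
<item>
<title>Second headline</title>
<description>Summary two</description>
<link>http://example.com/story-2</link>
</item>
</channel>"""

# Same pattern as in the command above, with the XML entities decoded.
PATTERN = re.compile(
    r"<item>.*?<title>(.*?)<.*?description.*?<link>(.*?)<",
    re.DOTALL,  # assumption: the analogue of RegExSingle's single-line mode
)


def extract_headlines(feed_text):
    """Return (headline, url) pairs, like {Match.{i}.1} and {Match.{i}.2}."""
    return PATTERN.findall(feed_text)


if __name__ == "__main__":
    for i, (headline, url) in enumerate(extract_headlines(SAMPLE_FEED), start=1):
        print(f"{i}. {headline} -> {url}")
```

Each tuple corresponds to one match: the first element is what the command speaks, and the second is what it stores in reuters.xml.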

[Edit: Reuters site changed their RSS feed. Commands above updated 17-Aug-2016 with a new regular expression pattern.]
« Last Edit: August 17, 2016, 08:34:33 AM by nime5ter »
TIPS: POST VC VERSION #. Explain what you want VC to do. Say what you've tried & what happened, or post a video demo. Attach VC log. Link to instructions followed.  Post your command (xml)

keithj69

  • $upporter
  • Sr. Member
  • *****
  • Posts: 113
  • Karma: 7
    • View Profile
Re: Reuters World News
« Reply #1 on: May 20, 2014, 06:22:46 PM »
very cool.

sirs2k

  • Jr. Member
  • **
  • Posts: 24
  • Karma: 0
    • View Profile
Re: Reuters World News
« Reply #2 on: May 21, 2014, 12:49:25 PM »
Very cool indeed.
Unfortunately, it's also reading the whole feed link aloud: "slash slash feed dot blah blah" :)
I can see that the links are being recorded in the XML file, so I guess they get read because they are part of it.
Am I doing something wrong?

nime5ter

Re: Reuters World News
« Reply #3 on: May 21, 2014, 01:05:32 PM »
If you're not getting the news properly read to you, then yes, I guess so. :)

I would need actual info about your setup and what you've tried.

What version of VC are you using? This will only work with a version 2 release, because it relies on nested matching/"enhanced regex". http://voxcommando.com/forum/index.php?topic=1446.0

sirs2k

Re: Reuters World News
« Reply #4 on: May 21, 2014, 01:13:14 PM »
That clears things up, I'm using V1   :)

nime5ter

Re: Reuters World News
« Reply #5 on: May 21, 2014, 01:30:37 PM »
Generally, if you look at posted command xml, you can see what version was used to create it.

VC1 users should always keep in mind that the ability to import VC2 command xml doesn't guarantee it will work, since tons of new features have been added in VC2 and will continue to be added over the course of its development.

http://voxcommando.com/forum/index.php?topic=721.msg12795#msg12795

So, if imported commands aren't working, that should be the first guess. To eliminate compatibility as the culprit, you could always install the trial version of VC2 as a second instance of VC, just for command testing. The last stable version (1.933) is available on the Downloads page. http://voxcommando.com/downloads.asp

That said, my command xml above could be adapted to work in VC1 with some re-working of the regex.
« Last Edit: May 21, 2014, 01:33:23 PM by nime5ter »

sirs2k

Re: Reuters World News
« Reply #6 on: May 21, 2014, 02:27:51 PM »
Aha, now I can see how it's working great under V2, I'll be upgrading soon  :D

Haddood

  • $upporter
  • Hero Member
  • *****
  • Posts: 688
  • Karma: 22
    • View Profile
Re: Reuters World News
« Reply #7 on: June 03, 2014, 11:00:39 PM »
I have been following the code in this example to make a command that gets news from various CBC feeds.
However, something seems to be wrong with my RegEx, as only the first item is collected.

The command takes two payloads: the first is the news type and the second is the RSS feed URL. They are passed from another command that reads the news type and RSS URL from an XML file (the command can be tested with Top Stories and http://rss.cbc.ca/lineup/topstories.xml). For the full list of available feeds: http://www.cbc.ca/rss/

Any help appreciated

Code: [Select]
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 1.9.5.4-->
<command id="1221" name="CBC news" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="{Match.{i}.1} is the headline (where {i} is 1st, 2nd, 3rd ... headlines)&#xD;&#xA;{Match.{i}.2} is the URL linking to each full story.&#xD;&#xA;{Match.{i}.3} is the description of each story. You could choose to read this aloud instead of {Match.{i}.1} in the TTS.Speak line, if you want a bit more detail.&#xD;&#xA;&#xD;&#xA;There will be many more matches stored in the payload xml file than the 4 announced in the command. In the other commands, you can ask to see all headlines and ask to go to specific stories.">
  <action>
    <cmdType>Scrape</cmdType>
    <params>
      <param>{2}</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>Results.RegExReplace</cmdType>
    <params>
      <param>\n</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>Results.RegExReplace</cmdType>
    <params>
      <param>\r</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>Results.RegEx</cmdType>
    <params>
      <param>item.*&lt;title&gt;&lt;\!\[CDATA\[(.*?)\]\]&gt;.*&lt;link&gt;(.*?)&lt;/link&gt;.*&lt;p&gt;(.*?)&lt;/p&gt;</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>PayloadXML.Clear</cmdType>
    <params>
      <param>payloads\news.CBC.xml</param>
    </params>
    <cmdRepeat>0</cmdRepeat>
  </action>
  <action>
    <cmdType>PayloadXML.AddPair</cmdType>
    <params>
      <param>payloads\news.CBC.xml</param>
      <param>{Match.{i}.3}</param>
      <param>{i}</param>
    </params>
    <cmdRepeat>0</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.ShowText</cmdType>
    <params>
      <param>Today's CBC {1} headlines</param>
      <param>10000</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.AddText</cmdType>
    <params>
      <param>{i}. {Match.{i}.1}.</param>
    </params>
    <cmdRepeat>4</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.AddText</cmdType>
    <params>
      <param>{i}. {Match.{i}.2}</param>
    </params>
    <cmdRepeat>4</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.AddText</cmdType>
    <params>
      <param>{i}. {Match.{i}.3}</param>
    </params>
    <cmdRepeat>4</cmdRepeat>
  </action>
  <action>
    <cmdType>TTS.SpeakSync</cmdType>
    <params>
      <param>Today's Reuter's headlines</param>
    </params>
    <cmdRepeat>0</cmdRepeat>
  </action>
  <action>
    <cmdType>TTS.Speak</cmdType>
    <params>
      <param>{i}. {Match.{i}.1}.</param>
    </params>
    <cmdRepeat>0</cmdRepeat>
  </action>
  <event>News.CBC</event>
</command>
When Voice command gets tough, use hand gestures

nime5ter

Re: Reuters World News
« Reply #8 on: June 04, 2014, 08:48:08 AM »
A few things that may help:

1. In this case, I'd recommend using the action Results.RegExSingle rather than Results.RegEx (http://voxcommando.com/mediawiki/index.php?title=Actions#RegExSingle).

That way, you don't need to strip out line feed characters etc. first in order to match multi-line strings.
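
As a rough illustration of why single-line mode removes the need for the two RegExReplace steps, here is a small Python sketch. It assumes RegExSingle corresponds to the usual "dot matches newline" option (re.DOTALL in Python); the sample text and helper names are made up.

```python
import re

# Made-up scraped text in which one item spans several lines.
TEXT = "<item>\n<title>Headline A</title>\n</item>"

PATTERN = r"<item>.*?<title>(.*?)</title>"


def find_titles(text, single_line):
    """Search with or without single-line (dot-matches-newline) mode."""
    flags = re.DOTALL if single_line else 0
    return re.findall(PATTERN, text, flags)


def strip_newlines(text):
    """The workaround: remove \\r and \\n before matching."""
    return re.sub(r"[\r\n]", "", text)


if __name__ == "__main__":
    print(find_titles(TEXT, single_line=False))                  # []
    print(find_titles(TEXT, single_line=True))                   # ['Headline A']
    print(find_titles(strip_newlines(TEXT), single_line=False))  # ['Headline A']
```

Without the flag, the dot stops at the first line break, so the pattern never gets from the item tag to the title tag.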

2. Watch out for regex "greediness". (see lower down on this page: http://www.regular-expressions.info/repeat.html).

It seems that you're often trying to use .* rather than .*?

The question mark is needed. Without it, .* keeps hunting for the last instance, anywhere in the scraped string, of the character that comes after it, rather than stopping at the nearest instance of that character.
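
To see how this affects the number of matches, here is a small Python sketch on a made-up, single-line feed with two items. The feed text and the titles helper are hypothetical, and Python's re module stands in for VoxCommando's regex actions.

```python
import re

# Made-up single-line feed with two items.
FEED = "<item><title>One</title></item><item><title>Two</title></item>"

GREEDY = r"item.*<title>(.*?)</title>"  # .* runs on to the LAST <title>
LAZY = r"item.*?<title>(.*?)</title>"   # .*? stops at the NEAREST <title>


def titles(pattern, text=FEED):
    """Return every title captured by the given pattern."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    print(titles(GREEDY))  # ['Two']
    print(titles(LAZY))    # ['One', 'Two']
```

The greedy pattern swallows the whole feed in a single match, so only one headline ever comes back, however many items the feed contains.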

The following version of your command reads the first 4 CBC headlines:

Code: [Select]
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 1.9.5.4-->
<command id="1240" name="CBC news (headlines only)" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="">
  <action>
    <cmdType>Scrape</cmdType>
    <params>
      <param>{2}</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>Results.RegExSingle</cmdType>
    <params>
      <param>item.*?&lt;title&gt;&lt;.*?DATA.(.*?)\]</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.ShowText</cmdType>
    <params>
      <param>Today's CBC {1} headlines:</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.AddText</cmdType>
    <params>
      <param>{i}. {Match.{i}}</param>
    </params>
    <cmdRepeat>4</cmdRepeat>
  </action>
  <event>News.CBC</event>
</command>

In your command, if the original regular expression you used had worked, the OSD message would have been:

1. first headline
2. 2nd headline
3. 3rd headline
4. 4th headline
1. first article link
2. 2nd article link
3. 3rd article link
4. 4th article link
1. first story nugget
2. 2nd story nugget
3. 3rd story nugget
4. 4th story nugget

Is that what you were aiming for? If you can clarify, I can then help with achieving the final goal.

3. ... One other thought, which may not be appropriate for how you're using VC which seems to rely a lot on event-triggered commands, but:

You are passing 2 payloads to this command -- the link you want to scrape, and the topic. If you were to issue a voice command to call this command directly ("What are today's {top stories}"), you could use a payload xml file of the form:
 
Code: [Select]
value = link ; phrase = feed name
In that case, your command would use {1} (link to scrape) and {PF.1} (name of feed) instead of {2} and {1}. [http://voxcommando.com/mediawiki/index.php?title=Variables#Payloads]

I realize you may already be familiar with the friendly payload variable, but just in case you're not I figured I should mention it. :)

« Last Edit: June 04, 2014, 09:26:29 AM by nime5ter »

nime5ter

Re: Reuters World News
« Reply #9 on: June 04, 2014, 09:22:54 AM »
Maybe this example can more clearly demonstrate greedy vs. non-greedy regular expressions:

Code: [Select]
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 1.9.5.4-->
<command id="1165" name="Greedy versus not greedy regex" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="">
  <action>
    <cmdType>Results.SetLastResult</cmdType>
    <params>
      <param>&gt;this is a &lt;sentence&lt;with extra &lt; in it.&lt;</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>Results.RegEx</cmdType>
    <params>
      <param>&gt;(.*)&lt;</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.ShowText</cmdType>
    <params>
      <param>Greedy: {Match.1}</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>Results.RegEx</cmdType>
    <params>
      <param>&gt;(.*?)&lt;</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.AddText</cmdType>
    <params>
      <param>Not greedy: {Match.1}</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <phrase>Test regular expressions</phrase>
</command>

Haddood

Re: Reuters World News
« Reply #10 on: June 04, 2014, 01:23:59 PM »
Quote from: nime5ter on June 04, 2014, 09:22:54 AM
Maybe this example can more clearly demonstrate greedy vs. non-greedy regular expressions:


It took some figuring out, but indeed it did. Here is the final RegEx that works (the lazy ? quantifiers missing from my first attempt now follow each .*):
item.*?<title><\!\[CDATA\[(.+?)\]\]>.*?<link>(.+?)<\/link>.*?<p>(.+?)</p>

Lesson well learned. Thank you, nime5ter.
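
A quick way to sanity-check a pattern like this outside VoxCommando is a short Python sketch. The sample feed below is invented (its shape is only guessed from what the pattern expects: a CDATA title, a link, and a p tag in the description), and extract_stories is a hypothetical helper.

```python
import re

# Invented two-item feed, already flattened to a single line
# (as it would be after the two RegExReplace actions).
FEED = (
    "<item><title><![CDATA[Storm hits coast]]></title>"
    "<link>http://example.com/a</link>"
    "<description><![CDATA[<p>Details about the storm.</p>]]></description></item>"
    "<item><title><![CDATA[Election called]]></title>"
    "<link>http://example.com/b</link>"
    "<description><![CDATA[<p>Details about the vote.</p>]]></description></item>"
)

# The final pattern from the post, copied as written.
PATTERN = re.compile(
    r"item.*?<title><\!\[CDATA\[(.+?)\]\]>.*?<link>(.+?)<\/link>.*?<p>(.+?)</p>"
)


def extract_stories(feed_text):
    """Return (headline, link, summary) tuples, one per feed item."""
    return PATTERN.findall(feed_text)


if __name__ == "__main__":
    for i, (headline, link, summary) in enumerate(extract_stories(FEED), start=1):
        print(f"{i}. {headline} | {link} | {summary}")
```

The three tuple elements line up with {Match.{i}.1}, {Match.{i}.2} and {Match.{i}.3} in the command.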


Quote from: nime5ter on June 04, 2014, 08:48:08 AM
3. ... One other thought, which may not be appropriate for how you're using VC which seems to rely a lot on event-triggered commands, but:

You are passing 2 payloads to this command -- the link you want to scrape, and the topic. If you were to issue a voice command to call this command directly ("What are today's {top stories}"), you could use a payload xml file of the form:
 
Code: [Select]
value = link ; phrase = feed name
In that case, your command would use {1} (link to scrape) and {PF.1} (name of feed) instead of {2} and {1}. [http://voxcommando.com/mediawiki/index.php?title=Variables#Payloads]

I realize you may already be familiar with the friendly payload variable, but just in case you're not I figured I should mention it. :)



That is exactly what I am doing (the OSD.ShowText was just for debugging). I have a command that passes {1} (link to scrape) and {PF.1}. It is a branching command, so I can say "world news from Reuters" or "top stories from CBC". The rest is identical to your Reuters command: store the results in XML, and "more about item #1 from CBC" will give more about the headline.

Now it's time to study RegExSingle. Thanks again for your valuable help.

nime5ter

Re: Reuters World News
« Reply #11 on: June 04, 2014, 08:48:00 PM »
Great. Glad it made sense.

nime5ter

Re: Reuters World News
« Reply #12 on: August 17, 2016, 08:29:28 AM »
Thanks @deco123411 for alerting me to the fact that the Reuters RSS feed has changed, breaking the regular expression used in the first command.

I have updated the command XML in the first post.

http://voxcommando.com/forum/index.php?topic=1576.0