Author Topic: need help using regex after i scrape a localhost site  (Read 7436 times)

2exclusive

  • Contributor
  • ***
  • Posts: 54
  • Karma: 0
    • View Profile
need help using regex after i scrape a localhost site
« on: July 13, 2015, 04:58:52 PM »
I need help using regex on the highlighted output (see attachment) after a Scrape command. I would like Vox to scrape a localhost site and then use TTS to speak the temperature shown on the site.

Any help is appreciated.

Thanks again

PegLegTV

  • $upporter
  • Hero Member
  • *****
  • Posts: 500
  • Karma: 43
    • View Profile
Re: need help using regex after i scrape a localhost site
« Reply #1 on: July 13, 2015, 05:40:31 PM »
This command should hopefully contain the right regex pattern to find just the number "76" in the photo you shared:

Code: [Select]
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.1.5.1-->
<command id="1002" name="Regex Example" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="">
  <action>
    <cmdType>Results.RegEx</cmdType>
    <params>
      <param>span2\scurrent.temp"&gt;(.*?)\D\sF&lt;/d</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
</command>
This will perform the RegEx on {LastResult} by default.

When looking for help with RegEx, I would recommend copying and pasting the text instead of posting a picture when possible, because if I missed even one character then this RegEx won't work.

« Last Edit: July 13, 2015, 05:42:42 PM by PegLegTV »

PegLegTV

« Reply #2 on: July 13, 2015, 05:44:10 PM »
After looking at your photo more closely, I noticed that my first RegEx was going to find other matches as well. I just updated the command above to match only the line you highlighted.

This is the text highlighted in 2exclusive's photo
Quote
<div Class="span2 current-temp">76º F</div>
let me know how it works for you
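
A quick way to sanity-check the pattern outside VC is to run it against the quoted line. This is only a sketch in Python, on the assumption that Python's `re` module treats these simple constructs the same way VC's regex engine does; the variable names are made up for illustration.

```python
import re

# The line quoted from 2exclusive's photo.
html = '<div Class="span2 current-temp">76º F</div>'

# The pattern from the command above, with the XML entities
# (&gt; and &lt;) written back as > and <.
pattern = r'span2\scurrent.temp">(.*?)\D\sF</d'

match = re.search(pattern, html)
temperature = match.group(1) if match else None
print(temperature)
```

If the scraped text differs from the photo by even one character in the matched region, `match` comes back as `None`, which is why pasting the real text matters.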
« Last Edit: July 13, 2015, 06:46:37 PM by PegLegTV »

2exclusive

« Reply #3 on: July 13, 2015, 06:15:14 PM »
Hmmm, let me know if this screenshot makes sense. It still doesn't find any value... How do you check if the scrape function is even working?

btw thanks for your help with this

Kalle

  • $upporter
  • Hero Member
  • *****
  • Posts: 2319
  • Karma: 47
    • View Profile
Re: need help using regex after i scrape a localhost site
« Reply #4 on: July 13, 2015, 06:20:22 PM »
You can also use a dot instead of "/s" to stand in for the whitespace character, but I suggest using the VC regex tool, which will be a great help.

The "/d" in your regex action makes no sense to me.

If you want to get only the 76, then you can use this in your regex action:
Code: [Select]
<div.Class="span2.current-temp">(.*?)\D+<

You can open your URL with the RegEx Tool by clicking on it in the LCB editor.

« Last Edit: July 13, 2015, 06:36:43 PM by Kalle »
***********  get excited and make things  **********

jitterjames

  • Administrator
  • Hero Member
  • *****
  • Posts: 7715
  • Karma: 116
    • View Profile
    • VoxCommando
Re: need help using regex after i scrape a localhost site
« Reply #5 on: July 13, 2015, 06:40:01 PM »
@ 2exclusive: Please stop posting only images:  Paste actual text in a codebox for the html that you want to apply RegEx patterns to, as mentioned by Pegleg.  Also post commands using XML not images: http://voxcommando.com/mediawiki/index.php?title=XML_on_the_forum.  If you want to post a picture as well that's fine.

@ PegLeg: don't forget about spaces in your RegEx patterns!   VC ignores normal spaces and they need to be represented explicitly using \s or if you are feeling loosey goosey you can use a .
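
The space rule is easy to trip over, so here is a rough illustration. It uses Python's `re.VERBOSE` flag as a stand-in, on the assumption that it mimics how VC drops plain spaces from a pattern. VC itself is not involved, and the variable names are hypothetical.

```python
import re

html = '<div Class="span2 current-temp">76º F</div>'

# With re.VERBOSE a plain space in the pattern is ignored, so this one
# effectively looks for "span2current-temp" and finds nothing.
with_plain_space = re.search(r'span2 current-temp">(\d+)', html, re.VERBOSE)

# Writing the space as \s keeps it in the pattern.
with_backslash_s = re.search(r'span2\scurrent-temp">(\d+)', html, re.VERBOSE)

# The "loosey goosey" option: a dot matches any single character,
# a space included.
with_dot = re.search(r'span2.current-temp">(\d+)', html, re.VERBOSE)

print(with_plain_space)
print(with_backslash_s.group(1))
print(with_dot.group(1))
```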

@ Kalle: \d is a digit, and its uppercase twin \D is anything that is not a digit.  Very useful. In this case PegLeg is using \D to eliminate the degree sign ahead of the F.  Spaces and digits and all other types of special characters use backslash \ (not forward slash / ).
« Last Edit: July 13, 2015, 06:44:01 PM by jitterjames »

PegLegTV

« Reply #6 on: July 13, 2015, 06:45:16 PM »
Quote
@ 2exclusive: Please stop posting only images:  Paste actual text in a codebox for the html that you want to apply RegEx patterns to, as mentioned by Pegleg.  Also post commands using XML not images: http://voxcommando.com/mediawiki/index.php?title=XML_on_the_forum.  If you want to post a picture as well that's fine.

@ PegLeg: don't forget about spaces!

@ Kalle: \d is a digit.  Very useful. In this case PegLeg is using it to eliminate the degree sign.  Spaces and digits and all other types of special characters use backslash \ (not forward slash / ).

@jitterjames: The command in my first post has the regex that should work. The second post is simply the text shown in 2exclusive's photo. I will change that to a quote instead of a code box so it doesn't confuse anyone.

jitterjames

« Reply #7 on: July 13, 2015, 06:48:17 PM »
Right you are! Sorry. I should have known.  :-[

jitterjames

« Reply #8 on: July 13, 2015, 06:53:48 PM »
Actually though, assuming that I am reading the image correctly, here is the RegEx I would use:

Code: [Select]
current-temp">(\d+)

PegLegTV

« Reply #9 on: July 13, 2015, 07:07:05 PM »
Quote
hmmm let me know if this screen shot makes sense, it still doesnt find any value... How do you check if the scrape function is even working?

btw thanks for your help with this

To test whether your scrape action is working, look at the VC history window: if the action is highlighted, it didn't work. If it is working, you can add the output to your clipboard and paste it on the forum, so we can take a closer look at the output and make sure the regex pattern is correct:

Code: [Select]
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.1.5.1-->
<command id="1002" name="checking scrape output" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="">
  <action>
    <cmdType>Scrape</cmdType>
    <params>
      <param>the url you are trying to scrape</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>System.SetClipboardText</cmdType>
    <params>
      <param>{LastResult}</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.ShowText</cmdType>
    <params>
      <param>{LastResult}</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
</command>

In this command, change "the url you are trying to scrape" to the URL you are actually trying to scrape.

This will copy {LastResult} to your clipboard and show it in an OSD, so if your scrape action is working you should see what the scrape returned.

Quote
actually though.  Assuming that I am reading the image correctly, here is the RegEx I would use.

Code: [Select]
current-temp">(\d+)

@jitterjames, I think it needs to look like this:
Code: [Select]
2\scurrent-temp">(\d+)
Otherwise it will make "77" a match as well.

text from 2exclusive's photo
Quote
<div Class="span2 current-temp">76º F</div>
<div Class="span3 current-temp">77º F</div>
<div Class="span2 current-temp">cool</div>
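
The difference between the two patterns can be checked against these three lines. A minimal sketch in Python, assuming its `re` module agrees with VC's engine on `\s` and `\d`; the names `broad` and `narrow` are only for illustration.

```python
import re

# The three lines transcribed from the photo.
html = '''<div Class="span2 current-temp">76º F</div>
<div Class="span3 current-temp">77º F</div>
<div Class="span2 current-temp">cool</div>'''

# Shorter pattern: matches any div whose class ends in current-temp.
broad = re.findall(r'current-temp">(\d+)', html)

# Longer pattern: the leading "2\s" ties the match to the span2 divs.
narrow = re.findall(r'2\scurrent-temp">(\d+)', html)

print(broad)
print(narrow)
```

Neither pattern picks up the "cool" line, because `\d+` needs at least one digit right after the closing bracket. With both patterns the first match is 76.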



PegLegTV

« Reply #10 on: July 13, 2015, 07:16:54 PM »
This is the new regex, thanks to jitterjames.

If your scrape action is working, then this regex should find the temperature you are after:

Code: [Select]
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.1.5.1-->
<command id="1003" name="New regex action" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="">
  <action>
    <cmdType>Results.RegEx</cmdType>
    <params>
      <param>2\scurrent-temp"&gt;(\d+)</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
</command>

nime5ter

  • Administrator
  • Hero Member
  • *****
  • Posts: 2012
  • Karma: 61
    • View Profile
    • Getting Started with VoxCommando
Re: need help using regex after i scrape a localhost site
« Reply #11 on: July 13, 2015, 07:23:31 PM »
Quote
@jitterjames, I think it needs to look like this
Code: [Select]
2\scurrent-temp">(\d+)
otherwise it will also make "77" a match as well

True, James's pattern is less exclusive. But since 2exc will need to call on a specific match anyhow for his TTS announcement (e.g. {Match.1}) that shouldn't really matter, should it?
TIPS: POST VC VERSION #. Explain what you want VC to do. Say what you've tried & what happened, or post a video demo. Attach VC log. Link to instructions followed.  Post your command (xml)

PegLegTV

« Reply #12 on: July 13, 2015, 07:43:21 PM »
Quote
True, James's pattern is less exclusive. But since 2exc will need to call on a specific match anyhow for his TTS announcement (e.g. {Match.1}) that shouldn't really matter, should it?

No, it shouldn't matter that much. I wasn't trying to step on any toes  :bigno
I only posted the change so it would be more specific to the information he was after, which is also why I revised my first post. I thought maybe jitterjames had made the same goof I did by missing the second temperature.

jitterjames

« Reply #13 on: July 13, 2015, 09:06:27 PM »
I think everybody's toes are Ok. :)

Hopefully there is something to be learned from the back and forth, for those who are still struggling with RegEx.

2exclusive

« Reply #14 on: July 14, 2015, 01:10:38 PM »
Everyone, thanks for your help on this. I learned some regex troubleshooting from this thread thanks to all the replies. It looks like the regex wasn't able to find the temperature number because the scrape has to be authenticated via a PIN. I am using ecobee's API, and you need to register each browser with a PIN so that it is able to communicate with the API. I registered the Google Chrome and IE browsers with the API, assuming the scrape method was using those browsers to scrape, but no luck.

I am still unable to get the scrape method to register. By using the RegEx tool I was able to see that it was requiring a registration. Darn it! It looks like this isn't going to work unless there is a way to force the RegEx tool to use IE or Chrome to scrape, if this makes sense.

Below is what shows when using the RegEx tool to scrape the localhost site. In order to get this to work I need to hit the Complete Link button once I register the PIN with the API.

Code: [Select]
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>ecobee API Demo</title>
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta name="description" content="">
    <meta name="author" content="">

    <!-- Le styles -->
    <style type="text/css">
      body {
        padding-top: 40px;
        padding-bottom: 40px;
        background-color: #f5f5f5;
      }

      .form-signin, .pin {
        max-width: 330px;
        padding: 20px;
        margin: 20px;
        background-color: #fff;
        border: 1px solid #e5e5e5;
        -webkit-border-radius: 5px;
           -moz-border-radius: 5px;
                border-radius: 5px;
        -webkit-box-shadow: 0 1px 2px rgba(0,0,0,.05);
           -moz-box-shadow: 0 1px 2px rgba(0,0,0,.05);
                box-shadow: 0 1px 2px rgba(0,0,0,.05);
      }
     
      .form-signin input[type="text"],
      .form-signin input[type="password"] {
        font-size: 16px;
        height: auto;
        margin-bottom: 15px;
        padding: 7px 9px;
        width:305px;
      }
      .btn {
        margin:0 auto;
      }
      .pin {
      text-align:center;
     
      }
      fieldset {
      text-align:center;
      }
    </style>
    <link rel='stylesheet' href='/css/bootstrap.min.css' />
    <link rel='stylesheet' href='/css/bootstrap-responsive.min.css' />
    <script src="http://code.jquery.com/jquery-latest.js"></script>
    <script type='text/javascript' src='/js/bootstrap.min.js' > </script>
  </head>

  <body>
  <div class="navbar navbar-inverse navbar-fixed-top">
      <div class="navbar-inner">
        <div class="container">
          <a class="btn btn-navbar" data-toggle="collapse" data-target=".nav-collapse">
            <span class="icon-bar"></span>
            <span class="icon-bar"></span>
            <span class="icon-bar"></span>
          </a>
          <a class="brand" href="/thermostats">Ecobee API Demo</a>
          <div class="nav-collapse collapse">
            <ul class="nav">
              <li class="active"><a href="/login">Get Pin</a></li>
   
            </ul>
          </div><!--/.nav-collapse -->
        </div>
      </div>
    </div>
    <div class="container">
   
    <h2>Step 1</h2>
    <p>Go to your ecobee portal under the settings tab and add this application using the pin code below</p>
    <div class="pin">
    <p class="lead">pin code: 4_DIGIT_PINCODE_WAS_HERE</p>
    </div>
    <h2>Step 2</h2>
    <p>Once you have authorized this app to have access to your account you may log in by clicking the Complete Link button below.</p>
   
    <form class="form-signin" action="/login" method="POST">
    <fieldset>
    <input name="authcode" type="hidden" value="AUTHCODE_WASHERE" />
    <button class="btn" type="submit" data-loading-text="waiting 30 seconds">Complete Link</button>
    </fieldset>
    </form>
   
   
    </div> <!-- /container -->

  </body>
</html>
« Last Edit: July 14, 2015, 01:34:02 PM by jitterjames »

jitterjames

« Reply #15 on: July 14, 2015, 01:34:59 PM »
Please place any code (including XML HTML etc.) in a code box.  Especially when it is very long.

You can't force scrape to use a "browser" but you can use RoboBrowser.
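
As for why registering Chrome or IE made no difference: a scrape is presumably a bare HTTP request that shares nothing with a browser session, so the demo app answers it with the PIN page. Here is a small sketch for telling the two pages apart. It is written in Python rather than as a VC command, the function name is made up, and the markers are assumptions taken from the HTML posted above and from the `current-temp` div earlier in the thread.

```python
def classify_scrape(html: str) -> str:
    """Guess which page a scrape returned, based on markers seen in this thread."""
    if 'current-temp' in html:
        return 'thermostat page'
    if 'name="authcode"' in html or 'Complete Link' in html:
        return 'PIN registration page'
    return 'unknown page'


# Shortened samples of the two pages, for illustration only.
pin_page = '<input name="authcode" type="hidden" value="..." /><button>Complete Link</button>'
temp_page = '<div Class="span2 current-temp">76º F</div>'

print(classify_scrape(pin_page))
print(classify_scrape(temp_page))
```

A check like this on the scrape output, before any regex is applied, would show right away that the pattern is fine and the page is the problem.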

2exclusive

« Reply #16 on: July 14, 2015, 03:23:10 PM »
Will do next time. I am looking into RoboBrowser to accomplish this. Thanks again.

2exclusive

« Reply #17 on: July 15, 2015, 05:07:21 PM »
OK, so I finally made some progress here using RoboBrowser, with RoboB.GetHTML to scrape for the temperature digits, which worked. How do you use the Results.RegEx output so that TTS can speak it?

I have OSD.ShowText with the parameter {LastResult}, but it's showing me the HTML output, not the Results.RegEx output. Hope this is clear.

See my XML:

Code: [Select]
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.1.5.2-->
<command id="1379" name="RoboB_test" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="">
  <action>
    <cmdType>RoboB.Select</cmdType>
    <params>
      <param>Ecobee</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RoboB.Navigate</cmdType>
    <params>
      <param>https://www.ecobee.com/consumerportal/index.html#/thermostats</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RoboB.Wait</cmdType>
    <params />
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RoboB.ElementByTag</cmdType>
    <params>
      <param>DIV</param>
      <param>21</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RoboB.GetHTML</cmdType>
    <params />
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>Results.RegEx</cmdType>
    <params>
      <param>&gt;(.*?)&lt;</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.ShowText</cmdType>
    <params>
      <param>{LastResult}</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
</command>

2exclusive

« Reply #18 on: July 15, 2015, 05:11:02 PM »
Never mind, I figured it out: the answer was to use {Match.1}   :bonk
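
For anyone landing here later, the difference can be pictured like this. It is a Python sketch only: `html` plays the role of what RoboB.GetHTML returned, and the sample div is an assumption, not the real ecobee markup.

```python
import re

# Assumed stand-in for the HTML returned by RoboB.GetHTML.
html = '<div class="temp">76</div>'

# Same pattern as in the command above, with &gt; and &lt; unescaped.
match = re.search(r'>(.*?)<', html)

# The captured text between > and <, the counterpart of {Match.1}.
first_capture = match.group(1)

print(html)           # the full HTML, like what the OSD was showing
print(first_capture)  # just the captured value
```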

nime5ter

« Reply #19 on: July 15, 2015, 05:29:46 PM »
 ::banana