Topic: Can't scrape https anymore

PegLegTV

  • $upporter
  • Hero Member
  • *****
  • Posts: 500
  • Karma: 43
Can't scrape https anymore
« on: September 20, 2018, 08:46:13 PM »
I've been having issues scraping secured sites (https). I keep getting the same error:


Quote
Error: System.Net.WebException: The remote server returned an error: (403) Forbidden. at System.Net.HttpWebRequest.GetResponse() at Eval_ .(String  A_0




Is there any way to get around this?

VC Version 2.2.4.1
Windows 10

Thanks

jitterjames

  • Administrator
  • Hero Member
  • *****
  • Posts: 7715
  • Karma: 116
Re: Can't scrape https anymore
« Reply #1 on: September 20, 2018, 09:35:31 PM »
I don't know.  Error 403 means forbidden.  It does not specify why the request is forbidden.

VC doesn't have a problem scraping https sites in general so it must be something specific to the site you are trying to access.

For example, if you scrape here:

 https://voxcommando.com/forum/index.php?action=recent

...you will not have any problem.

I assume that you are able to browse that site with some other browser and that is why you are posting here.  It is hard to know why a site might return the 403 error.  One possibility, but certainly not the only one, is that the server expects certain headers.  Maybe it expects a header that identifies the user agent, or it expects the request to be of a certain type, etc.

It's a long shot but I would look through the full error just to see if there is some kind of sub-error.

If you can figure out the reason for the 403 error, then you can probably get it to work using scrape.get and filling in the correct optional parameters.  HTTP can get messy.

I don't think that the problem actually comes from the fact that they are https sites though.  I suspect that it is just a coincidence.
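
To illustrate the header idea outside of VC, here is a minimal Python sketch (not VC code) that attaches a User-Agent header to a request, which is one of the things a server may want to see before it stops answering with 403. The function name build_request and the example user-agent value are assumptions made up for illustration. Whether a given site actually checks this header is something you would have to test.

```python
import urllib.request


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Return a GET request that identifies itself with the given User-Agent."""
    request = urllib.request.Request(url)
    # Some servers refuse clients that send no (or an unfamiliar) User-Agent.
    request.add_header("User-Agent", user_agent)
    return request


if __name__ == "__main__":
    req = build_request(
        "https://voxcommando.com/forum/index.php?action=recent",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",  # assumed example value
    )
    # Nothing is sent here; urllib.request.urlopen(req) would perform the request.
    print(req.get_header("User-agent"))
```

The sketch only builds the request object, so it can be inspected without touching the network. If the 403 persists with a browser-like User-Agent, the cause is probably a different header or something else entirely.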

PegLegTV

Re: Can't scrape https anymore
« Reply #2 on: September 21, 2018, 01:17:26 AM »
Yes, I can reach it in another browser. I thought it was an https error because I had a different URL a week or so ago that did the same thing, but I can't remember what the URL was. I was doing an overhaul of my remote at the time, since AutoRemote has been iffy at best lately.

I will do some more digging and see if I can figure out a different way to detect if my VPN is connected instead of using https://whatismyipaddress.com/

I switched VPNs, so I'm rebuilding my VPN actions so that I can launch them with VC through my remote or through Kodi.

Kalle

  • $upporter
  • Hero Member
  • *****
  • Posts: 2319
  • Karma: 47
Re: Can't scrape https anymore
« Reply #3 on: September 21, 2018, 02:33:01 AM »
I found a solution after James gave me the hint - voila, that works.

In VC you can use the Scrape.UserAgent command for the whatismyipaddress website and put the following user-agent string in the UserAgent parameter field:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36

Then you can open the RegExTool with the {LastResult}.


Here is a VC command example for the whatismyipaddress website:


Code:
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.2.4.1-->
<command id="130" name="https scrape" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="">
  <action>
    <cmdType>Scrape.UserAgent</cmdType>
    <params>
      <param>https://whatismyipaddress.com/</param>
      <param>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>Results.RegEx</cmdType>
    <params>
      <param>&lt;strong&gt;IPv4:&lt;/strong&gt;.+/ip/(.*?)'</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.ShowText</cmdType>
    <params>
      <param>Your IP address is: {Match.1.1}</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <phrase>scrape secured</phrase>
</command>
« Last Edit: September 21, 2018, 11:18:01 AM by Kalle »
***********  get excited and make things  **********
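
For anyone who wants to try the pattern from the Results.RegEx action outside of VC, the following is a rough Python sketch. The pattern is the one from the command above with the XML entities decoded. The sample markup and the helper name extract_ipv4 are made up for illustration, and the real page layout may differ or change over time.

```python
import re

# Same pattern as in the Results.RegEx action, with &lt; and &gt; decoded.
IPV4_PATTERN = re.compile(r"<strong>IPv4:</strong>.+/ip/(.*?)'")


def extract_ipv4(html: str):
    """Return the first captured address, or None if the pattern does not match."""
    match = IPV4_PATTERN.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Made-up markup using a documentation-only address (203.0.113.7).
    sample = "<strong>IPv4:</strong> <a href='/ip/203.0.113.7'>203.0.113.7</a>"
    print(extract_ipv4(sample))
```

The capture group stops at the first single quote after /ip/, so the pattern depends on the link being written with single quotes, as it is in the assumed sample.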

PegLegTV

Re: Can't scrape https anymore
« Reply #4 on: September 22, 2018, 02:27:14 AM »
Nice find, Kalle. I would never have figured that one out, lol. I will hold onto that in case I need it for other sites in the future.

I ended up going a different route. I was trying to test my VPN connection so I could tell whether I was connected to my VPN, and where, or whether the VPN was off. Since I can configure my VPN to work with the built-in Windows VPN client, I went that route so I could use the command line to connect and disconnect the VPN using Rasdial.exe.

Checking the Windows VPN connection with Rasdial.exe:
Code:
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.2.4.1-->
<command id="858" name="VPN.Status.Check" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="">
  <action>
    <cmdType>Launch.Capture</cmdType>
    <params>
      <param>C:\Windows\System32\rasdial.exe</param>
      <param />
      <param>True</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>Results.RegExSingle</cmdType>
    <params>
      <param>(.*?)Command\scompleted\ssuccessfully.</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.ShowText</cmdType>
    <params>
      <param>VPN: {Match.1.1}</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <event>VPN.Status.Check</event>
</command>
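
As a way to check the regular expression outside of VC, here is a small Python sketch of the same parsing step. It assumes that rasdial, when run without arguments, prints either a "Connected to" line followed by the connection name or "No connections", and then "Command completed successfully." It also assumes that Results.RegExSingle lets the dot match across line breaks, which is modelled here with re.DOTALL. The helper name vpn_status and the connection name MyVPN are hypothetical.

```python
import re

# Same pattern as the Results.RegExSingle action, with the final dot escaped.
# re.DOTALL is an assumption about how VC applies the pattern.
STATUS_PATTERN = re.compile(r"(.*?)Command\scompleted\ssuccessfully\.", re.DOTALL)


def vpn_status(rasdial_output: str):
    """Return the text rasdial printed before its success line, or None."""
    match = STATUS_PATTERN.search(rasdial_output)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    # Assumed sample outputs, not captured from a real machine.
    connected = "Connected to\nMyVPN\nCommand completed successfully.\n"
    disconnected = "No connections\nCommand completed successfully.\n"
    print(repr(vpn_status(connected)))
    print(repr(vpn_status(disconnected)))
```

On Windows the input could come from running rasdial.exe and capturing its output, which is what the Launch.Capture action does in the command above.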