Topic: Problems with scrape

lexanic37

  • Jr. Member
  • **
  • Posts: 7
  • Karma: 0
    • View Profile
Problems with scrape
« on: December 06, 2015, 04:19:59 PM »
Hello, I have a question about regular expressions. I want to scrape the Russian social network VKontakte. The scrape does not log in to my account, even though I entered my login and password in the action's optional parameters.

nime5ter

  • Administrator
  • Hero Member
  • *****
  • Posts: 2009
  • Karma: 61
    • View Profile
    • Getting Started with VoxCommando
Re: Problems with scrape
« Reply #1 on: December 06, 2015, 05:30:24 PM »
Hi,

The Scrape action's username and password parameters are for Basic HTTP authentication only.
 
That is a different procedure than logging into a website such as a social network.
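
To illustrate the difference: with Basic HTTP authentication the credentials travel in a single request header, whereas a social network login form normally posts to a login page and relies on cookies afterwards. The sketch below is a general Python illustration of how a Basic authentication header is built. It is not VoxCommando's own code, and the user name and password shown are placeholders.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of the HTTP Authorization header for Basic auth."""
    # Basic auth is just "username:password", base64-encoded.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Placeholder credentials, for illustration only.
header = basic_auth_header("user", "pass")
print("Authorization:", header)
```

A site that expects a form login ignores this header, which is why filling in the Scrape action's username and password does not sign you in to vk.com.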

You will have to use keyboard emulation or the RoboBrowser plugin. Usually, we use the RoboBrowser plugin for this type of website. http://voxcommando.com/mediawiki/index.php?title=RoboBrowser

It will be difficult for you, though, because we don't have any Russian documentation for the RoboBrowser plugin.

What is your ultimate objective?
TIPS: POST VC VERSION #. Explain what you want VC to do. Say what you've tried & what happened, or post a video demo. Attach VC log. Link to instructions followed.  Post your command (xml)

lexanic37

Re: Problems with scrape
« Reply #2 on: December 08, 2015, 12:22:32 PM »
I want to use keyboard emulation. Please help me with this.

nime5ter

Re: Problems with scrape
« Reply #3 on: December 08, 2015, 03:15:48 PM »
Keyboard emulation is the method, but not the objective.

I can show you how to sign in to the website with:

1. Keyboard emulation
and
2. The RoboBrowser plugin.

But I am not sure this will help very much.

To test these commands, you can copy all the code and paste it into your command tree window in VoxCommando.

(Video demonstration here)


Method 1.
You need to replace the user name and password.
You may need to change the VC.Pause durations if they are too long or too short.

Code:
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.2.0.9-->
<command id="351" name="++log in to vk dotcom - KB emulation" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="">
  <action>
    <cmdType>Launch.OpenURL</cmdType>
    <params>
      <param>vk.com</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>VC.Pause</cmdType>
    <params>
      <param>4000</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>InputKeys.TextEntry</cmdType>
    <params>
      <param>your username goes here</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>VC.Pause</cmdType>
    <params>
      <param>1000</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>InputKeys.Send</cmdType>
    <params>
      <param>{ENTER}</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>VC.Pause</cmdType>
    <params>
      <param>1000</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>InputKeys.TextEntry</cmdType>
    <params>
      <param>your password goes here</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>VC.Pause</cmdType>
    <params>
      <param>200</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>InputKeys.Send</cmdType>
    <params>
      <param>{ENTER}</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <phrase>log in to vk dot com</phrase>
</command>

Method 2.
RoboBrowser always uses Internet Explorer.
Enable the RoboB plugin.
Replace the user name and password.
You can change the size of the browser window.

Code:
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.2.0.9-->
<command id="353" name="log in to vk dot com" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="">
  <action>
    <cmdType>RoboB.Select</cmdType>
    <params>
      <param>vk</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RoboB.Navigate</cmdType>
    <params>
      <param>http://vk.com/</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RoboB.Wait</cmdType>
    <params />
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RoboB.SetWinSize</cmdType>
    <params>
      <param>1280</param>
      <param>720</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RoboB.Show</cmdType>
    <params />
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RoboB.ElementByID</cmdType>
    <params>
      <param>quick_email</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RoboB.SetText</cmdType>
    <params>
      <param>YOU@EMAIL.COM</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RoboB.ElementByID</cmdType>
    <params>
      <param>quick_pass</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RoboB.SetText</cmdType>
    <params>
      <param>YOUR PASSWORD</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RoboB.ElementByTag</cmdType>
    <params>
      <param>BUTTON</param>
      <param>1</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RoboB.Click</cmdType>
    <params />
    <cmdRepeat>1</cmdRepeat>
  </action>
  <phrase>log in to veekay dot com with Robo Browser</phrase>
</command>
« Last Edit: December 08, 2015, 06:21:20 PM by nime5ter »

lexanic37

Re: Problems with scrape
« Reply #4 on: December 09, 2015, 12:11:29 PM »
Thank you so much for your quick response. But neither method suits me, since I would like to read information from the site without opening a browser. Thank you very much all the same.
« Last Edit: December 09, 2015, 12:19:18 PM by nime5ter »

nime5ter

Re: Problems with scrape
« Reply #5 on: December 09, 2015, 12:35:04 PM »
I see.

Using the regular Scrape action, I am able to scrape your profile page, as well as other vk.com pages, without logging in.

But I think it will be difficult to scrape the content of your profile page using only regular expressions and the Scrape action, because of the complex HTML and multimedia on that page.

For example:
Code:
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.2.0.9-->
<command id="405" name="scrape Alexei's profile" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="">
  <action>
    <cmdType>Scrape</cmdType>
    <params>
      <param>http://vk.com/alekcei_1999</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RegExTool.Open</cmdType>
    <params>
      <param>True</param>
    </params>
    <cmdRepeat>0</cmdRepeat>
  </action>
  <action>
    <cmdType>Tools.Decode.HTML</cmdType>
    <params />
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>Results.RegExSingle</cmdType>
    <params>
      <param>pi_text"&gt;(&lt;span&gt;|)(.*?)&lt;.*?pi_author.*?&gt;(.*?)&lt;</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.ShowText</cmdType>
    <params>
      <param>{#M} posts found</param>
      <param>7000</param>
      <param>-5</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.AddText</cmdType>
    <params>
      <param>{match.{i}.3} : {match.{i}.2} </param>
    </params>
    <cmdRepeat>{#M}</cmdRepeat>
  </action>
  <phrase>scrape Alexei's profile</phrase>
</command>
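
The regular expression in the command above can also be tried outside VoxCommando. In the command XML it is stored XML-escaped (&gt; and &lt;), so the sketch below first unescapes it. This is only a rough Python illustration: the sample markup is hypothetical and much simpler than the real vk.com page, and the DOTALL flag is an assumption rather than a documented VoxCommando setting.

```python
import html
import re

# Pattern exactly as it appears in the command XML (XML-escaped).
xml_pattern = 'pi_text"&gt;(&lt;span&gt;|)(.*?)&lt;.*?pi_author.*?&gt;(.*?)&lt;'
pattern = html.unescape(xml_pattern)

# Hypothetical, simplified markup. The real page is far more complex.
sample = (
    '<div class="pi_text"><span>First post</span></div>'
    '<a class="pi_author" href="/a">Alexei</a>'
    '<div class="pi_text">Second post</div>'
    '<a class="pi_author" href="/b">Boris</a>'
)

# Each match is a tuple: (optional "<span>", post text, author name).
matches = re.findall(pattern, sample, flags=re.DOTALL)

# Same layout as the OSD.AddText parameter: group 3, then group 2.
lines = [f"{author} : {text}" for _span, text, author in matches]

print(f"{len(matches)} posts found")
for line in lines:
    print(line)
```

Group 1 only absorbs an optional opening span tag, which is why the command displays groups 3 and 2 and never group 1.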

So I think this may only work well if you want something very specific.

For example:
Code:
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 2.2.0.9-->
<command id="348" name="scrape vk.com/search" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="">
  <action>
    <cmdType>Scrape</cmdType>
    <params>
      <param>http://vk.com/search</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>RegExTool.Open</cmdType>
    <params>
      <param>True</param>
    </params>
    <cmdRepeat>0</cmdRepeat>
  </action>
  <action>
    <cmdType>Results.RegExSingle</cmdType>
    <params>
      <param>="si_owner"&gt;(.*?)&lt;/span&gt;.*?slabel"&gt;(.*?)&lt;</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.ShowText</cmdType>
    <params>
      <param>Top 5 profiles</param>
      <param>7000</param>
      <param>-5</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.AddText</cmdType>
    <params>
      <param>{i}. {match.{i}.1} - {match.{i}.2}</param>
    </params>
    <cmdRepeat>5</cmdRepeat>
  </action>
  <phrase>show me top five people</phrase>
</command>

We use RoboBrowser because it allows us to target very specific elements on a web page, which is often needed to interact with complex websites.

The RoboBrowser window does not need to be visible. It is only visible if the "RoboB.Show" action is used in the command. It will still be running on your computer when you issue the command, however.
