Topic: Regex help to extract artist, song from web page

xtermin8r

  • $upporter
  • Sr. Member
  • *****
  • Posts: 366
  • Karma: 9
  • Crunchie
    • View Profile
Regex help to extract artist, song from web page
« on: May 22, 2014, 12:18:21 PM »
Dear Voxinator

I'm struggling to construct a regex pattern that extracts the artist and song name for another MPC experiment I'm doing.
The URL in question is http://localhost:13579/info.html, and the relevant line is:
Code:
<p id="mpchc_np">&laquo; MPC-HC v1.7.3.0 &bull; Snap - The Power &bull; 00:00:00/00:04:24 &bull; 19.4 MB &raquo;</p>

I'm trying to extract Snap as artist and The Power as song, so that I can ask Jarvis to tell me who the artist is, and tell me the song name.

The only thing I could come up with is &bull;(.*?)(.*?)&bull which gives me the whole "Snap - The Power" string instead of the two separate parts.

Thanks in advance.
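
For readers following along, the behaviour of that pattern can be reproduced outside VoxCommando. The sketch below uses Python's re module purely as an illustration (an assumption; it is not what VoxCommando runs internally), with the sample line from info.html hard-coded. It shows that two adjacent lazy groups do not split the text: the first group is satisfied by matching nothing, and the second takes everything up to the next bullet.

```python
import re

# Sample line copied from the post above (hard-coded for illustration).
SAMPLE = ('<p id="mpchc_np">&laquo; MPC-HC v1.7.3.0 &bull; Snap - The Power '
          '&bull; 00:00:00/00:04:24 &bull; 19.4 MB &raquo;</p>')

# The pattern from the post: two lazy groups with nothing between them.
match = re.search(r"&bull;(.*?)(.*?)&bull", SAMPLE)
first, second = match.groups()

# The first lazy group matches the empty string, so the second group
# ends up holding the whole "artist - song" text, spaces included.
print(repr(first))
print(repr(second))
```

Something has to sit between the two groups (here, the " - " separator) before the regex engine has any reason to end the first group and start the second.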
« Last Edit: May 22, 2014, 12:57:40 PM by xtermin8r »
Neural Net Based Artificial Intelligence.

nime5ter

  • Administrator
  • Hero Member
  • *****
  • Posts: 1999
  • Karma: 61
    • View Profile
    • Getting Started with VoxCommando
Re: Regex help to extract artist, song from web page
« Reply #1 on: May 22, 2014, 12:55:11 PM »
How consistent is the pattern?

Does
Code:
<p id="mpchc_np">&laquo; MPC-HC v1.7.3.0 &bull;
always precede the song name?

Is the artist name always separated from the song name by " - "?

Does a bullet ("&bull;") always appear after the band name?
TIPS: POST VC VERSION #. Explain what you want VC to do. Say what you've tried & what happened, or post a video demo. Attach VC log. Link to instructions followed.  Post your command (xml)

nime5ter

« Reply #2 on: May 22, 2014, 12:59:46 PM »
For example, the following gets the info you want, but you may run into problems depending on how much the pattern varies.

Code:
<?xml version="1.0" encoding="utf-16"?>
<!--VoxCommando 1.9.5.1-->
<command id="1151" name="get song info" enabled="true" alwaysOn="False" confirm="False" requiredConfidence="0" loop="False" loopDelay="0" loopMax="0" description="">
  <action>
    <cmdType>Results.SetLastResult</cmdType>
    <params>
      <param>&lt;p id="mpchc_np"&gt;&amp;laquo; MPC-HC v1.7.3.0 &amp;bull; Snap - The Power &amp;bull; 00:00:00/00:04:24 &amp;bull; 19.4 MB &amp;raquo;&lt;/p&gt;</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>Results.RegEx</cmdType>
    <params>
      <param>&amp;bull;\s(.*?)\s-\s(.*?)\s&amp;bull;</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.ShowText</cmdType>
    <params>
      <param>Artist: {Match.1.1}</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
  <action>
    <cmdType>OSD.AddText</cmdType>
    <params>
      <param>Song: {Match.1.2}</param>
    </params>
    <cmdRepeat>1</cmdRepeat>
  </action>
</command>

In the above, {Match.{i}.1} will always be the artist, and {Match.{i}.2} will be the song, if the pattern is consistent throughout.
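
The same idea can be tried outside VoxCommando. This is a minimal sketch in Python (an assumption made for illustration; the function name extract_artist_song is hypothetical): capture lazily up to a " - " separator, then lazily up to the next bullet. The pattern spells the separator as whitespace, hyphen, whitespace, which is a choice made for this sketch so that hyphens inside a name are left alone.

```python
import re

# Sample line from the first post (hard-coded for illustration).
SAMPLE = ('<p id="mpchc_np">&laquo; MPC-HC v1.7.3.0 &bull; Snap - The Power '
          '&bull; 00:00:00/00:04:24 &bull; 19.4 MB &raquo;</p>')

# Group 1 plays the role of {Match.1.1} (artist), group 2 of {Match.1.2} (song).
PATTERN = re.compile(r"&bull;\s(.*?)\s-\s(.*?)\s&bull;")


def extract_artist_song(html):
    """Return (artist, song), or None when no ' - ' separator is found."""
    m = PATTERN.search(html)
    return m.groups() if m else None


print(extract_artist_song(SAMPLE))
print(extract_artist_song("&bull; NoSeparator &bull;"))
```

Returning None for a title without the separator makes the inconsistent file names easy to spot instead of silently producing a wrong artist.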

[edited to correct which is artist and which is song.  :biglaugh]
« Last Edit: May 22, 2014, 01:02:20 PM by nime5ter »

xtermin8r

« Reply #3 on: May 22, 2014, 01:01:21 PM »
Quote
How consistent is the pattern?

Does
Code:
<p id="mpchc_np">&laquo; MPC-HC v1.7.3.0 &bull;
always precede the song name?
Yes.

Quote
Is the artist name always separated from the song name by " - "?
Not always.

Quote
Does a bullet ("&bull;") always appear after the band name?
Yes, always after the full artist and song name.

nime5ter

« Reply #4 on: May 22, 2014, 01:04:50 PM »
Quote
not always

That will be a problem. Are there a set number of possibilities? Is it a public web page that I can look at?

xtermin8r

« Reply #5 on: May 22, 2014, 01:05:51 PM »
Thank you, nime5ter. It's exactly what I needed. If the artist name and song are not separated by a " - ", I guess I will have to manually insert it into the file name.

nime5ter

« Reply #6 on: May 22, 2014, 01:08:09 PM »
Yeah, I was just about to say, if it's your own library, then the best thing would be if you could rename the files in a consistent way.

xtermin8r

« Reply #7 on: May 22, 2014, 01:09:25 PM »
Quote
That will be a problem. Are there a set number of possibilities? Is it a public web page that I can look at?

It's not a public web page; it's the web page of Media Player Classic. I could upload the HTML if you like.

nime5ter

« Reply #8 on: May 22, 2014, 01:14:57 PM »
If you like, sure. But it seems like it would make more sense, and be more reliable, if you could create consistent file names and then work with that.

The above obviously won't work perfectly if you have file names with more than one " - " in the name.
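
To make that limitation concrete, here is a small Python sketch (Python and the sample title are assumptions for illustration; the title with two separators is hypothetical, not taken from the actual library). With a lazy first group, the split happens at the first " - ", so everything after it lands in the song.

```python
import re

# Lazy artist group, " - " separator, lazy song group up to the next bullet.
PATTERN = re.compile(r"&bull;\s(.*?)\s-\s(.*?)\s&bull;")

# Hypothetical now-playing text with two " - " separators.
line = "&bull; Snap - The Power - Extended Mix &bull; 00:00:00/00:04:24 &bull;"

artist, song = PATTERN.search(line).groups()

# The split happens at the FIRST " - ": the remainder becomes the song.
print(artist)
print(song)
```

That outcome is fine when the extra " - " belongs to the song title, but an artist name that itself contains " - " would be cut short, which is why consistent file names are the more reliable fix.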


xtermin8r

« Reply #9 on: May 22, 2014, 01:26:35 PM »
Quote
If you like, sure. But it seems like it would make more sense, and be more reliable, if you could create consistent file names and then work with that.
Personally, I think there is no need to upload it (waste of digital space)  :biglaugh. The relevant info is in the first post.
I agree, it would be more reliable if file names are consistent and include a "-" between the artist and song name.

Quote
The above obviously won't work perfectly if you have file names with more than one " - " in the name.
True, I will have to use other methods to get rid of any extra "-" in the name.





nime5ter

« Reply #10 on: May 22, 2014, 01:34:44 PM »
Quote
True, I will have to use other methods to get rid of any extra "-" in the name.

Or use a more distinctive character, one that never appears in artist or song names, to separate song and artist in your file names.
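
As a rough sketch of that suggestion in Python (an assumption for illustration; the " ~ " separator, the function name split_title and the sample titles are all hypothetical): once the separator cannot occur inside a name, a plain split is enough and hyphens in the artist or song no longer matter.

```python
# Hypothetical separator: any string that never occurs in artist or song names.
SEPARATOR = " ~ "


def split_title(title):
    """Split 'artist ~ song' into (artist, song); None if the separator is missing."""
    artist, sep, song = title.partition(SEPARATOR)
    if not sep:
        return None
    return artist.strip(), song.strip()


# Hyphens inside either part no longer confuse the split.
print(split_title("Some-Artist ~ Some Song - Live"))
print(split_title("Untagged file name"))
```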

jitterjames

  • Administrator
  • Hero Member
  • *****
  • Posts: 7713
  • Karma: 116
    • View Profile
    • VoxCommando
« Reply #11 on: May 23, 2014, 10:05:51 AM »
Looks like it is just returning the filename.

I recommend you use a proper music management program like MediaMonkey.  It is free and can't be beat. Even if you don't want to use it to play your music for some reason, you can still use it to organise your music, maintain proper tags, album art etc. And it can then organise your files with predictable path and filename formats.