Script for pulling values from an rss file


mudskipper replied

29 March 2013, 19:07
Pondlife replied

28 March 2013, 10:30
Cheers guys.

I also like Doodab's idea but can't find a free version of the tables or a formula.

Will have a looky at NFs feed parser but more as an exercise in technical masturbation since Platy and Contreras have solved it for me.

Rep will be forthcoming to all (Platy will have to wait a bit)

Platypus replied

28 March 2013, 08:28

Originally posted by Platypus

Code:

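# Keep only the description line that holds the tide data ($1 is the rss file)
grep "description.*Low" "$1" |
  # Decode the entity-encoded brackets
  sed -e 's/&#x28;/(/g' -e 's/&#x29;/)/g' |
  # Capture the four tide entries and join them with '+'
  sed 's/^.*\([0-9][0-9]:[0-9][0-9] - Low Tide ([.0-9][.0-9]*m)\).*\([0-9][0-9]:[0-9][0-9] - High Tide ([.0-9][.0-9]*m)\).*\([0-9][0-9]:[0-9][0-9] - Low Tide ([.0-9][.0-9]*m)\).*\([0-9][0-9]:[0-9][0-9] - High Tide ([.0-9][.0-9]*m)\).*$/\1+\2+\3+\4/' |
  # Put each entry on its own line
  tr '+' '\n'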

Originally posted by Contreras

Code:

~$ sed 's/&[^;]*;/\n/g;/^..:.. - .* Tide \n/{s/\n//;P};D' tide.rss

Smart Arse


DaveB replied

27 March 2013, 20:34
Originally posted by Pondlife

I have an rss file containing the following;

Code:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
 <title>Some Tide Times</title>
 <link>http://www.tidetimes.org.uk/Some-tide-times</link>
 <description>Some tide times.</description>
 <lastBuildDate>Wed, 27 Mar 2013 00:00:00 GMT</lastBuildDate>
 <language>en-gb</language>
 <atom:link href="http://www.tidetimes.org.uk/Some-tide-times.rss" rel="self" type="application/rss+xml"/>
 <item>
  <title>Some Tide Times for 27th March 2013</title>
  <link>http://www.tidetimes.org.uk/Some-tide-times</link>
  <guid>http://www.tidetimes.org.uk/Some-tide-times</guid>
  <pubDate>Wed, 27 Mar 2013 00:00:00 GMT</pubDate>
  <description>&lt;a href="http://www.tidetimes.org.uk" title="Tide Times"&gt;Tide Times&lt;/a&gt; &amp; Heights for&lt;br/&gt;&lt;a href="http://www.tidetimes.org.uk/Some-tide-times" title="Some tide times"&gt;Some&lt;/a&gt; on 27th March 2013&lt;br/&gt;&lt;br/&gt;00:34 - Low Tide &#x28;1.40m&#x29;&lt;br/&gt;06:43 - High Tide &#x28;11.60m&#x29;&lt;br/&gt;12:58 - Low Tide &#x28;1.40m&#x29;&lt;br/&gt;19:05 - High Tide &#x28;11.70m&#x29;&lt;br/&gt;</description>
 </item>
</channel>
</rss>

I have no idea how sed and awk work but I guess they are the tools for the job? What I want to do is be able to pull the High and low tide times and heights from inside the description tags using a shell script.

Any ideas?

TIA

Pondy

Should be relatively straightforward to do it with a regular expression (regex) as the data will be in a consistent format, i.e. always low tide, high tide, low tide, high tide, with the values always in the same format: nn:nn for the time and n.nnm for the height. You will have to frig it a bit to account for tide heights of 10m and over, as the sample already has 11.60m.
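
To illustrate the "consistent format" point, here is a minimal sketch that just lists every chunk matching the hh:mm - Low/High Tide pattern. It assumes the file still holds the entity-encoded brackets (&#x28; and &#x29;) seen in the sample, and that your grep supports -o (GNU and BSD grep do, but it is not in POSIX). The name list_tides is made up for the example.

```shell
#!/bin/sh
# list_tides FILE: hypothetical helper, not from the thread.
# Prints each "hh:mm - Low|High Tide &#x28;n.nnm&#x29;" chunk on its own line.
# [0-9][0-9]* before the dot allows heights of 10m and above.
list_tides() {
    grep -o '[0-9][0-9]:[0-9][0-9] - [A-Za-z]* Tide &#x28;[0-9][0-9]*\.[0-9][0-9]*m&#x29;' "$1"
}

# Run against a file only when one is given on the command line.
if [ "$#" -gt 0 ]; then
    list_tides "$1"
fi
```

The brackets come out still encoded; decoding them is a separate substitution step.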

I'm rusty on this stuff as I haven't written shell scripts in years, but you should be able to use an appropriately crafted regex in sed to ditch everything up to the first tide time, strip out the extraneous rubbish between the bits you want to keep, and dump the whole lot into a file.

Sed itself is easy; the basic pattern is:

sed -e 's/oldstuff/newstuff/g' inputFileName > outputFileName

Which is basically saying: run sed with the editing command given after -e, search each line of the input file for a pattern that matches "oldstuff", replace it with "newstuff", and put the whole thing into a new file when you're done.
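
As a concrete instance of that pattern, here is a small sketch that turns the entity-encoded brackets in the sample feed back into ( and ) and writes the result to a second file. The function name decode_brackets and both file arguments are placeholders for illustration.

```shell
#!/bin/sh
# decode_brackets IN OUT: hypothetical helper, not from the thread.
# Each -e adds one editing command; s/old/new/g replaces every match on a line.
decode_brackets() {
    sed -e 's/&#x28;/(/g' -e 's/&#x29;/)/g' "$1" > "$2"
}

# Run only when both an input and an output file name are supplied.
if [ "$#" -eq 2 ]; then
    decode_brackets "$1" "$2"
fi
```

The input file is left untouched; only the output file gets the decoded text.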

Replace oldstuff and newstuff with the regex to identify the data you want and the desired output and you should be away.

Of course, figuring out the regex is going to be the fun part, especially as you are going to have to use back references to hold the bits you want and put them into the output file.
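
For a feel of how back references work, here is a hedged sketch. It assumes the tide entries have already been split out one per line with the brackets decoded, e.g. "00:34 - Low Tide (1.40m)". Each \( ... \) pair captures a piece of the match, and \1, \2, \3 drop those pieces back into the replacement. The name reorder_fields is invented for the example.

```shell
#!/bin/sh
# reorder_fields: hypothetical helper, not from the thread. Reads stdin.
# \1 = time, \2 = Low or High, \3 = height without the trailing "m".
# Plain ( and ) are literal brackets in sed's basic regex syntax.
reorder_fields() {
    sed -e 's/^\([0-9][0-9]:[0-9][0-9]\) - \([A-Za-z]*\) Tide (\([0-9.]*\)m)$/\2 \1 \3/'
}
```

Piping `printf '00:34 - Low Tide (1.40m)\n'` into reorder_fields prints `Low 00:34 1.40`. Lines that do not match the pattern pass through unchanged.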

If you are running in a command-line setting it may be easier to do the whole thing in Perl if you have it available, although you will still need to get your head around the regex.

doodab replied

27 March 2013, 18:30

Originally posted by Pondlife

I have an rss file containing the following;

Code:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
 <title>Some Tide Times</title>
 <link>http://www.tidetimes.org.uk/Some-tide-times</link>
 <description>Some tide times.</description>
 <lastBuildDate>Wed, 27 Mar 2013 00:00:00 GMT</lastBuildDate>
 <language>en-gb</language>
 <atom:link href="http://www.tidetimes.org.uk/Some-tide-times.rss" rel="self" type="application/rss+xml"/>
 <item>
  <title>Some Tide Times for 27th March 2013</title>
  <link>http://www.tidetimes.org.uk/Some-tide-times</link>
  <guid>http://www.tidetimes.org.uk/Some-tide-times</guid>
  <pubDate>Wed, 27 Mar 2013 00:00:00 GMT</pubDate>
  <description>&lt;a href="http://www.tidetimes.org.uk" title="Tide Times"&gt;Tide Times&lt;/a&gt; &amp; Heights for&lt;br/&gt;&lt;a href="http://www.tidetimes.org.uk/Some-tide-times" title="Some tide times"&gt;Some&lt;/a&gt; on 27th March 2013&lt;br/&gt;&lt;br/&gt;00:34 - Low Tide &#x28;1.40m&#x29;&lt;br/&gt;06:43 - High Tide &#x28;11.60m&#x29;&lt;br/&gt;12:58 - Low Tide &#x28;1.40m&#x29;&lt;br/&gt;19:05 - High Tide &#x28;11.70m&#x29;&lt;br/&gt;</description>
 </item>
</channel>
</rss>

I have no idea how sed and awk work but I guess they are the tools for the job? What I want to do is be able to pull the High and low tide times and heights from inside the description tags using a shell script.

Any ideas?

TIA

Pondy

You probably want to use XSL.

Sed and awk aren't really suited to xml.

