Help wanted - extracting data from the web using Excel 2007

Visitors can check out the Forum FAQ by clicking this link. You have to register before you can post: click the REGISTER link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. View our Forum Privacy Policy.
Want to receive the latest contracting news and advice straight to your inbox? Sign up to the ContractorUK newsletter here. Every sign up will also be entered into a draw to WIN £100 Amazon vouchers!

You are not logged in or you do not have permission to access this page. This could be due to one of several reasons:

You are not logged in. If you are already registered, fill in the form below to log in, or follow the "Sign Up" link to register a new account.
You may not have sufficient privileges to access this page. Are you trying to edit someone else's post, access administrative features or some other privileged system?
If you are trying to post, the administrator may have disabled your account, or it may be awaiting activation.

xoggoth replied

20 November 2011, 19:56
Down bottom of this page on my site is the current Worldpay GBP-EURO conversion.

Gatekeeper Sales

See page source, 2 elements and a tiny bit of jscript, sidepics function not relevant. Seem to recall I used the hidden and visible elements with jscript copy between due to an appearance problem with frame border, so jscript may be quite unnecessary, just bung value in visible element in 1st place. All it uses otherwise is some fairly self explanatory PHP:

<?php
$data = file_get_contents("https://select.worldpay.com/wcc/info?op=rates&instId=123456",0);
$pos = strpos($data, "rateDateString")-1;
$date = substr($data,1, $pos);
$clen = strlen($data) - strpos($data, "GBP_EUR") - 8;
$data = substr($data,-$clen);
echo "$date<br>1 British Pound = $data Euro";
?>

Only problem might be accessing some external websites from third party hosting, security and all that.

Last edited by xoggoth; 20 November 2011, 20:06.
Leave a comment:
IR35 Avoider replied

20 November 2011, 18:51
I don't want to seem incredibly lazy, but it occurs to me that what I need is a web-based web data extractor, that effectively works like the Excel wizard in that it will take a supplied URL and let you point at something on that page, and generate a web page that just contains that thing. It just needs to improve on Excel by not restricting the selectable items to (most) tables.

In other words, no need to work out the syntax of a query.

Once you have that, excel can pull in the data from the resulting page, using its query wizard.

(Not sure if YQL query builder answers this description, will have to look into that.)

Last edited by IR35 Avoider; 20 November 2011, 18:54.
Leave a comment:
MarillionFan replied

20 November 2011, 18:45
Originally posted by Zippy View Post

It seems to me to reliably recognise a <table> tag, but I'd be interested in hearing about situations where that hasn't happened.

I use the OPs approach quite a lot and its inconsistent. Not as bad as when you want to refresh a number of them and the bloody internet slows up. I tend to loop through a number of webqueries on multiple pages and the concatenate the data.
Leave a comment:
Zippy replied

20 November 2011, 18:30
Originally posted by IR35 Avoider View Post

Yes, I noticed that after I posted. However in this context a table is only a table if Excel 2007 web query wizard allows you to select it. I've noticed previously that it doesn't recognise some tables.

It seems to me to reliably recognise a <table> tag, but I'd be interested in hearing about situations where that hasn't happened.
Leave a comment:
IR35 Avoider replied

20 November 2011, 18:09
Special thanks to NickFitz for recent posts - it will probably take a day or two before I understand everything you say, but I feel you are putting me on the right track.
Leave a comment:
IR35 Avoider replied

20 November 2011, 18:03
Originally posted by k2p2 View Post

Not wishing to be picky but...

BOE are using tables for their interest rate.

Yes, I noticed that after I posted. However in this context a table is only a table if Excel 2007 web query wizard allows you to select it. I've noticed previously that it doesn't recognise some tables.
Leave a comment:

suityou01 replied

20 November 2011, 12:24

Originally posted by NickFitz View Post

Ah, so the following YQL query will do the needful:

Code:

select * from html 
    where url="http://www.bankofengland.co.uk/" 
    and xpath="//table[@id='keyfacts']/tr[1]/td[2]/p/text()"

returning the following XML:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
    yahoo:count="1" yahoo:created="2011-11-19T20:13:13Z" yahoo:lang="en-US">
    <results>0.5%</results>
</query>

from which the result can easily be extracted using the MSXML DOM, or whatever similar facilities Excel might offer for importing XML over HTTP.

Edit: that YQL URL in full:

Code:

http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fwww.bankofengland.co.uk%2F%22%20and%20xpath%3D%22%2F%2Ftable%5B%40id%3D'keyfacts'%5D%2Ftr%5B1%5D%2Ftd%5B2%5D%2Fp%2Ftext()%22

Wow that is art!!! You can query anything these days

I also notice that

Code:

select * from html 
    where url="http://www.contractorukcom/forums/" 
    and xpath="//table[@id='poster_type'='Bellend']/tr[1]/td[2]/p/text()"

returns

Code:

<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
    yahoo:count="1" yahoo:created="2011-11-19T20:13:13Z" yahoo:lang="en-US">
    <results>MarillionFan</results>
</query>

Leave a comment:

aussielong replied

20 November 2011, 05:48
I've done a bit of this in the past, in Java. This is how I did it and found it quite maintainable..

Use Tagsoup to turn the HTML into well formed XML.

Use XPath expressions to address the parts of the XML document that have the data i'm after.

When the page is updated, your XPaths might need updating - I used an XPath generator plugin in Eclipse so I just reload the page, point and click the bit i'm interested in- and it tells me the XPath I need. I then update my screen scraper with the new XPath.

I've been doing this to scrape some data from a vendor website for a few years now. They have updated the site over time but my stuff still works. This would take an hour or two to knock up and then no more coding to maintain it.

(You just need a thin JNI/COM wrapper to call this from your spready.)

Last edited by aussielong; 20 November 2011, 05:52.
Leave a comment:
pacharan replied

20 November 2011, 03:43
Scroon screeping?
Leave a comment:
NickFitz replied

19 November 2011, 20:56
Originally posted by MarillionFan View Post

Ah! You've done it. I was just looking at that as well but I am afraid X-Factor beckon.

Now I was trying to see how you could format that or return it in a tabular SQL style format with a view to UNION a few different queries together. I'm not sure YSQL is quite the same as SQL output without reading up on it all?

So the results would be

Source Description Date Value
BankofEngland.co.uk Bank Base Rate 19/11/2011 20:20 0.5%
MotleyFool.co.uk/Lloyds Lloyds Share Price 19/11/2011 20:20 25.5

Pull together some queries like that and you have something very useful.

I think the appropriate approach to aggregating data from multiple sources like that is to create a YQL Open Data Table and add it to their repository; then Yahoo! will do all the heavy lifting of grabbing the data and aggregating it on their backend, and send you just the bits you need.

Your share price query doesn't specify bid or ask: you presumably want something like select Bid, Ask from yahoo.finance.quotes where symbol='lloy.l'

Last edited by NickFitz; 19 November 2011, 23:29.
Leave a comment:
MarillionFan replied

19 November 2011, 20:23
Originally posted by NickFitz View Post

Ah, so the following YQL query will do the needful:

Code:

select * from html where url="http://www.bankofengland.co.uk/" and xpath="//table[@id='keyfacts']/tr[1]/td[2]/p/text()"

returning the following XML:

Code:

<?xml version="1.0" encoding="UTF-8"?> <query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:count="1" yahoo:created="2011-11-19T20:13:13Z" yahoo:lang="en-US"> <results>0.5%</results> </query>

from which the result can easily be extracted using the MSXML DOM, or whatever similar facilities Excel might offer for importing XML over HTTP.

Ah! You've done it. I was just looking at that as well but I am afraid X-Factor beckon.

Now I was trying to see how you could format that or return it in a tabular SQL style format with a view to UNION a few different queries together. I'm not sure YSQL is quite the same as SQL output without reading up on it all?

So the results would be

Source Description Date Value
BankofEngland.co.uk Bank Base Rate 19/11/2011 20:20 0.5%
MotleyFool.co.uk/Lloyds Lloyds Share Price 19/11/2011 20:20 25.5

Pull together some queries like that and you have something very useful.
Last edited by MarillionFan; 19 November 2011, 20:27.
Leave a comment:

NickFitz replied

19 November 2011, 20:16

Originally posted by k2p2 View Post

Not wishing to be picky but...

BOE are using tables for their interest rate.

Ah, so the following YQL query will do the needful:

Code:

select * from html 
    where url="http://www.bankofengland.co.uk/" 
    and xpath="//table[@id='keyfacts']/tr[1]/td[2]/p/text()"

returning the following XML:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<query xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
    yahoo:count="1" yahoo:created="2011-11-19T20:13:13Z" yahoo:lang="en-US">
    <results>0.5%</results>
</query>

from which the result can easily be extracted using the MSXML DOM, or whatever similar facilities Excel might offer for importing XML over HTTP.

Edit: that YQL URL in full:

Code:

http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fwww.bankofengland.co.uk%2F%22%20and%20xpath%3D%22%2F%2Ftable%5B%40id%3D'keyfacts'%5D%2Ftr%5B1%5D%2Ftd%5B2%5D%2Fp%2Ftext()%22

Last edited by NickFitz; 19 November 2011, 20:19.

Reply to: Help wanted - extracting data from the web using Excel 2007

You are not logged in or you do not have permission to access this page. This could be due to one of several reasons:

Previously on "Help wanted - extracting data from the web using Excel 2007"

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Partners

Advertisers

Contractor Services

CUK News

Reply to: Help wanted - extracting data from the web using Excel 2007

You are not logged in or you do not have permission to access this page. This could be due to one of several reasons:

Previously on "Help wanted - extracting data from the web using Excel 2007"

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Partners

Advertisers

Contractor Services

CUK News

Tag Cloud