Reply to: Tool to inspect a website structure?
Previously on "Tool to inspect a website structure?"
-
Not completely related, but has anyone read the investigation into how the creators of the Facebook worm Koobface were caught? Much of that came from info left on servers.
Very interesting... if you like that type of thing...
The Koobface malware gang – exposed! | Naked Security
-
Originally posted by NickFitz:
However don't thereby start to believe that putting a page/file on a server and not linking to it is a good way of keeping it secure from prying eyes. There are a number of ways things can end up being accidentally linked to. For example, it's not unknown for server logs to accidentally be made available at an unsecured URL...
So, if you do a search by filetype on Google, you can easily find (for example) sites which have *.php~ files, which won't get executed as PHP and will expose their contents to anyone who looks.
If you do a search for "wp-config.php~" I reckon you could quite easily find the database connection details and password for quite a few WordPress blogs out there.
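If it's your own server, a quick check for leftover editor backups is easy enough to script. This is only a rough sketch: the suffix list and the function name `find_backup_files` are assumptions made for illustration, and it only inspects a local copy of the web root rather than anything over the network.

```python
from pathlib import Path

# Suffixes commonly left behind by editors and tools.
# This list is an assumption for illustration, not an exhaustive one.
BACKUP_SUFFIXES = ("~", ".bak", ".swp", ".orig")


def find_backup_files(web_root):
    """Return sorted relative paths under web_root that end in a backup suffix."""
    root = Path(web_root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.name.endswith(BACKUP_SUFFIXES)
    )
```

Anything it reports (a wp-config.php~, say) is a file the web server would probably hand out as plain text, so it is worth deleting or moving out of the web root.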
-
Originally posted by d000hg:
That is the question being asked. When a new site goes up Google finds it and crawls the home-page... how does it find the home-page in the first place?
I always thought if I put up a page mysite.com/some_random_page.html, Google would find it and index it even if my homepage doesn't link to it. Not the case?
Other than that, as PAH and NLUK have said, it's just a question of following links.
However don't thereby start to believe that putting a page/file on a server and not linking to it is a good way of keeping it secure from prying eyes. There are a number of ways things can end up being accidentally linked to. For example, it's not unknown for server logs to accidentally be made available at an unsecured URL...
-
Originally posted by d000hg:
Exactly
That is the question being asked. When a new site goes up Google finds it and crawls the home-page... how does it find the home-page in the first place?
I always thought if I put up a page mysite.com/some_random_page.html, Google would find it and index it even if my homepage doesn't link to it. Not the case?
1) Submit the page manually to Google, which you can do here: Overview - Submit your content
Make it a page with either a sitemap XML or a lot of links through your site (like a sitemap page). Google then crawls all the links. Submit a single page with no links in or out and it will take that page alone once, bugger off and never return.
2) Have links from other pages that Google rates (for faster and more frequent crawling) and its spider will come and visit you at some point. Paid links or relevant content links. The more relevant the link, the better Google will deem it and the more likely it is to rank higher.
3) Submit to user-generated directories like DMOZ, but because it is user authenticated it can take forever.
Google AFAIK does not index new pages that appear out of the blue. A page has to be connected for the spiders to find it. No linkey no likey.
Last edited by northernladuk; 18 January 2012, 13:29.
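For the sitemap XML mentioned under 1), here is a minimal sketch of generating one with Python's standard library. The function name `build_sitemap` is made up for illustration and the URLs are just the mysite.com examples from this thread. It writes only the required `loc` element of the sitemaps.org format and leaves out optional fields such as `lastmod`.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        "http://mysite.com/",
        "http://mysite.com/some_random_page.html",
    ]))
```

Listing a page here is one way to make an otherwise unlinked page discoverable, which is also a reason not to list anything you would rather keep quiet.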
-
Originally posted by d000hg:
I always thought if I put up a page mysite.com/some_random_page.html, Google would find it and index it even if my homepage doesn't link to it. Not the case?
Nope. Google uses links to find pages. A new site needs to be linked to from another site for Google to find it, or you can manually submit a site or page to Google for adding to their index. There's a special page on Google somewhere to do that.
The only way a page that's not linked to may be found is if it uses dynamic URLs where there's something on the querystring to identify the page content to return, such as 'page=1'. Then it may be possible some search engines would use an incrementer to find all possible entries, but I wouldn't rely on it.
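To make the incrementer idea concrete, here is a small sketch that only builds the candidate URLs and does not request them. The function name `candidate_urls` and the `index.php` path are hypothetical, and nothing here is meant to describe how any particular search engine actually behaves.

```python
from urllib.parse import urlencode


def candidate_urls(base_url, param, start, stop):
    """Build URLs with param set to each integer from start to stop inclusive."""
    return [f"{base_url}?{urlencode({param: n})}" for n in range(start, stop + 1)]


if __name__ == "__main__":
    # 'page' matches the querystring example above; the base URL is made up.
    for url in candidate_urls("http://mysite.com/index.php", "page", 1, 3):
        print(url)
```

The takeaway is that sequential IDs on the querystring are guessable, so an unlinked entry behind one should not be treated as hidden.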
-
Originally posted by PAH:
Have a search for tools that locate orphaned web pages/files as a reasonable starting point, assuming you want to identify pages that are still accessible but not via normal link navigation, so using a website spidering tool won't work.
Originally posted by TheFaQQer:
If the pages aren't public, then what is going to know that they are there?
I always thought if I put up a page mysite.com/some_random_page.html, Google would find it and index it even if my homepage doesn't link to it. Not the case?
-
If the pages aren't public, then what is going to know that they are there?
Search engines aren't going to find them, since there's nothing linking to them for a crawler to follow.
If you use something that will download the entire site, then it will follow links to find the pages, so that's not going to be any use.
If you own the site, then there are tools you can use to find the orphaned pages, but for a site which you have nothing to do with, where a directory is secured in any way, you aren't going to get anything from there.
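For the case where you own the site, the orphan check boils down to "everything on the server, minus everything reachable by following links from the homepage". The sketch below assumes you already have the list of pages and a map of which page links to which. Both are plain in-memory data here, and the function names and the example pages other than some_random_page.html are invented for illustration.

```python
from collections import deque


def reachable_pages(links, start):
    """Breadth-first walk of the link map, returning every page reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def orphaned_pages(all_pages, links, start):
    """Pages that exist on the server but cannot be reached by following links."""
    return sorted(set(all_pages) - reachable_pages(links, start))


if __name__ == "__main__":
    links = {
        "/index.html": ["/about.html"],
        "/about.html": ["/index.html", "/contact.html"],
    }
    pages = ["/index.html", "/about.html", "/contact.html", "/some_random_page.html"]
    print(orphaned_pages(pages, links, "/index.html"))
```

This is also why a spidering tool alone can't answer the original question: it only ever produces the reachable set, and the orphans are by definition whatever is left over.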
-
If it's not your site and they've got security blocking folder/directory browsing then it doesn't appear to be a simple task.
You could compare older versions of the site via the Wayback Machine.
Have a search for tools that locate orphaned web pages/files as a reasonable starting point, assuming you want to identify pages that are still accessible but not via normal link navigation, so using a website spidering tool won't work.
-
Originally posted by d000hg:
Are there any tools which will generate a nice report of pages on a specific site/domain... i.e. finding pages which are publicly accessible but not linked from the main site?
Besides clues Like this, unless directory browsing is enabled with no default page, not sure how you can find pages not linked from the site.
-
Any particular reason you felt justified in using LMGTFY when the search phrase contains a technical term?
Those tools are also NOT what I asked for; they seem to work by crawling recursively from the homepage... meaning they'd miss pages that aren't reachable by following links?
Last edited by d000hg; 18 January 2012, 08:51.
-
Tool to inspect a website structure?
Are there any tools which will generate a nice report of pages on a specific site/domain... i.e. finding pages which are publicly accessible but not linked from the main site?