Much thought about URLs tonight...
Assuming the http://testpleasedelete.com at the start, one might create such wonders as:
/consecutive/posts/by/AtW/in/2007/12
and
/AtW/consecutive/posts/in/2007/12
and
/posts/by/AtW/consecutively/during/2007/12
all of which would return the same data, but give a "natural language" feel to the URLs, and be trivial to implement.
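As a rough illustration of how trivial this could be, here is a minimal sketch in Python that maps the three URL shapes above onto the same (user, year, month) lookup. The pattern list and the `parse_path` name are made up for this example, not taken from any real framework.

```python
import re

# Hypothetical patterns, one per URL shape from the post.
# Each captures the same three fields: user, year, month.
PATTERNS = [
    re.compile(r"^/consecutive/posts/by/(?P<user>[^/]+)/in/(?P<year>\d{4})/(?P<month>\d{1,2})$"),
    re.compile(r"^/(?P<user>[^/]+)/consecutive/posts/in/(?P<year>\d{4})/(?P<month>\d{1,2})$"),
    re.compile(r"^/posts/by/(?P<user>[^/]+)/consecutively/during/(?P<year>\d{4})/(?P<month>\d{1,2})$"),
]


def parse_path(path):
    """Return (user, year, month) for any recognised URL shape, else None."""
    for pattern in PATTERNS:
        match = pattern.match(path)
        if match:
            return (match.group("user"), int(match.group("year")), int(match.group("month")))
    return None
```

Whatever handler sits behind this only ever sees the parsed tuple, so adding a fourth "natural language" variant would mean adding one more pattern and nothing else.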
However, the search engines can get a bit grumpy over the same content being available at different URLs, although it doesn't contravene any standards.
So they created various techniques whereby one can let their spiders know that "x/y/z" is the canonical URL. A URL such as "stuff/x/having/y/being/z" may point to the same resource, but it should be treated as if it were "x/y/z", and the search results page (SRP) should point to "x/y/z" and not the other. Remember, all the spiders can see is whatever is returned when they do an HTTP GET - they can't divine your intentions.
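Two commonly used signals of this kind are a permanent (301) redirect to the preferred URL and a `<link rel="canonical">` element in the page head. Whether these are the exact techniques meant here is an assumption. The sketch below also assumes, purely for illustration, that the username-first form is the canonical one; the function names are hypothetical.

```python
from html import escape
from urllib.parse import quote

BASE = "http://testpleasedelete.com"  # host used in the post's examples


def canonical_url(user, year, month):
    """Build the assumed canonical (username-first) URL."""
    return f"{BASE}/{quote(user)}/consecutive/posts/in/{year:04d}/{month:02d}"


def canonical_link_tag(user, year, month):
    """HTML element to place in <head> on every variant of the page."""
    href = escape(canonical_url(user, year, month), quote=True)
    return f'<link rel="canonical" href="{href}">'


def redirect_for(request_path, user, year, month):
    """Return (301, location) for a non-canonical path, or None if already canonical."""
    target = canonical_url(user, year, month)
    if BASE + request_path == target:
        return None
    return (301, target)
```

The link element keeps every variant reachable and leaves the choice to the crawler, while the redirect sends browsers and crawlers alike to the one URL, so which to use depends on whether the alternative URLs should stay visible in the address bar.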
This gives us two wins (to my knowledge - there may be other wins I'm unaware of, or don't remember at this time of night). First, our sites aren't penalised in search engine rankings for showing the same content at multiple URLs, which is a classic search engine spamming tactic. Second, we can extend our interfaces (interfaces being the URLs we provide content for) without breaking existing highly-ranked links, so valid content doesn't lose its search engine ranking merely because we choose to make it available by an additional route.
So the question then becomes: which URL schema do we define as the primary URL?
I reckon the username-based one (e.g. /AtW/consecutive/posts/in/2007/12/31/23:00) should be canonical, but let me know what you think.
(Those smilies are playing musical chairs again... took me ages to find just then)