[nycphp-talk] friendly urls (furls) and the gaps

Rick rick at napalmriot.com
Thu Mar 22 16:10:13 EDT 2007


It's the webserver, not the search engine, that is configured to look 
for default index files such as index.html.  Search engines only attempt 
to access valid resources, such as the "fake" resource you mentioned 
(which is quite valid and not fake at all).
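
To make that concrete, here is a rough sketch (in Python, purely for 
illustration) of how an application behind a furl might turn the path 
into parameters. The function name parse_furl, the "/furl" prefix 
handling, and the color/size parameter names are all hypothetical; a 
real setup would depend on your rewrite rules and framework.

```python
from urllib.parse import urlsplit


def parse_furl(url, prefix="/furl"):
    """Split a friendly URL path into parameter/value pairs.

    Returns None when the path is not under the prefix or when the
    segments after the prefix do not pair up evenly.
    """
    path = urlsplit(url).path
    if path != prefix and not path.startswith(prefix + "/"):
        return None
    # Drop empty segments so a trailing slash does not add a phantom value.
    segments = [s for s in path[len(prefix):].split("/") if s]
    if len(segments) % 2 != 0:
        return None
    return dict(zip(segments[0::2], segments[1::2]))


if __name__ == "__main__":
    print(parse_furl("http://www.example.com/furl/color/red/size/large"))
    print(parse_furl("http://www.example.com/furl/color/red/index.html"))
```

In this sketch the application alone decides what each path means: an 
extra "index.html" segment is just an unpaired segment and is rejected, 
because nothing on disk is being browsed.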

--
Rick
http://www.sensual.jp

inforequest wrote:
> Kenneth Downs ken-at-secdat.com |nyphp dev/internal group use| wrote:
>
>> Let's say you use a friendly url (furl) system so that a url looks 
>> like this:
>>
>> www.example.com/furl/parm/value/parm/value
>>
>> Because we are faking a nesting of folders and files here, will a 
>> search bot expect to be able to find:
>>
>> www.example.com/furl/parm/index.html?
>>
>> and
>>
>> www.example.com/furl/parm/value/index.html?
>>
> I have not seen any mention of this by search engines (or their human 
> representatives).
>
> I don't think you are "faking" anything, though. It's a valid web 
> resource, right? Who said it had to represent files and folders?
>
> Lately it seems that they do some poking around when you do this:
>
> www.example.com/furl/parm/value/parm/value
>
> to try and determine the best way to grab that resource (slash or no 
> slash), but the name is just "value". Google has said it uses your 
> own internal linking style as a clue for your site, and also how others 
> link to you.
>
> A quick check of Google shows this page ranking well:
> http://www.phpwact.org/pattern/model_view_controller
> with  this snippet:
>
>
>    Model View *Controller* [Web Application Component Toolkit]
>    <http://www.phpwact.org/pattern/model_view_controller>
>
> Application *Controller* Controls the flow of logic of a single 
> application. Because the popular MVC framework Java Struts from a 
> *PHP* Perspective implements a *...*
> www.*php*wact.org/pattern/model_view_*controller* - 40k - Cached 
> <http://64.233.167.104/search?q=cache:AU1WIk8nh3MJ:www.phpwact.org/pattern/model_view_controller+php+controller&hl=en&ct=clnk&cd=9&gl=us> 
> - Similar pages 
> </search?hl=en&q=related:www.phpwact.org/pattern/model_view_controller>
>
>
> A request for the trailing-slash version returns a 200 OK but a mostly 
> empty template page, BUT it is in the Google index with this snippet:
>
>
>    Model View Controller [Web Application Component Toolkit]
>    <http://www.phpwact.org/pattern/model_view_controller>
>
> You are here: Web Application Component Toolkit » pattern » Model View 
> Controller. Table of Contents. Model View Controller. Model. Passive 
> Model *...*
> www.phpwact.org/pattern/model_view_controller - 40k - Cached 
> <http://64.233.167.104/search?q=cache:AU1WIk8nh3MJ:www.phpwact.org/pattern/model_view_controller+http://www.phpwact.org/pattern/model_view_controller/&hl=en&ct=clnk&cd=1&gl=us> 
> - Similar pages 
> </search?hl=en&q=related:www.phpwact.org/pattern/model_view_controller>
>
>
> Notice that Google lists that trailing slash page with a URL that has 
> no trailing slash. That looks like a double listing of the 
> no-trailing-slash URL, with two snippets.
>
> The contrived resource (with index.php appended) does the same (200 OK 
> but empty template page): 
> http://www.phpwact.org/pattern/model_view_controller/index.php
>
> A search of Google for that contrived page 
> http://www.phpwact.org/pattern/model_view_controller/index.php shows 
> "no such page".
>
> So? Google lists the trailing-slash and no-trailing-slash versions 
> under the same no-trailing-slash URL, but represents the two as 
> different content in the index (which they are). That suggests that 
> Google is confused about that resource. A scan of the rest of that 
> result set shows some URLs have trailing slashes, some do not: 
> http://www.google.com/search?q=php+controller&hl=en&start=10&sa=N
>
> What happens if I put a page on my site and 302 redirect to those 
> pages? Will Google take their content and index it as belonging to my 
> site? What if I put content at that empty template page that is 
> different from the content at the no-trailing-slash URL? Will Google 
> still list both pages as going to the no-trailing-slash URL? If one 
> page is on red widgets and the other on green baboons, will it affect 
> the relevance ranking of either page?  That's for SEO homework.
>
> -=john
>
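
One common way to avoid the slash / no-slash double listing described 
above is to pick one form and answer the other with a 301 redirect. A 
minimal sketch, assuming the no-trailing-slash form is the canonical one; 
the function name canonicalize and the (status, location) return shape 
are made up for illustration and are not tied to any particular 
framework or server:

```python
def canonicalize(path):
    """Return (status, location) for a requested URL path.

    Paths ending in a slash (other than the site root) get a 301 to the
    same path without the trailing slash; everything else is served as-is.
    """
    if path != "/" and path.endswith("/"):
        # rstrip removes every trailing slash; fall back to the root if
        # the path consisted of slashes only.
        return 301, path.rstrip("/") or "/"
    return 200, path


if __name__ == "__main__":
    print(canonicalize("/pattern/model_view_controller/"))
    print(canonicalize("/pattern/model_view_controller"))
```

With something like this in place, only one URL per resource ever 
returns content, so there is a single version for a crawler to index.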



