[nycphp-talk] friendly urls (furls) and the gaps

inforequest 1j0lkq002 at sneakemail.com
Fri Mar 23 03:36:39 EDT 2007


Rick rick-at-napalmriot.com |nyphp dev/internal group use| wrote:

> inforequest wrote:
>
>>
>>
>> Rick rick-at-napalmriot.com |nyphp dev/internal group use| wrote:
>>
>>> It's the webserver that is configured to look for default-index 
>>> files, such as index.html, and not search engines.  Search engines 
>>> only attempt to access valid resources, such as the "fake" resource 
>>> you mentioned (which is quite valid and not fake at all).
>>>
>>> -- 
>>> Rick
>>> http://www.sensual.jp
>>
>>
>>
>> (Top-posting requires top-posting... sorry Michael.)
>>
>> Yes, technically correct -- it is the webserver. BUT, to the 
>> traditional search engine, the URL defines the resource. Every unique 
>> URL is potentially a unique resource, and ideally they are all tested 
>> and included in the index if unique.
>>
>> As webmaster, in the eyes of the indexing search spider, you have 
>> defined your "site" by the URL structure you used to define the 
>> resources, and not by the content (regardless of how that content is 
>> served... by the web server or your PHP scripts). So it becomes 
>> important to control the URL even more carefully than the content in 
>> many cases.
>>
>> This is now changing, as we move away from URL as defining name/label 
>> (ajax, etc). If semantic web were more advanced, it might work, but 
>> for now, it's a good thing we only have one search engine because its 
>> behavior is slowly becoming less standardized and more customized 
>> over time (that was sarcasm.... a little).
>>
>> -=john andrews
>
> John:
>
> You are correct in saying that the URL defines the resource, and the 
> "permanence" (I use that loosely) is quite important really.  The way 
> I translated the question was more or less along the lines of, "say I 
> have this resource, which looks like a folder, is it going to look for 
> an index.html file?"
>
> In the case of my answer, no.  The search engines are not going to try 
> to guess the default resource to go to in the event of something that 
> appears to be a folder.  They merely go where they're told, and they 
> follow (usually) a number of rules along the way.

Recent comments suggest that Google does some work to pick "canonical" 
URLs, although it's not clear whether they do it on a regular basis or 
only when they have reason to look closely. They have said they use clues 
to help guide them, including how you set up internal links and how others 
link to your resources.
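
Since the point is that every distinct URL looks like a distinct resource to a spider, one way to help is to use a single URL form in your own internal links. Here is a rough sketch of that idea in Python. The function name, the list of default-index filenames, and the www.example.com host are all my assumptions for illustration; this is not a description of what Google or any other engine actually does.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the default-index filenames your web server is configured with.
DEFAULT_INDEX_FILES = ("index.html", "index.htm", "index.php")


def canonical_url(url, host="www.example.com"):
    """Collapse common duplicate spellings of a URL into one form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # /widgets/index.html and /widgets/ usually serve the same content,
    # so strip the default-index filename and keep the trailing slash.
    for name in DEFAULT_INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    # Use one host spelling, keep the query string, drop the fragment.
    scheme = parts.scheme.lower() or "http"
    return urlunsplit((scheme, host, path, parts.query, ""))
```

If every internal link is passed through something like this before it is written out, the spider only ever sees one URL per resource, which is one of the clues mentioned above.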

> Now then, as I understand it (please correct me if I'm wrong, John... 
> as I'm actually curious), if you move a resource to a new location, 
> you should provide the proper headers to do so (I believe 302 for 
> permanently moved, but I don't use it enough to know off the top of my 
> head), then most intelligent search engines are aware of the change.  
> Moving resources should be fairly painless, in that regard.

A 301 is used for a permanent redirect. A 302 is a temporary redirect, 
and it is supposed to be OK for an internal redirect now, although I 
still think Google misinterprets 302s too often. A 302 is a bad choice 
for a cross-domain redirect: it has been so widely abused in the past 
that search engines hesitate to trust it.
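
To make the 301 case concrete, here is a minimal sketch in Python of choosing the status line for a resource that has moved for good. The path map and the function name are hypothetical; the only things taken from the HTTP spec are the "301 Moved Permanently" status and the Location header.

```python
# Hypothetical map of old paths to their new permanent locations.
MOVED_PERMANENTLY = {
    "/old-page.html": "/new-page/",
}


def redirect_for(path):
    """Return (status_line, headers) for a moved path, or None if it has not moved."""
    target = MOVED_PERMANENTLY.get(path)
    if target is None:
        return None
    # 301 says the move is permanent; a 302 here would mark it as temporary.
    return "301 Moved Permanently", [("Location", target)]
```

In a WSGI application the returned pair could be handed straight to start_response. In PHP the rough equivalent is a call to header() with a Location value and 301 as the response code, made before any output is sent.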


