[nycphp-talk] PHP And Search Engines

inforequest 1j0lkq002 at sneakemail.com
Wed Sep 29 22:58:36 EDT 2004


Daniel Convissor danielc-at-analysisandsolutions.com |nyphp dev/internal 
group use| wrote:

>On Wed, Sep 29, 2004 at 04:22:12PM -0400, Joseph Crawford wrote:
>
>>if there is an easy way without use
>>apache's mod_rewrite to make a search engine index pages that have
>>querystrings such as
>>product.php?id=57
>
>It seems you're in search of an answer to a problem that doesn't 
>exist.  I know Google searches pages with query strings.  I'd guess 
>most of the others do as well.
>
>--Dan
>

There are some very specific and good reasons to eliminate URLs with 
query strings. Getting spidered and indexed just isn't one of them :-)

That is why I suggested you have clear objectives. You don't simply want 
to get rid of query string URLs in order to get indexed. You may however 
want to get rid of complex, low-readability URLs in order to better 
communicate your content message to the masses (visitors as well as 
spiders).

As Chris Shiflett highlighted already, 
www.yoursite.com?id=456&prod_id=564 is not nearly as "useful" to 
anyone as something like www.yoursite.com/cars/hover_cars/index.html.
The search engines will reward you for improved usability like that (not 
just index you, but rank you higher for keywords associated with cars, 
hover cars, and your site's known themes). In addition you are more 
likely to get bookmarked, and your bookmarks will make better sense in 
your traffic logs.
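
To make that concrete, here is a minimal sketch of serving readable URLs 
without mod_rewrite by keeping a lookup table from the readable path to 
the ids the query string used to carry. It is written in Python for 
brevity; the same lookup is easy to port to PHP. The ROUTES table, the 
function names, and the pairing of ids 456/564 with the hover_cars path 
are assumptions made up for illustration, not part of any real site.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table: readable path -> the ids the old query string carried.
ROUTES = {
    "/cars/hover_cars/index.html": {"id": 456, "prod_id": 564},
}


def resolve(url):
    """Return the ids for a readable URL, or None if the path is unknown."""
    path = urlsplit(url).path
    return ROUTES.get(path)


def readable_url(params, host="www.yoursite.com"):
    """Reverse lookup: build the readable URL for a given set of ids."""
    for path, known in ROUTES.items():
        if known == params:
            return "http://" + host + path
    return None
```

With something like this in place, links can be generated with 
readable_url() and incoming requests turned back into ids with resolve(), 
so the query string never has to appear in a public URL.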

Additionally, Search Engines may or may not consider 
www.yoursite.com?id=456&prod_id=564  and  
www.yoursite.com?id=322&prod_id=567 as different pages. This is an 
emerging research area, but it seems that Google is considering them to 
be 2 variants of the same page. At first this may not be important as 
long as they both appear in the index. However, since the index is a 
finite resource and the SEs love to keep it *all* in the shared memory 
of their server clusters at all times (for speed), it is inevitable that 
SEs will have to prioritize on inclusion of pages. It is reasonable to 
think variants of the same page will be demoted early in that process. 
It is also reasonable to think this is done already, at several levels. 
There are PageRank issues associated with this as well, and they are 
even less well understood. Again, much of this stuff is on the edge. 
Issues like semantic URLs being better than cryptic query strings are 
much more concrete.

There are empirical data (based solely on observations) that suggest a 
site structure with one level is preferred over one with deep 
directories. That means a site with www.site.com/small_cars.html  and 
www.site.com/large_cars.html is preferred over one with 
www.site.com/cars/small/index.html and www.site.com/cars/large/index.html.

Similarly, there is a large body of empirical knowledge on search engine 
preferences, for each engine and certain conditions. They are all 
important in optimization and when competing, but many are also very 
important just for search engine friendliness.

For example it is pretty clear that www.site.com/index.html is "better" 
than www.site.com/index or www.site.com/index/. The SE bots are seeking 
out *pages* (called documents) and love plain and simple HTML. In 
general the more you look like simple, clean (and valid!) HTML the better 
they will treat you. It is especially important when you are trying to get 
the spider to flag your site for deep crawling. It appears "difficult" 
sites are deferred for deep crawling compared to "easy" sites, all other 
things being equal. Some people see a deep crawl and others see frequent 
visits for small batches of pages.... there is an algorithm at work there.

By the way Flash is indeed spidered and indexed. You have to make it 
easy for the bots semantically. Take a look at a Google search for .swf  
and you'll see many sites are listed and indexed with good snippets, 
while others are only listed, and some are indexed with "poor" snippets. 
There are even some advantages to be gained from using Flash.... I could 
tell you some tricks, but then I'd have to.... ;-)

The proposed NYPHP talk in January is a lot of this kind of stuff, 
specifically tailored to the PHP programmer and overall technical 
webmaster of a site. It is my perspective that with a certain level of 
awareness of the issues, the PHP programmer/site designer can take 
certain steps early on which make it much easier for search engine 
specific improvements to be made later. In my work I have to break the 
bad news to site programmers and designers after the site has been built 
-- that it needs an overhaul or needs to be rebuilt. I would prefer to 
be more popular than I am ;-)  A lot of that pain would go away if 
certain things were considered at design and construction time.

-=john andrews
















