NYCPHP Meetup

NYPHP.org

[nycphp-talk] MySQL doubt

Mark Horton mark at nostromo.net
Sat Jan 15 13:49:30 EST 2005


Ophir Prusak wrote:
> As long as we're on the subject, could someone shed some light on this behavior?
> It caused me a good amount of hair pulling until I realized what mysql
> was doing.
> 
> Why doesn't MySQL use the primary key in the second query?
> I would think that count doesn't care what column I use, but obviously it does.

I think what many have said on this is accurate.  I wanted to further 
say that the short answer to your question is that if the optimizer 
detects it will scan more than 30% of the index tree then it may decide 
not to use the index at all and do a full table scan.  (According to the 
recent docs there are other variables taken into account, but I believe 
the 30% rule is a significant factor.)

You table has 43894 rows and your where clause indicates it will scan 
about 75% of the index tree, thus your second query does a full table scan.

However, the results of your first query can be wholly computed without 
touching the data at all, so it decides not to do a table scan.  You are 
fetching rev_id and using rev_id in the where clause, so it gets 
everything it needs from the index tree.  (It doesn't always work like 
this, but I believe it does for integer based indexes.)

Why would it decide to do a full table scan?  My guess is that it 
doesn't take into account that rev_id is an auto_increment column.  So 
it doesn't know that it can simply start at 10001 and work its way up 
using the index.  For all it knows your index could be a normal index 
with duplicates.  I believe this is related to cardinality.  IOW it 
decides it would be faster to only reference the data (1 disk seek) 
instead of the index tree then also fetch the data (possibly 2 disk 
seeks).

You can hand optimize the second query with 'force index' if you want. 
I've also read that you can use 'max_seeks_for_key' to manipulate some 
of the behavior, although I've never used it myself.

Mark



More information about the talk mailing list