[nycphp-talk] MySQL doubt
mark at nostromo.net
Sat Jan 15 13:49:30 EST 2005
Ophir Prusak wrote:
> As long as we're on the subject, could someone shed some light on this behavior?
> It caused me a good amount of hair pulling until I realized what mysql
> was doing.
> Why doesn't MySQL use the primary key in the second query?
> I would think that count doesn't care what column I use, but obviously it does.
I think what many have said on this is accurate. I wanted to further
say that the short answer to your question is that if the optimizer
detects it will scan more than 30% of the index tree then it may decide
not to use the index at all and do a full table scan. (According to the
recent docs there are other variables taken into account, but I believe
the 30% rule is a significant factor.)
You table has 43894 rows and your where clause indicates it will scan
about 75% of the index tree, thus your second query does a full table scan.
However, the results of your first query can be wholly computed without
touching the data at all, so it decides not to do a table scan. You are
fetching rev_id and using rev_id in the where clause, so it gets
everything it needs from the index tree. (It doesn't always work like
this, but I believe it does for integer based indexes.)
Why would it decide to do a full table scan? My guess is that it
doesn't take into account that rev_id is an auto_increment column. So
it doesn't know that it can simply start at 10001 and work its way up
using the index. For all it knows your index could be a normal index
with duplicates. I believe this is related to cardinality. IOW it
decides it would be faster to only reference the data (1 disk seek)
instead of the index tree then also fetch the data (possibly 2 disk
You can hand optimize the second query with 'force index' if you want.
I've also read that you can use 'max_seeks_for_key' to manipulate some
of the behavior, although I've never used it myself.
More information about the talk