NYCPHP Meetup

NYPHP.org

[nycphp-talk] MySQL doubt

Ophir Prusak prusak at gmail.com
Sat Jan 15 17:01:36 EST 2005


thanx for the explanations everyone!

fyi, i did the following two queries which use the same columns but
return a different number of results. Indeed the %30 rule seems to be
the case.

mysql> explain select rev_regdate from review where rev_id > 30000;
+--------+-------+---------------+---------+---------+------+-------+------------+
| table  | type  | possible_keys | key     | key_len | ref  | rows  |
Extra      |
+--------+-------+---------------+---------+---------+------+-------+------------+
| review | range | PRIMARY       | PRIMARY |       4 | NULL | 11032 |
where used |
+--------+-------+---------------+---------+---------+------+-------+------------+
1 row in set (0.04 sec)

mysql> explain select rev_regdate from review where rev_id > 10000; 
+--------+------+---------------+------+---------+------+-------+------------+
| table  | type | possible_keys | key  | key_len | ref  | rows  | Extra      |
+--------+------+---------------+------+---------+------+-------+------------+
| review | ALL  | PRIMARY       | NULL |    NULL | NULL | 43896 | where used |
+--------+------+---------------+------+---------+------+-------+------------+
1 row in set (0.01 sec)


On Sat, 15 Jan 2005 13:49:30 -0500, Mark Horton <mark at nostromo.net> wrote:
> Ophir Prusak wrote:
> > As long as we're on the subject, could someone shed some light on this behavior?
> > It caused me a good amount of hair pulling until I realized what mysql
> > was doing.
> >
> > Why doesn't MySQL use the primary key in the second query?
> > I would think that count doesn't care what column I use, but obviously it does.
> 
> I think what many have said on this is accurate.  I wanted to further
> say that the short answer to your question is that if the optimizer
> detects it will scan more than 30% of the index tree then it may decide
> not to use the index at all and do a full table scan.  (According to the
> recent docs there are other variables taken into account, but I believe
> the 30% rule is a significant factor.)
> 
> You table has 43894 rows and your where clause indicates it will scan
> about 75% of the index tree, thus your second query does a full table scan.
> 
> However, the results of your first query can be wholly computed without
> touching the data at all, so it decides not to do a table scan.  You are
> fetching rev_id and using rev_id in the where clause, so it gets
> everything it needs from the index tree.  (It doesn't always work like
> this, but I believe it does for integer based indexes.)
> 
> Why would it decide to do a full table scan?  My guess is that it
> doesn't take into account that rev_id is an auto_increment column.  So
> it doesn't know that it can simply start at 10001 and work its way up
> using the index.  For all it knows your index could be a normal index
> with duplicates.  I believe this is related to cardinality.  IOW it
> decides it would be faster to only reference the data (1 disk seek)
> instead of the index tree then also fetch the data (possibly 2 disk
> seeks).
> 
> You can hand optimize the second query with 'force index' if you want.
> I've also read that you can use 'max_seeks_for_key' to manipulate some
> of the behavior, although I've never used it myself.
> 
> Mark
> _______________________________________________
> New York PHP Talk
> Supporting AMP Technology (Apache/MySQL/PHP)
> http://lists.nyphp.org/mailman/listinfo/talk
> http://www.nyphp.org
>



More information about the talk mailing list