NYCPHP Meetup

NYPHP.org

[nycphp-talk] SQL question

Rob Marscher rmarscher at beaffinitive.com
Mon Jul 23 23:08:11 EDT 2007


On Jul 23, 2007, at 3:45 PM, David Krings wrote:
> So, I guess now that I got this settled I better learn what a left  
> join is in case someone asks me to explain my code. :)

LEFT JOIN actually is short for LEFT OUTER JOIN.  OUTER JOIN means  
that it doesn't require finding a match... if it doesn't, it returns  
null.  LEFT OUTER JOIN means that the table on the left side of the  
join tries to join to the table on the right.

So... for an example, let's say a "user" table has an id and a  
"session" table also holds a reference to the user id.

SELECT u.id, s.id FROM user u LEFT JOIN session s ON u.id = s.user_id  
WHERE u.id = 5

If user id = 5 exists in the user table but there aren't any records  
with user_id = 5 in the session table, the result would be:
u.id		s.id
5		null

If you did an INNER JOIN:
SELECT u.id, s.id FROM user u INNER JOIN session s ON u.id =  
s.user_id WHERE u.id = 5

This would return an empty set because the INNER JOIN requires that  
it finds rows in both tables.  So in your case, I actually think you  
could have done INNER JOINs - but since it's not broken, no need to  
fix, right :)

When you right a query like this:
SELECT u.id, s.id FROM user u, session s WHERE u.id = 5 AND u.id =  
s.user_id

You are doing a cartesian join - the commas imply the join.  Usually,  
this works pretty much the same as an inner join but the difference  
between a cartesian join and an inner join it that the database tries  
to figure out all of the ways that the tables can be joined -- rather  
than just using what you specified in your ON clause.  If you go back  
to your original query, you're joining everything off of the  
cupssbmain tables (based on the links to the CourseID in your where  
clause).  The database tries to figure out all the different ways to  
connect the tables based off this and this is how it wound up with  
the 1248 rows.  Also, the reason the SELECT DISTINCT worked when you  
were just getting the sbid was that it was going through all of those  
rows and eliminating the ones that returned the same sbid.  This is  
why people say that DISTINCT is bad -- because it's a lot of extra  
work for the database to get this huge number of rows and then go and  
figure out which results aren't duplicates.  If you take out the  
DISTINCT part of the query, you can see how many rows are actually  
getting returned and see what the database has to search through.   
When you added the position from the other tables, it then returned  
1248 because that was all the different combinations of sbid, module,  
lesson, and page positions.

Going back to that original query, you can write it like this:

SELECT cupssbmain.StoryboardID AS sbid
FROM cupsmodules,  cupslessons, cupspages, cupssbmain
WHERE cupsmodules.CourseID = 23
   AND cupslessons.ModuleID = cupsmodules.ModuleID
   AND cupspages.LessonID = cupslessons.LessonID
   AND cupssbmain.PageID = cupspages.PageID
ORDER BY cupsmodules.Module_Position ASC,
   cupslessons.Lesson_Position ASC,
   cupspages.Page_Position ASC

I think that should give you the right result.  So... it is possible  
without using INNER JOIN or LEFT JOIN.  We had a discussion last year  
on NYPHP-MySQL about INNER JOIN versus commas (cartesian join) -- you  
so often see it with commas in tutorials and other people's code.   
But the concensus we came to was that it's best to be as specific as  
possible... so I've always used INNER JOINs instead of commas ever  
since.

Later,
Rob





More information about the talk mailing list