So now we have a two tables, one full of category names and ids and one which relates those ids to left and right values which supposedly match up to positions on a tree. This is generally the part where most people start to cry about difficult math and SQL queries... but in Celko we trust my children so let's go over the queries we'll be using.
Now, before I get started, I would like to take a moment to pitch Celko's book, "Trees and Hierarchies In SQL for Smarties" again. Those who have read the book may point outy that Celko mentions MySQL as a poor choice of database. This was true when he wrote it. However, everything he wrote can now be entered directly into MySQL 5. Most of the queries below are slightly modified versions of queries in that book with the goal of applying them to a category management system. Problem is that there is a wealth of information in there that has nothing to do with the topic of these articles - so go pick it up and read it.
No really... it's the same information
Normalization allows you to eliminate repetitive information and create wickedly complicated table structures that take extremely complicated queries to read in. This was a problem before MySQL 5 - but now we have views, The first thing we need to do is set up a quick view so that we can see tree with the category names for each node.
CREATE VIEW site AS (
SELECT * from tree JOIN category USING (cate_id) ORDER BY tree_left
);
I ordered the results by tree_left to make life easier while debugging. Now you'll be able to see the order your worm will crawl as it moves through the table. Incidentally it might be worth your time to check your table to make sure the left and right values are correct before continuing. Go ahead and select * on your view and make sure your left and right values match up the way they should. If you were using my test data, you should see something like this:
+---------+---------+-----------+------------+---------------+
| cate_id | tree_id | tree_left | tree_right | cate_name |
+---------+---------+-----------+------------+---------------+
| 1 | 2 | 1 | 54 | Home |
| 2 | 1 | 2 | 35 | North America |
| 3 | 3 | 3 | 10 | Mammal |
| 4 | 5 | 4 | 5 | Fox |
| 5 | 6 | 6 | 7 | Deer |
| 6 | 7 | 8 | 9 | Buffalo |
| 7 | 8 | 11 | 22 | Fish |
| 8 | 10 | 12 | 17 | Salt Water |
| 9 | 12 | 13 | 14 | Sea Bass |
| 10 | 13 | 15 | 16 | Shark |
| 11 | 15 | 18 | 21 | Fresh Water |
| 12 | 17 | 19 | 20 | Trout |
| 13 | 18 | 23 | 34 | Insect |
| 14 | 20 | 24 | 33 | Winged |
| 15 | 21 | 25 | 30 | Bee |
| 16 | 22 | 26 | 27 | Honey |
| 17 | 23 | 28 | 29 | Killer |
| 18 | 24 | 31 | 32 | Fly |
| 20 | 26 | 36 | 53 | Africa |
| 3 | 4 | 37 | 40 | Mammal |
| 19 | 25 | 38 | 39 | Big Five |
| 7 | 9 | 41 | 50 | Fish |
| 8 | 11 | 42 | 47 | Salt Water |
| 21 | 27 | 43 | 44 | Clown Fish |
| 10 | 14 | 45 | 46 | Shark |
| 11 | 16 | 48 | 49 | Fresh Water |
| 13 | 19 | 51 | 52 | Insect |
+---------+---------+-----------+------------+---------------+
27 rows in set (0.05 sec)
Right off the bat now we have the ability to run numerous queries which will allow us easy access to the information contained in our tree. Every time we add a child category to a parent category, the right value will increase by 2.. As such the difference between the left and right value of a given node will tell us how many children there are. We can also find all the child nodes in the table by finding all the nodes where the left value is one less than the right.
SELECT tree_id, cate_id, cate_name FROM site WHERE tree_left = (tree_right - 1);
+---------+---------+-------------+
| tree_id | cate_id | cate_name |
+---------+---------+-------------+
| 5 | 4 | Fox |
| 6 | 5 | Deer |
| 7 | 6 | Buffalo |
| 12 | 9 | Sea Bass |
| 13 | 10 | Shark |
| 17 | 12 | Trout |
| 22 | 16 | Honey |
| 23 | 17 | Killer |
| 24 | 18 | Fly |
| 25 | 19 | Big Five |
| 27 | 21 | Clown Fish |
| 14 | 10 | Shark |
| 16 | 11 | Fresh Water |
| 19 | 13 | Insect |
+---------+---------+-------------+
14 rows in set (0.04 sec)
This gives us the ability to find all the nodes underneath a parent very quickly by finding nodes with a left value higher than, and a right value lower than a given node. In our case, this gives us two options - either searching by a single position in the tree (by tree_id):
SELECT c.tree_id as ctree_id, c.cate_id AS ccate_id, c.cate_name AS ccate_name
FROM site AS p, site as c
WHERE c.tree_left > p.tree_left
AND c.tree_right < p.tree_right
AND p.tree_id = '9'
ORDER BY p.tree_left, c.tree_left;
+----------+----------+-------------+
| ctree_id | ccate_id | ccate_name |
+----------+----------+-------------+
| 11 | 8 | Salt Water |
| 27 | 21 | Clown Fish |
| 14 | 10 | Shark |
| 16 | 11 | Fresh Water |
+----------+----------+-------------+
4 rows in set (0.07 sec)
or by all the appearances of a category in the tree (by cate_id). In this case we need to add a group by cate_id because if the parent category appears more than once in the tree, then the child category may as well.
SELECT c.tree_id as ctree_id, c.cate_id AS ccate_id, c.cate_name AS ccate_name
FROM site AS p, site as c
WHERE c.tree_left > p.tree_left
AND c.tree_right < p.tree_right
AND p.cate_id = '7'
GROUP BY c.cate_id
ORDER BY p.tree_left, c.tree_left;
+----------+----------+-------------+
| ctree_id | ccate_id | ccate_name |
+----------+----------+-------------+
| 10 | 8 | Salt Water |
| 12 | 9 | Sea Bass |
| 13 | 10 | Shark |
| 15 | 11 | Fresh Water |
| 17 | 12 | Trout |
| 27 | 21 | Clown Fish |
+----------+----------+-------------+
6 rows in set (0.02 sec)
Thinking inside the box
Each node in the tree is now acting like a box. The wider the box (difference between left and right values) the more little boxes it can contain. The interesting thing about this is that the left values of the table will always fall between the left and right values of the parent nodes. As such, we can also quickly find positional information for each node in the tree.
One problem with using the adjacency list method for a hierarchy is it is very expensive to find out how high on the tree a given node is. Without that knowledge, you often have to set a limit to how deep your pages go. The box property of MPTT makes this trivial. All we need is to search for all nodes where left value of your given node is between the left and right values. The following query presumes home should be the zero level (useful if you're using the result of this in an array.)
SELECT thisId.tree_id, thisId.cate_id, thisId.cate_name, (COUNT(quan.tree_id) - 1) AS tree_level
FROM site AS quan, site AS thisId
WHERE thisId.tree_left BETWEEN quan.tree_left AND quan.tree_right
GROUP BY thisId.tree_id;
+---------+---------+---------------+------------+
| tree_id | cate_id | cate_name | tree_level |
+---------+---------+---------------+------------+
| 1 | 2 | North America | 1 |
| 2 | 1 | Home | 0 |
| 3 | 3 | Mammal | 2 |
| 4 | 3 | Mammal | 2 |
| 5 | 4 | Fox | 3 |
| 6 | 5 | Deer | 3 |
| 7 | 6 | Buffalo | 3 |
| 8 | 7 | Fish | 2 |
| 9 | 7 | Fish | 2 |
| 10 | 8 | Salt Water | 3 |
| 11 | 8 | Salt Water | 3 |
| 12 | 9 | Sea Bass | 4 |
| 13 | 10 | Shark | 4 |
| 14 | 10 | Shark | 4 |
| 15 | 11 | Fresh Water | 3 |
| 16 | 11 | Fresh Water | 3 |
| 17 | 12 | Trout | 4 |
| 18 | 13 | Insect | 2 |
| 19 | 13 | Insect | 2 |
| 20 | 14 | Winged | 3 |
| 21 | 15 | Bee | 4 |
| 22 | 16 | Honey | 5 |
| 23 | 17 | Killer | 5 |
| 24 | 18 | Fly | 4 |
| 25 | 19 | Big Five | 3 |
| 26 | 20 | Africa | 1 |
| 27 | 21 | Clown Fish | 4 |
+---------+---------+---------------+------------+
27 rows in set (0.02 sec)
This query can also give us the height of the tree.
SELECT MAX(tree_level) AS tree_height
FROM (SELECT thisId.tree_id, thisId.cate_id, thisId.cate_name, (COUNT(quan.tree_id) - 1) AS tree_level
FROM site AS quan, site AS thisId
WHERE thisId.tree_left BETWEEN quan.tree_left AND quan.tree_right
GROUP BY thisId.tree_id) AS mcelaney;
+-------------+
| tree_height |
+-------------+
| 6 |
+-------------+
1 row in set (0.01 sec)
Now one of the first things I showed you was that we could find the children of a node... but the box model can be used to find the immediate children. The query here checks to see if the child node is in the sub query - if so then it skips it. To get the results for an individual node, add WHERE tree_id = 'x'. Keep in mind that we could also run this query against the cate_id to get the direct children of that category across the entire tree.
SELECT parent.tree_id, parent.cate_name, parent.tree_left,
child.tree_id, child.cate_name, child.tree_left, child.tree_right
FROM site AS parent, site AS child
WHERE child.tree_left BETWEEN parent.tree_left AND parent.tree_right
AND child.tree_id != parent.tree_id
AND parent.tree_id = '1'
AND NOT EXISTS (
SELECT *
FROM site AS middle
WHERE middle.tree_left BETWEEN parent.tree_left AND parent.tree_right
AND child.tree_left BETWEEN middle.tree_left AND middle.tree_right
AND middle.tree_id NOT IN(child.tree_id, parent.tree_id))
ORDER BY parent.tree_left;
+---------+---------------+---------+-----------+-----------+------------+
| tree_id | cate_name | tree_id | cate_name | tree_left | tree_right |
+---------+---------------+---------+-----------+-----------+------------+
| 1 | North America | 3 | Mammal | 3 | 10 |
| 1 | North America | 8 | Fish | 11 | 22 |
| 1 | North America | 18 | Insect | 23 | 34 |
+---------+---------------+---------+-----------+-----------+------------+
3 rows in set (0.03 sec)
If we turn this query into a view (hereafter referred to as direct_child... don't forget to pull out the AND parent.tree_id = '1') we can run a simply query against it to gain the direct parent of any node (or parents of any category).
SELECT ptree_id, pcate_id, pcate_name, MIN(tree_left) - 1, MAX(tree_right) + 1
FROM direct_child
WHERE tree_id = '5'
GROUP BY ptree_id;
So we can get the children, the direct children and the parents - and can create views to accomplish all these feats. As it turns out the hard one is to get all the siblings of a given node. We need a view to give the level of each node in the tree. Should look like the last except we're not going to subtract one from the level:
CREATE VIEW siblings_level AS
SELECT COUNT(A2.tree_left) AS lvl, A1.tree_id, A1.cate_id, A1.cate_name, A1.tree_left, A1.tree_right
FROM site AS A1, site AS A2
WHERE A1.tree_left BETWEEN A2.tree_left AND A2.tree_right
GROUP BY A1.cate_name, A1.tree_left, A1.tree_right;
Then we can use the following query to snag the siblings:
SELECT DISTINCT S2.cate_name
FROM siblings_level AS S1, siblings_level AS S2
WHERE S1.tree_id = '8'
AND EXISTS
(SELECT *
FROM siblings_level AS S0
WHERE S1.tree_left BETWEEN S0.tree_left AND S0.tree_right
AND S2.tree_left BETWEEN S0.tree_left AND S0.tree_right
AND S0.lvl = S1.lvl - 1
AND S1.lvl = S2.lvl);
+-----------+
| cate_name |
+-----------+
| Fish |
| Insect |
| Mammal |
+-----------+
3 rows in set (0.04 sec)
Last query we have seems twisted, but it solves one of the most common recursive operations on websites - the breadcrumbs.
SELECT A2.tree_id, A2.cate_name,
(SELECT COUNT(*)
FROM site AS A4
WHERE A4.tree_left BETWEEN A1.tree_left and A1.tree_right
AND A2.tree_left BETWEEN A4.tree_left AND A4.tree_right) AS path_nbr
FROM site AS A1, site AS A2, site AS A3
WHERE A1.tree_id = '2' /*starting node*/
AND A3.tree_id = '23' /*ending node*/
AND A2.tree_left BETWEEN A1.tree_left AND A1.tree_right
AND A3.tree_left BETWEEN A2.tree_left AND A2.tree_right;
+---------+---------------+----------+
| tree_id | cate_name | path_nbr |
+---------+---------------+----------+
| 2 | Home | 1 |
| 1 | North America | 2 |
| 18 | Insect | 3 |
| 20 | Winged | 4 |
| 21 | Bee | 5 |
| 23 | Killer | 6 |
+---------+---------------+----------+
6 rows in set (0.10 sec)
Take a breather...
Granted, I know a lot of these queries need some work. Renaming results, ensuring tree and category ids make it into views, etc. Keep in mind that before this is all over many of these will be set into views, functions and stored procedures. That said, we've seen a lot structures which could give us some interesting properties. Since our queries can be performed on the category_id or the tree_id, our ability to find groupings of products would be extremely flexible.
Next up is the part that makes most people flinch... how to copy, insert and delete - then the real fun begins.