###
__Recursive SQL __

A substantial drawback of MySQL is its inability to use recursive queries. These queries are very useful when you have data stored in a hierarchical structure such that each element in the table also has a parent (except for the root element). A recursive SQL query would allow you to select an element and all of its descendants. This feature comes in handy in situations such as creating a navigation pane for a website. The pages of a website can be organized into a tree like structure. If a developer wanted a user to be able to navigate to any descendant of the current page in the tree representing the website, all descendants could easily be found with a recursive SQL query.

Given the following structure for an SQL database,

CREATE TABLE category (

id int NOT NULL,

parent_id int NOT NULL,

name varchar(20) NOT NULL

)

Here is an example of a recursive SQL query:

WITH children(id, parent_id, name) AS (

SELECT *

FROM category

WHERE category.id = 1

UNION ALL

SELECT ca.id, ca.parent_id, ca.name

FROM children ch, category ca

WHERE ca.parent_id = ch.id

)

SELECT * FROM children;

Let's first walk through this code. Our database already has a table named 'category.' In this table, there exists entries about different categories that our site uses. (For example, on a games site like Game 103, one day, I might want to have subcategories and even sub-sub categories for games. To fetch all subcategories for Arcade games, I would use a recursive query like the one above [If I didn't use MySQL]). Each entry in the category table has its own id and the id of its parent (except for the root category which has no parent).

WITH RECURSIVE children(id, parent_id, name) AS (

The first line of code creates a temporary table called 'children' with the columns id, parent_id, and name (these are the same columns as the category table).

SELECT *

FROM category

WHERE category.id = 1

This is the initial select where we select the root category that we want for our recursive query (note: this does not have to be the root of the whole hierarchical tree, just the category that we want all descendants of in this particular query). In the case above, our root category for this query has an id of 1. After this query has run, our 'children' table has one row with the same information as the row in our 'category' table with an id of 1.

UNION ALL

SELECT ca.id, ca.parent_id, ca.name

FROM children ch, category ca

WHERE ca.parent_id = ch.id

This is where the fun begins. Here we are taking the union of the select statement that we did above with the select statement that we will do below. (For those of you who don't know what a union is, it just combines the two results and throws out duplicates). Our new select statement will select all entries from our current 'children' table joined with the 'category' table where the entry in the 'category' table has a parent_id equal to the id of the entry that is already in the 'children' table. This will select all of the direct children of the single root entry in the 'children' table and put them in the 'children' table as well.

Let's look at how this works in a little more detail. "FROM children ch, category ca" does a join (Cartesian product) of the temporary 'children' table that we are still constructing and category table. This means in our joined table that every row in the 'category' table will be combined with the each row in the 'children' table. Since there is only one row in the 'children' table right now (our root), each row in the 'category' table will just be matched with this row from the 'children' table. In a true Cartesian product, what was just described would be our result. However, the where clause alters our result. The where clause tells us to only include those rows such that have the parent_id of what is in the 'children' table (only the root right now). Thus, we have selected the direct children of the root element, and these entries have been added to the 'children' table.

You may think we are done! However, now we have new elements in our children table, and our recursive query now knows it must run the code

UNION ALL

SELECT ca.id, ca.parent_id, ca.name

FROM children ch, category ca

WHERE ca.parent_id = ch.id

again. However, this time, more results will be included in our output, since we are not just looking for the children of our root, but also the children of our children (since our children table now includes both our root and its direct children). The join will be done again and include all rows that have a parent_id of any of the items in the children table - thus adding the grandchildren of the original root to the table (the direct children will be added again, but ignored since SQL works on set logic and sets ignore duplicates). The next time the code runs, the great-children will be added to the children table. This same process will be repeated (hence the recursion) until there are no more descendants of the original root left to add. Want to see it in action? Here is an SQL Fiddle that will show how this all works.

http://sqlfiddle.com/#!3/68948a/1

###
__Ordering Recursive SQL__

If you ran the code above in the SQL fiddle, you may have noticed that the order that the items are returned in is the order in which they were added to the children table. However, if you are trying to design a navigation system for your website, this order wouldn't be too terribly helpful, as a subcategory for 'games' would be returned after the 'videos' category is returned. This can be fixed through a slightly more complex algorithm.

Here is the SQL code for this algorithm in SQL server:

CREATE VIEW children_no_order(id, parent_id, name, num, rowC) AS

WITH children_simple(id, parent_id, name, num, rowC) AS (

SELECT *,0, (ROW_NUMBER() OVER (ORDER BY category.id))

FROM category

WHERE category.id = 1

UNION ALL

SELECT ca.id, ca.parent_id, ca.name, num+1, (ROW_NUMBER() OVER (ORDER BY ca.id))

FROM children_simple ch, category ca

WHERE ca.parent_id = ch.id

)

SELECT * FROM children_simple;

WITH children(id, parent_id, name, num, orderNum, weight, staticCount) AS (

SELECT *, 0, CONVERT(float,(ROW_NUMBER() OVER (ORDER BY category.id))), POWER(((SELECT max(rowC) FROM children_no_order)+1)/1.0,((SELECT max(num) FROM children_no_order)-1)/1.0), (SELECT max(rowC) from children_no_order)+1

FROM category

WHERE category.id = 1

UNION ALL

SELECT ca.id, ca.parent_id, ca.name, num+1, (orderNum+(ROW_NUMBER() OVER (ORDER BY ca.id)*weight)), weight/staticCount, staticCount

FROM children ch, category ca

WHERE ca.parent_id = ch.id

)

SELECT * FROM children ORDER BY orderNum;

Let's look at this code a little in further detail. The first part of the code is essentially the same as above. We are creating a view of the recursive query we ran earlier and calling this view 'children_no_order.' The real purpose for creating such a view is that this view keeps track of which loop trough the code each entry was added to the table (variable named num which increases on each iteration), and this view keeps track of the order each entry would have been selected (variable named rowC called by the built in SQL SERVER ROW__NUMBER() function). Thinking in terms of a tree, this allows us to pinpoint each entry horizontally (which child it is of its parent) and vertically (what level the entry is on the tree [e.g. child, grandchild, great-grandchild, etc]).

This information is used for the information that we really want which comes in the next query. Let us look at this code.

WITH children(id, parent_id, name, num, orderNum, weight, staticCount) AS (

Our recursive table now has more columns. Like the view we just looked at, num keeps track of the iteration of the loop in which the entry was added to the children table. orderNum is what we use to find the correct order of the entries, weight is what we will multiply the ROW_NUMBER() by in each iteration (this variable weights the value of row_number; we will explain this a little later) staticCount is simply the maximum number of children any entry has in the tree (the width of the tree). This is obtained by finding the maximum rowC from the view we created.

Before we go any further, now is a good time to explain the algorithm we are using to obtain the desired order. For each entry we obtain, we will create an orderNum. This number will be the entry's parent orderNum plus the entry's row_number multiplied by the weight variable. At each increment, we divide the weight variable by staticCount. The initial weight is the width of the tree + 1 obtained by (SELECT max(rowC) FROM children_no_order)+1 to the power of the height of the tree - 1 obtained by (SELECT max(num) FROM children_no_order)-1. Our goal is to convert the order number for each entry to a number base [width of the tree + 1]. We use base width of the tree + 1 to ensure that any multiplication factor will not effect the previous 'digit' (term in the form multiplication factor * [tree_width+1] ^ current weight) in our number that has already been set. The first entries selected (direct children) represent the first 'digit' in our base [width+1] number. So, these will be raised to the highest power (the height of the tree - 1 is the smallest number that will allow for enough 'digits'). We will then divide our weight by staticCount, so in the next iteration we get the next 'digit' needed for out number (this will be added to the number above: direct children of the root will only have one 'digit' entered - the highest 'digit'. Only the deepest elements in the tree will have values for all 'digits'). The multiplication factor we use for each 'digit' is simply the ROW_ORDER that it was obtained in (The maximum number this can be is the width of the tree [this is why we used width of tree + 1 as our base]). Using this algorithm, all the subcategories of a category will come before the next category. This maintains true all the way down our tree, so our order is correct in the end.

For example, consider the following tree representation of data in our table.

Our base will be 3, since the maximum number of children any element has + 1 is 3. Our height is 3 (assuming a tree with just the root is of height 0), so height - 1 = 2. Calculating the weights for each entry we get.

Thus, ordering by orderNum, we will get

Everything

Games

Game 1

Game 2

Videos

Video 1

Video 1 Deleted Scenes

- the correct order. (You may notice that we are actually doing a pre-order traversal of the tree).

Now, let's analyze some of the code to see how this algorithm is implemented.

SELECT *, 0, CONVERT(float,(ROW_NUMBER() OVER (ORDER BY category.id))), POWER(((SELECT max(rowC) FROM children_no_order)+1)/1.0,((SELECT max(num) FROM children_no_order)-1)/1.0), (SELECT max(rowC) from children_no_order)+1

In this line, we are getting all the initial values as described above. Conversion to float and /1.0 are used to provide more space before this initial calculation runs into an arithmetic overflow error.

The only other line different from the simple recursive query is

SELECT ca.id, ca.parent_id, ca.name, num+1, (orderNum+(ROW_NUMBER() OVER (ORDER BY ca.id)*weight)), weight/staticCount, staticCount

Here, num + 1 simply increments the depth on each level. (orderNum+(ROW_NUMBER() OVER (ORDER BY ca.id)*weight)) is what calculates the orderNum for the current entry. We use the previous orderNum (parent's 'digits') plus a new digit we obtain by multiplying the order that this entry was selected by the current weight. We then divide weight by staticCount, so the exponent in the expression that initially began as (width+1)^(height-1) is one less for the next entry, and thus a new digit can be added. We also pass on staticCount, so the child knows what to divide by as well.

The rest of the recursive query is the same as above. However, this time, at the end, we order by order_num with the following line.

SELECT * FROM children ORDER BY orderNum;

Want to play with this query or see it in action? Check out the following SQL Fiddle:

http://sqlfiddle.com/#!6/c4d8c/6

###
__MySQL Alternative: Using PHP to simulate SQL recursion__

Now that you are a recursive SQL expert, we must move away from the topic as it is not supported in MySQL. However, with a little PHP, we can emulate recursion.

Let's first consider the code equivalent of the non-order specific query.

$finalIds = array();

$tempIds = array($id);

while(count($tempIds) > 0) {

$currentId = $tempIds[0];

array_push($finalIds, $currentId);

$IdsQuery = mysql_query("SELECT * FROM category WHERE parent_id = '$currentId'");

while ($rows = mysql_fetch_array($IdsQuery)):

array_push($tempIds, $rows['id']);

endwhile;

array_shift($tempIds);

}

Here we already know the id of our root element for our query, and it is stored in a variable called $id. We have arrays called $tempIds and $finalIds. $tempIds begins with only containing the $id variable while $finalIds is empty. Our algorithm is as follows: while $tempIds still has elements, add the current element to $finalIds then get all the children from the first element ($tempIds[0]) and add them to $tempIds. Then, remove the first element from $tempIds. This algorithm will go through and keep adding children to $tempIds and $finalIds until no elements have any children left to add, and $tempIds is empty. Thus all necessary ids will be in $finalIds.

We can then cast our finalIds array into a format that we can use in an SQL query by using

$sqlFinalIds = "(" . implode(",",$finalIds) . ")";

Now we can run the sqlQuery

$finalQuery = mysql_query("SELECT * FROM categories

WHERE id IN $sqlFinalIds");

Now, one can iterate through $finalQuery to find all necessary rows. However, if we want the same kind of order as the ordered SQL recursion query that is necessary for situations such as navigation, the change to the algorithm is less complicated. What we need to do is have all the children of element considered before the next element. This can be done by editing our original loop slightly.

while(count($tempIds) > 0) {

$currentId = $tempIds[0];

array_push($finalIds, $currentId);

$IdsQuery = mysql_query("SELECT * FROM category WHERE parent_id = '$currentId'");

while ($rows = mysql_fetch_array($IdsQuery)):

array_splice($tempIds, 1, 0, $rows['id']);

endwhile;

array_shift($tempIds);

}

The only line we changed was

array_splice($tempIds, 1, 0, $rows['id']);

This means that an element's children will be placed next in queue to be looked at by our loop, since we have placed the elements child at position 1. It can be noted that this will cause children to be in reverse order of how they were fetched in the SQL query when considered, since the final item in the while loop will be at position 1 (previous terms will be shifted back). If it is necessary for this order to be maintained, a simple change can be made.

while(count($tempIds) > 0) {

$currentId = $tempIds[0];

array_push($finalIds, $currentId);

$IdsQuery = mysql_query("SELECT * FROM category WHERE parent_id = '$currentId'");

$i = 1;

while ($rows = mysql_fetch_array($IdsQuery)):

array_splice($tempIds, $i, 0, $rows['id']);

$i ++;

endwhile;

array_shift($tempIds);

}

Here, instead of always inserting at position 1, we insert at position i. The first child will be inserted into $tempIds at position 1, but the second will be inserted at position 2, 3rd - 3, and so on.

Now $finalIds will contain the ids of the elements in the order we want for getting the navigation to look correct. When we call $finalQuery, we can now add a ORDER BY clause (Note we will also need another variable, this is the same as $sqlFinalIds without the parentheses), so our new line becomes:

$sqlOrderIds = implode(",",$finalIds);

$finalQuery = mysql_query("SELECT * FROM category

WHERE id IN $sqlFinalIds

ORDER BY FIELD(id, $sqlOrderIds)");

Note: FIELD is supported by MySQL, but may not be by other databases.

Then you can iterate through that query with a while($rows = mysql_fetch_array($finalQuery) loop such as

while ($rows = mysql_fetch_array($finalQuery)):

echo $rows['name'];

endwhile;

The only element that this query currently lacks is information about depth for each entry on a tree which would be nice to format a navigation bar nicely.

This can be done fairly easily as well. We can add two more arrays that keep track of the parent level for each element. $tempLevels starts out with 0 as its only value since the root element in our query has a level of 0 (you can change this to 1 if you prefer), and $finalLevels is empty just like $finalIds.

$finalLevels = array();

$tempLevels = array(0);

Updating the main loop we get:

while(count($tempIds) > 0) {

$currentId = $tempIds[0];

array_push($finalIds, $currentId);

array_push($finalLevels, $tempLevels[0]);

$IdsQuery = mysql_query("SELECT * FROM category WHERE parent_id = '$currentId'");

$i = 1;

while ($rows = mysql_fetch_array($IdsQuery)):

array_splice($tempIds, $i, 0, $rows['id']);

array_splice($tempLevels, $i, 0, $tempLevels[0]+1);

$i ++;

endwhile;

array_shift($tempIds);

array_shift($tempLevels);

}

Now $finalLevels will contain the correct level for each element at the same index as the element's index in the $finalIds array. The name and level for a category could be accessed by the following code.

while ($rows = mysql_fetch_array($finalQuery)):

echo $rows['name'];

$key = array_search($rows['id'],$finalIds);

echo $finalLevels[$key];

endwhile;

Here, we find the index of the id we are looking at in $finalIds and then echo the level that is stored in the $finalLevels array under that same index.

Well that was a lot! I Hope it helps you on your project!

Please comment if you find any mistakes!

Thanks and God Bless,

James Grams

## No comments:

Post a Comment