Complex Properties

In a relational database, one table will often refer to another. Likewise, some of your business objects may include another object or list of objects. Types that nest other types are called "complex types". You may not want a statement to return a simple type, but a fully-formed complex type.

In the database, a related column is usually represented via a 1:1 relationship, or a 1:M relationship where the class that holds the complex property is from the "many side" of the relationship and the property itself is from the "one side" of the relationship. The column returned from the database will not be the property we want; it is a key to be used in another query.

From the framework's perspective, the problem is not so much loading a complex type, but loading each "complex property". To solve this problem, you can specify in the Result Map a statement to run to load a given property. In the following example, the "category" property of the "select-product-result" element is a complex property.

<resultMap id="select-product-result" class="product">
    <result property="id" column="PRD_ID"/>
    <result property="description" column="PRD_DESCRIPTION"/>
    <result property="category" column="PRD_CAT_ID" select="selectCategory"/>
</resultMap>

<resultMap id="select-category-result" class="category">
    <result property="id" column="CAT_ID"/>
    <result property="description" column="CAT_DESCRIPTION"/>
</resultMap>

<select id="selectProduct" parameterClass="int" resultMap="select-product-result">
   select * from PRODUCT where PRD_ID = #value#
</select>

<select id="selectCategory" parameterClass="int" resultMap="select-category-result">
   select * from CATEGORY where CAT_ID = #value#
</select>

In the above example, the framework will use the "selectCategory" statement to populate the "category" property. The value of each category is passed to the "selectCategory" statement, and the object returned is set to the category property. When the process completes, each Product instance will have the the appropriate category object instance set.

Avoiding N+1 Selects (1:1)

A problem with the above example may be that whenever you load a Product, two statements execute: one for the Product and one for the Category. For a single Product, this issue may seem trivial. But if you load 10 products, then 11 statements execute. For 100 Products, instead of one statement product statement executing, a total of 101 statements execute. The number of statements executing for the above example will always be N+1: 100+1=101.

One way to mitigate the problem is to cache the "selectCategory" statement. We might have a hundred products, but there might only be five categories. Instead of running a SQL query or stored procedure, the framework will return the category object from it cache. A 101 statements would still run, but they would not be hitting the database. See Cache Models more details about caches.

Another solution is to use a standard SQL join to return the columns you need from the another table. A join can bring all the columns we need over from the database in a single query. When you have a nested object, you can reference nested properties using a dotted notation, like "category.description".

The following example solves the same problem as the previous example, but uses a join instead of nested properties.

<resultMap id="select-product-result" class="product">
    <result property="id" column="PRD_ID"/>
    <result property="description" column="PRD_DESCRIPTION"/>
    <result property="category" resultMapping="Category.CategoryResult" />
</resultMap>

<statement id="selectProduct" parameterClass="int" resultMap="select-product-result">
    select *
    from PRODUCT, CATEGORY
    where PRD_CAT_ID=CAT_ID
    and PRD_ID = #value#
</statement>
Lazy Loading vs. Joins (1:1):

It's important to note that using a join is not always better. If you are in a situation where it is rare to access the related object (e.g. the category property of the Product class) then it might actually be faster to avoid the join and the unnecessary loading of all category properties. This is especially true for database designs that involve outer joins or nullable and/or non-indexed columns. In these situations it might be better to use the sub-select solution with lazy loading enabled. The general rule of thumb is: use the join if you're more likely going to access the associated properties than not. Otherwise, only use it if lazy loading is not an option.

If you're having trouble deciding which way to go, don't worry. No matter which way you go, you can always change it without impacting your application source code. The two examples above result in exactly the same object graph and are loaded using the exact same method call from the application. The only consideration is that if you were to enable caching, then the using the separate select (not the join) solution could result in a cached instance being returned. But more often than not, that won't cause a problem (your application shouldn't be dependent on instance level equality i.e. "===").

Complex Collection Properties

It is also possible to load properties that represent lists of complex objects. In the database the data would be represented by a M:M relationship, or a 1:M relationship where the class containing the list is on the "one side" of the relationship and the objects in the list are on the "many side"". To load a TList of objects, there is no change to the statement (see example above). The only difference required to cause the SQLMap DataMapper framework to load the property as a TList is that the property on the business object must be of type TList. For example, if a Category has a TList of Product instances, the mapping would look like this (assuming Category has a property called "ProductList" of TList.):

<resultMap id="select-category-result" class="Category">
    <result property="Id" column="CAT_ID"/>
    <result property="Description" column="CAT_DESCRIPTION"/>
    <result property="ProductList" column="CAT_ID" select="selectProductsByCatId"/>
</resultMap>

<resultMap id="select-product-result" class="Product">
    <result property="Id" column="PRD_ID"/>
    <result property="Description" column="PRD_DESCRIPTION"/>
</resultMap>

<statement id="selectCategory" parameterClass="int"
            resultMap="select-category-result">
    select * from CATEGORY where CAT_ID = #value#
</statement>

<statement id="selectProductsByCatId" parameterClass="int"
          resultMap="select-product-result">
   select * from PRODUCT where PRD_CAT_ID = #value#
</statement>

Avoiding N+1 Select Lists (1:M and M:N)

This is similar to the 1:1 situation above, but is of even greater concern due to the potentially large amount of data involved. The problem with the solution above is that whenever you load a Category, two SQL statements are actually being run (one for the Category and one for the list of associated Products). This problem seems trivial when loading a single Category, but if you were to run a query that loaded ten (10) Categories, a separate query would be run for each Category to load its associated list of Products. This results in eleven (11) queries total: one for the list of Categories and one for each Category returned to load each related list of Products (N+1 or in this case 10+1=11). To make this situation worse, we're dealing with potentially large lists of data.

<resultMap id="select-category-result" class="Category">
    <result property="Id" column="CAT_ID"/>
    <result property="Description" column="CAT_DESCRIPTION"/>
    <result property="ProductList" column="CAT_ID" select="selectProductsByCatId"/>
</resultMap>

<resultMap id="select-product-result" class="Product">
    <result property="Id" column="PRD_ID"/>
    <result property="Description" column="PRD_DESCRIPTION"/>
</resultMap>

<!-- This statement executes 1 time -->
<statement id="selectCategory" parameterClass="int"
            resultMap="select-category-result">
    select * from CATEGORY where CAT_ID = #value#
</statement>

<!-- This statement executes N times (once for each category returned above)
     and returns a list of Products (1:M) -->
<statement id="selectProductsByCatId" parameterClass="int"
            resultMap="select-product-result">
    select * from PRODUCT where PRD_CAT_ID = #value#
</statement>

1:N and M:N Solution?

One way to avoid multiple SQL queries is to use Joins in the query together with the groupBy attribute in <resultMap> and the resultMapping attribute of <result>. The following example selects all the categories and the corresponding products using 1 query.

<resultMap id="select-category-result" class="Category" groupBy="CAT_ID">
    <result property="Id" column="CAT_ID"/>
    <result property="Description" column="CAT_DESCRIPTION"/>
    <result property="ProductList" resultMapping="select-product-result"/>
</resultMap>
 
<resultMap id="select-product-result" class="Product">
    <result property="Id" column="PRD_ID"/>
    <result property="Description" column="PRD_DESCRIPTION"/>
</resultMap>
 
<!-- This statement executes 1 time -->
<statement id="selectCategory" parameterClass="int"
            resultMap="select-category-result">
    select 
            CATEGORY.CAT_ID as CAT_ID,
            CATEGORY.CAT_DESCRIPTION as CAT_DESCRIPTION,
            PRODUCT.PRD_ID as PRD_ID,
            PRODUCT.PRD_DESCRIPTION as PRD_DESCRIPTION
     from CATEGORY 
        left join PRODUCT on
            PRODUCT.PRD_CAT_ID = CATEGORY.CAT_ID
</statement>

In the above example, the groupBy attribute is set to the column of that specifies the Category ID. All the rows with the same CAT_ID will be considered as a collection for the ProductList property.

Lazy Loading vs. Joins (1:M and M:N):

As with the 1:1 situation described previously, it's important to note that using a join is not always better. This is even more true for collection properties than it was for individual value properties due to the greater amount of data. If you are in a situation where it is rare to access the related object (e.g. the ProductList property of the Category class) then it might actually be faster to avoid the join and the unnecessary loading of the list of products. This is especially true for database designs that involve outer joins or nullable and/or non-indexed columns. In these situations it might be better to use the sub-select solution with the lazy loading. The general rule of thumb is: use the join if you're more likely going to access the associated properties than not. Otherwise, only use it if lazy loading is not an option.

As mentioned earlier, if you're having trouble deciding which way to go, don't worry. No matter which way you go, you can always change it without impacting your PHP code. The two examples above would result in exactly the same object graph and are loaded using the exact same method call. The only consideration is that if you were to enable caching, then the using the separate select (not the join) solution could result in a cached instance being returned. But more often than not, that won't cause a problem (your application should not be dependent on instance level equality i.e. "===").