![]() |
(Microsoft) |
THIS PAGE REPRESENTS ONLY A VERY SHORT EXTRACT FROM THE FULL REVIEW.
TO VIEW THE FULL REVIEW YOU CAN PURCHASE THE REVIEW INDIVIDUALLY OR PURCHASE AN ANNUAL SUBSCRIPTION TO THE OLAP REPORT WHICH ALLOWS ACCESS TO ALL OLAP REPORT CONTENT.
The calculation capabilities in Microsoft OLAP use MDX, the query language from OLE DB for OLAP. MDX is the language used for specifying multidimensional queries, which can include sophisticated calculations. In effect, MDX is the extended multidimensional equivalent of SQL. Although it looks superficially like SQL, it is however quite different, and contrary to one myth, does not simply get translated into SQL internally, even in ROLAP mode.
However, whereas versions of SQL were implemented in the earliest relational databases, and it was therefore initially specified before there was much practical experience of relational processing, MDX benefits from the many years of experience of both multidimensional applications and the lessons learned from setting and evolving SQL standards. Indeed, the people within Microsoft who created the OLE DB for OLAP standard had previously worked on both ODBC and OLE DB, so they are no strangers to the standard setting process. Microsoft included feedback from many outsiders in the MDX specification which evolved considerably in the two years that it was in development before the first release of OLAP Services.
MDX remains the calculation definition language in Analysis Services 2005, but calculations are defined in MDX scripts, rather than just measure by measure. MDX scripts give the appearance of being procedural, but are still executed in a non-procedural way, as in Analysis Services 2000.
The MDX language determines both the tabular (nested) layout of the data returned as well as allowing new members to be defined in all dimensions. It also includes sophisticated conditional calculations and retrievals. One point to note is that MDX concentrates on specifically multidimensional functionality. It does not need to include normal mathematical functions, as other function DLLs can be loaded at run-time by the PTS. For example, all of VB’s Excel’s string and arithmetic functions are available automatically (assuming that these products are loaded), and third-parties can easily add new functions by building ActiveX component libraries using VB or C++. Also, the Excel object model has been extended to accommodate OLE DB for OLAP PivotTables and ADO is accessible from within Excel VBA. This allows third-parties to build sophisticated applications from within Office which directly access the (OLE DB for OLAP) API to Microsoft’s or other OLAP servers that support it. These can be deployed either client/server or on Web browsers using the Office 2000 Web components.
The multidimensional calculation functions (which are usually performed by the PTS) include:
Formula resolution is non-procedural in Analysis Services 2000 and precedence of calculations defined by formulas in different dimensions can be resolved by supplying a solve order parameter to each formula. When multiple formulas intersect, the formula with the lowest solve order is processed first. Simultaneous equations (circular references) are detected but not solved (error values are returned). However, using calculated cells, it is fairly easy to write formulas to solve simultaneous equations. For an example of this see, the CIP review.
In Analysis Services 2005, calculations are still non-procedural, but the solve order is implied by the order that the calculations appear in the MDX script. There is also a new restriction in that one can only create calculated members with session or query scope that have lower solve orders than server-defined calculated members in a few, very limited, circumstances.
Microsoft OLAP allows several types of calculations, including those done in SQL as part of a data load, aggregations that can be done in advance, and on-the-fly calculations that can be either pre-defined or requested in a query. In the Analysis Services release, a much richer set calculations are available than in OLAP Services, and the performance is better through greater caching and better use of SMP hardware.
The fact that different calculations are performed before, during and after aggregation may make certain calculations difficult or slow to implement, although most should still be possible. For example, to calculate allocations sometimes requires costs to be spread based on other calculated factors. If the latter require information that is not available until after a consolidation, then the allocations may have to be calculated entirely at run-time, which may be very inefficient.
The Calculated Cells capability was added to Analysis Services after the beta 2 release, in part because of criticisms made in The OLAP Report. This new capability allows subcubes which can span partitions to be defined, based on fixed or dynamic selections in all dimensions (so that, for example, the subcube can include only a particular member, members at a particular level, descendants of a member or the result of an MDX expression). Within these subcubes, calculations can be further restricted, if required, to only those cells that meet an MDX condition. Typically, such subcubes will include just a single measure or member of the account dimension, of course, but multiple members or sets of members from other dimensions. This is a convenient way of defining calculations that apply only in certain time periods or regions or when certain data conditions apply (for example, when default values need to be calculated if data is not present, or if calculations should supersede invalid or unrealistic data). In the absence of this feature, such selective calculations were very tedious or even impossible to define, hard to maintain and slow to process. Calculated Cells can also be defined within an MDX query, and then apply only to the user and session. Despite the power of calculated cells, they are not yet widely used, partly because they are always done on-the-fly and can get very slow.
Unfortunately, defining calculations is far from easy, partly because MDX is verbose and not at all intuitive. The help facilities are also not very useful, and many people have resorted to buying third-party books. Microsoft clearly intended models to be created by DBA-class users, not the normal business users who are able to develop models in products like TM1.
As can be seen, calculations can happen both in advance and at run-time, and consolidations that were not pre-aggregated are automatically performed at run-time, using the nearest available aggregates. The server uses a row count algorithm to determine which aggregate table to use when deriving run-time aggregations. Most unusually, it allows ROLAP-style sparse aggregate tables to be created and stored in SQL databases, and multidimensional sparse aggregates to be created and stored in the OLAP database. A single cube can contain both types, through the use of multiple partitions. No other OLAP goes this far in allowing the subtle, complex aggregation strategies that are needed to balance good performance while controlling database explosion.
Microsoft OLAP has very strong capabilities to assist DBAs in determining and executing an optimal aggregation strategy. The aggregation wizard analyzes the application and based on goals set by the DBA, it uses a proprietary algorithm to create an aggregation plan. The aggregations are selectively created at intersecting levels of dimension hierarchies, so they are not created for consolidated members of parent-child dimensions, except at the top, as these dimensions do not have intermediate levels.
On the basis that a full aggregation would yield a 100 percent optimized database, the user can choose, for example, an 80 percent optimization. This will typically require less than 30 percent of the space and time required to generate the tables for a fully aggregated schema. Alternatively, the user might want to set a maximum size allowed for aggregates, and the system will determine the estimated optimization possible within this limit. In both cases, the system generates a graph in real-time as it evolves its strategy, and the user can, if desired, simply stop the process when the optimization curve seems to have flattened out.
The system collects detailed usage statistics (stored by default in a Jet database, but they can be stored in any OLE DB for OLAP provider database), and these can subsequently be used by the aggregation optimizer. The collected statistics include date, user, query response time (for Plato to send a response to the PTS) and query frequency. Data is available for usage analysis using tables and graphs, and the database can be cleared once used.
Microsoft recommends that a relatively lightweight aggregation strategy be used for new cubes. The database can then be put into use, and usage statistics collected. These can be used raw or filtered on many different criteria to weight the subsequent optimizations. For example, usage transactions could be selected based on who the users were, the time of day of the transactions, the length of the transactions, the number of queries and so on. Thus, the working system could be optimized to ensure that it works particularly well at peak periods, or that particularly slow queries be improved or that the queries performed by senior management perform best or any combination of these. These usage based aggregations can be added to those originally created, or they can replace them: options unique to Microsoft OLAP.
Note that the aggregation plan can be separately optimized for each partition in a cube. This might be useful in applications that have different data sources or different usage patterns for each partition. However, in other cases, it might be intended to merge partitions in the future: for example, if the latest data is kept in a separate partition, and is then merged into the main partition at period end. In such cases, merging is only possible if both partitions share the same aggregation structure, and the Wizard facilitates this by allowing the aggregations in one partition to be cloned from another.
In principle, it is possible to create aggregate plans manually, or to inspect/adjust the plans produced by the aggregation wizard. There is a utility to do this called ‘Partition Manager’ which was bundled with the SQL2000 Resource Kit, and in an updated version with the BI Accelerator.
This aggregation optimization approach is similar to that in Informix MetaCube 4.0. It appears that Microsoft used a simpler algorithm that yields an optimized aggregation plan more quickly than the Informix approach. However, from the shape of the optimization curves, we suspect that the Informix algorithm produces more optimal results (in other words, the same performance benefit can be obtained from fewer aggregate tables). However, to counter this, the Microsoft approach allows more subtle usage weightings and also allows different optimization strategies for each partition, including whether the aggregates are stored relationally or multidimensionally. It is also easier to use and allows more obvious options for controlling the degree of optimization (the size of the aggregates or the degree of optimization rather than simply how many aggregate tables are created). Overall, despite being perhaps slightly less effective, the Microsoft method is probably preferable because of its simplicity and tunability, but both products stand head and shoulders above the many other OLAPs that claim to be scalable, but provide little or no assistance with deciding which aggregates to build and then maintaining them.
A new aggregation type, distinct count, is performed on-the-fly and is very much faster than the previous approach of using calculated members to achieve the same task (but still too slow in Analysis Services 2000). Combined with a non-empty crossjoin MDX function, this makes basket analysis applications easier and faster. However, only one distinct count can be calculated per cube.
THIS PAGE REPRESENTS ONLY A VERY SHORT EXTRACT FROM THE FULL REVIEW.
TO VIEW THE FULL REVIEW YOU CAN PURCHASE THE REVIEW INDIVIDUALLY OR PURCHASE AN ANNUAL SUBSCRIPTION TO THE OLAP REPORT WHICH ALLOWS ACCESS TO ALL OLAP REPORT CONTENT.