Business Intelligence: Data modeling in a decision-making environment

The typical nature of Data Warehouse projects, compared to traditional management applications, is also reflected in a different data modeling technique. In a decision-making environment, data is organized to directly respond to the typical queries of the end user. To download the full text and for more information, click HERE

Let’s suppose, for example, that the user’s goal is to monitor over time the planning and implementation of investment projects that use public funding, located in various areas of the national territory, linked to different sectors of activity. In a simplified manner, one can imagine representing this activity through a cube, whose sides reproduce the variables time, location, and classification of activity, relative to the projects. Each point within the cube is the intersection of the coordinates defined by the sides of the cube itself and represents the measurement of the fact of interest (the project and its related funding) for that particular combination of location, sector of activity, and time, which represent the dimensions of the analysis. In the Data Warehouse field, this is called dimensional data modeling. The dimensional model and the traditional entity-relationship model are potentially capable of storing the same information. The difference is related to the search for optimal performance in the two types of environments. In operational environments, it is fundamental to adopt normalization techniques that, by eliminating redundancies, allow a high number of repetitive transactions to query and update information at a single point in the database. Decision support systems, on the other hand, are characterized by a reduced number of transactions which operate transversally on the database, requiring read access to a large amount of data to be joined. The use of normalization techniques therefore appears counterproductive in the design of a Data Warehouse system. In a decision-making environment, it is more correct to design the database in a way that directly and simply meets the specific needs of end users, also through denormalized and aggregated data. It is important to pay close attention to the analysis of the conceptual model, foreseeing all the analysis dimensions and for each dimension its related attributes. The attributes of dimensional tables are used directly as search conditions in queries on the Data Warehouse and as column headers in end-user reports. The use of pre-aggregated and summarized data limits the number of operations necessary to reconstruct the requested information. The star schema, besides being optimized for queries, has the further advantage of allowing a more understandable graphical representation of the business for the end user. Unlike the entity-relationship schema, where all entities are represented in the same way, the star schema is asymmetrical. At the center of the diagram is a large dominant table, which is the only one having multiple links, materialized in key fields contained within it, to the other tables. The other tables represent the analysis dimensions through which the “fact” of interest is examined. Technically, the central table is defined as the “fact table” and the others as “dimension tables.” The fact table is the table where numerical measures of the fact are stored (in the example, the amount of funding of the investment project), each representing the intersection of all dimensions and quantifying one or more variables. Normally, it is the only normalized table. The other tables represent the analysis dimensions through which the “fact” is examined and contain descriptive attributes. The cardinality of these tables, usually denormalized, is limited. Among dimensional tables, the time dimension table is always present in dimensional models, regardless of the business to be represented. This derives from the characteristic of Data Warehouse systems that also manage historical information. Sometimes, to further improve the interpretability of the model, dimensional tables are also normalized by making hierarchies explicit. This gives rise to a schema called “snowflake” or Snowflake Schema. Navigating data through the snowflake model is less efficient than in the Star Schema because it increases the number of tables on which to perform joins. It should be considered that, on one hand, denormalization guarantees better performance in query operations, but it also makes data updating and loading more burdensome. More generally, maintenance operations of the multidimensional data model, such as extension to new analysis dimensions, require a “from scratch” loading of the database. For this reason, case by case, one should evaluate the trade-off between the two needs and adopt either the Star Schema or the Snowflake Schema accordingly. It should also be added that, for very complex models with several fact tables sharing numerous dimensions, it becomes difficult to adopt the dimensional technique, also because it is not always clear which tables are fact tables and which are dimension tables. For the conceptual design of the Enterprise Data Warehouse, an appropriately denormalized Entity-Relationship data model appears preferable. Dimensional modeling is instead definitely more suitable for representing Data Mart data. Dimensional models can be implemented both on relational DBMS and on specific multidimensional DBMS. ALL RIGHTS RESERVED

Pubblicato in Business

23 October 2006 Redazione FullPress Business 0

Business Intelligence: Data Modeling in a Decision-Making Environment

Se vuoi rimanere aggiornato su Business Intelligence: Data Modeling in a Decision-Making Environment iscriviti alla nostra newsletter settimanale

Be the first to comment

Leave a Reply Cancel reply