Guidelines: Data Model 
Topics
  
Data Models are used to design the structure 
  of the persistent data stores used by the system.  The Unified Modeling 
  Language (UML) profile for database design provides database designers with 
  a set of modeling elements that can be used to develop the detailed design of 
  tables in the database and model the physical storage layout of the database.  
  The UML database profile also provides constructs for modeling referential integrity 
  (constraints and triggers), as well as stored procedures used to manage access 
  to the database. 
Data Models might be constructed at the enterprise, departmental, or individual 
  application level. Enterprise and departmental level Data Models can be used 
  to provide standard definitions for key business entities (such as customer 
  and employee) that will be used by all applications within a business or a business 
  unit. These types of Data Models can also be used to define which system 
  in the enterprise is the "owner" of the data for a specific business 
  entity and what other systems are users of (subscribers to) the data.  
This guideline describes the model elements of the UML profile for database 
  modeling used to construct a Data Model for a relational database. Because 
  there are numerous existing publications on general database theory, it does 
  not cover this area.  For background information on relational Data Models 
  and Object Models see Concepts: 
  Relational Databases and Object Orientation.  
Note: The data modeling representations contained in this guideline are based 
  on the UML 1.3. At the time that this guideline was developed, the UML 1.4 data-modeling 
  profile was not available. 
As described in [NBG01], there are three 
  general stages in the development of a Data Model: conceptual, logical, and 
  physical.  These stages of data modeling reflect the different levels of 
  detail in the design of the persistent data storage and retrieval mechanisms 
  of the application. A discussion of conceptual data modeling is provided in 
  Concepts: 
  Conceptual Data Modeling.  Summaries of logical and physical 
  data modeling are provided in the next two sections of this guideline.  
In logical data modeling, the database designer 
  is concerned with identifying the key entities and relationships that capture 
  the critical information that the application needs to persist in the database.  
  During the use-case analysis, use-case 
  design, and class design activities, 
  the database designer and the designer 
  must work together to ensure that the evolving designs of the analysis and design 
  classes for the application will adequately support the development of the database.  
  During the class design activity, the 
  database designer and the designer must identify the set of classes in the Design 
  Model that will need to persist data in the database. 
This set of persistent classes in the Design Model provides a Design Model 
  View that, although different from the traditional Logical Data Model, meets 
  many of the same needs. The persistent classes used in the Design Model 
  function in the same manner as the traditional entities in the Logical Data 
  Model. These design classes accurately reflect the data that must be persisted, 
  including all of the data columns (attributes) that must be persisted and key 
  relationships. This makes these design classes an excellent starting point for 
  the physical database design.  
Creating a separate Logical Data Model is an option. However, in the best case 
  it would end up capturing the same information in a different form. In the worst 
  case it would not, and thus in the end might not meet the business needs of 
  the application. In particular, if the database is intended to service a single 
  application, then the application's view of the data might be the best starting 
  point. The database designer creates tables from this set of persistent design 
  classes to form an initial Physical Data Model. 
Still, situations might exist that would require the database designer to create 
  an idealized design of the database that is independent from the application 
  design. In this case, the logical database design is represented in a separate 
  Logical Data Model that is part of the overall Artifact: 
  Data Model. This Logical Data Model depicts the key logical entities 
  and their relationships that are necessary to satisfy the system requirements 
  for persisting data consistent with the overall architecture of the application. The 
  Logical Data Model might be constructed using the modeling elements of the UML 
  profile for database design described in later sections of this guideline. For 
  projects that use this approach, close collaboration between the application 
  designers and the database designers is absolutely critical to the successful 
  development of the database design.  
The Logical Data Model might be refined by applying the standard rules for 
  normalization as defined in Concepts: 
  Normalization prior to evolving the elements of the Logical Data Model to 
  create the physical design of the database.  
The figure below depicts the primary approach of using the Design Model classes 
  as the source of logical database design information for creating an initial 
  Physical Data Model. It also illustrates the alternative approach of using a 
  separate Logical Data Model.    
  
  
Logical Data Modeling Approaches 
Physical data modeling is the final stage of development in the design of the 
  database.  The Physical Data Model consists of the detailed database table 
  designs and their relationships created initially from the persistent design 
  classes and their relationships.  The mechanics of performing the transformation 
  of the Design Model classes to tables is discussed in  
  Guidelines: Forward-Engineering Relational Databases. The Physical 
  Data Model is part of the Data Model; 
  it is not a separate artifact. 
The tables in the Physical Data Model have well-defined columns, as well as 
  keys and indexes as needed. The tables might also have triggers defined as necessary 
  to support the database functionality and referential integrity of the system. 
  In addition to the tables, stored procedures have been created, documented, 
  and associated with the database in which the stored procedure will reside. 
 
The diagram below shows an example of some of the elements of the Physical 
  Data Model.  This example model is a part of the Physical Data Model of 
  a fictional online auction application. It depicts four tables (Auction, Bid, 
  Item, and AuctionCategory), along with one stored procedure (sp_Auction) and 
  its container class (AuctionManagement).  The figure also depicts the columns 
  of each table, the primary key and foreign key constraints, and the indexes 
  defined for the tables.   
  
  
Example (Physical) Data Model Elements 
The Physical Data Model also contains mappings of the tables to physical storage 
  units (tablespaces) in the database. The figure below shows an example 
  of this mapping.  In this example, the tables Auction and OrderStatus are 
  mapped to a tablespace called PRIMARY. The diagram also illustrates modeling 
  the realization of the tables to the database (named PearlCircle in this example). 
 
  
  
Example Data Storage Model Elements 
On projects in which a database already exists, the database designer can reverse-engineer 
  the existing database to populate the Physical Data Model.  See  Guidelines: 
  Reverse-Engineering Relational Databases for more information.  
This section describes the general modeling guidelines for each major element 
  of the Data Model based on the UML profile for database modeling. A brief description 
  of each model element is followed by an example illustration of the UML model 
  element. The Relationships section of this guideline 
  includes a description of the usage of the model elements. 
Standard UML packages are used to group and organize elements of the Data Model.  
  For example, packages might be defined to organize the Data Model into separate 
  Logical and Physical Data Models.  Packages might also be used to identify 
  logically related groups of tables in the Data Model that constitute the major 
  data "subject areas" of importance to the business domain of the application 
  being developed.  The figure below shows an example of two subject area 
  packages (Auction Management and UserAccount Management) used to organize views 
  and tables in the Data Model.  
  
Subject Area Packages Example 
In the UML profile for database modeling, a table is modeled as a class with 
  a stereotype of <<Table>>.  The columns in the table are modeled as attributes 
  with the stereotype of <<column>>. One or more columns might be designated as 
  a primary key to provide for unique row entries in the table. Columns might 
  also be designated as foreign keys.  Primary keys and foreign keys have 
  associated constraints that are modeled as the stereotyped operations of <<Primary 
  Key>> and <<Foreign Key>> respectively.  The figure below depicts the structure 
  of an example table used to manage information about items sold at auction in 
  a fictional  online auction system.   
  
Table Example 
Tables might be related to other tables through the following types of relationships: 
  - identifying (composite aggregation)
 
  - non-identifying (association)
 
 
The Relationships section of this guideline provides 
  examples of how these relationships are used. Information on how these types 
  of relationships can be mapped to Design Model elements appears in Guidelines: 
  Reverse-Engineering Relational Databases. 
A trigger is a procedural function designed to run as a result of some action 
  on the table in which the trigger resides. A trigger is defined to execute when 
  a row in the table is inserted, updated, or deleted. Additionally, a trigger 
  is designated to execute either before or after the table command executes. 
  Triggers are defined as operations in a table. The operations are stereotyped 
  <<Trigger>>.   
  
Trigger Example 
Indexes are used as mechanisms for enabling faster access of information when 
  specific columns are used to search the table.  An index is modeled as 
  an operation in the table with a stereotype of <<index>>.  Indexes might 
  be designated as unique and might be designated as clustered or unclustered. 
  Clustered indexes are used to force the order of the data rows in the table 
  to be aligned with the order of the index values. An example of an index operation 
  (IX_auctioncategory) is shown in the figure below.  
  
  
Index Example 
A view is a virtual table with no independent persistent storage. A view has 
  the characteristics and behaviors of a table and accesses the data in the columns 
  from the table(s) with which the view has defined relationships. Views 
  are used for providing more efficient access to information in one or more tables 
  and also can be used to enforce business rules for restricting access to data 
  in the tables. In the example below, an AuctionView has been defined as a "view" 
  of information in the Auction table shown in the physical data modeling section 
  of this guideline. 
Views are modeled as classes with the stereotype of <<view>>. The attributes 
  of the view class are the columns from the tables referenced by the view. The 
  datatypes of the columns in the view are inherited from the tables with a defined 
  dependency with the view. 
   
  
View Example 
A domain is a mechanism used to create user-defined datatypes that can be applied 
  to columns across multiple tables. A domain is modeled as a class with 
  the stereotype <<Domain>>.  In the example below, a domain has been defined 
  for a "zip + 4" zipcode.   
  
Domain Example 
A stored procedure container is a grouping of stored procedures within the 
  Data Model. A stored procedure container is created as a UML class that is stereotyped 
  <<SP Container>>. Multiple stored procedure containers can be created in a database 
  design. Each stored procedure container must have at least one stored procedure. 
A stored procedure is an independent procedure that typically resides on the 
  database server. Stored procedures are documented as operations that are grouped 
  into classes stereotyped as <<SP Container>>. The operations are stereotyped <<SP>>.  
  The example below shows a single stored procedure operation (SP_Auction) in 
  a container class named AuctionManagement.  When designing stored procedures, 
  the database designer must be cognizant of any naming conventions used by the 
  specific RDBMS.   
  
Stored Procedure Container and Stored Procedure Example 
A tablespace represents the amount of storage space to be allocated to such 
  items as tables, stored procedures and indexes. Tablespaces are linked to a 
  specific database through a dependency relationship. The number of tablespaces 
  and how the individual tables will be mapped to them depends on the complexity 
  of the Data Model. Tables that will be accessed frequently might need to 
  be partitioned into multiple tablespaces. Tables that do not contain large amounts 
  of frequently accessed data might be grouped into a single tablespace. 
A tablespace container is defined for each tablespace. The tablespace container 
  is the physical storage device for the tablespace. Although multiple tablespace 
  containers can exist for a single tablespace, it is recommended that a tablespace 
  container be assigned to only a single tablespace. Tablespace containers are 
  defined as attributes to the tablespace; they are not explicitly modeled. 
  
Tablespace Example 
A schema documents the organization or structure of the database. A schema is 
represented as a package that is stereotyped <<Schema>>. When a schema is defined 
as a package, the tables that make up that package should be contained within 
the schema. A dependency between the database and the schema is created to 
document the relationship between the database and the schema.   
  
Schema  Example 
A database is a collection of data that is organized such that the information 
  in it can be accessed and managed.  The management and access of information 
  in the database is performed through the use of a commercial database management 
  system (DBMS).  A database is represented in the Data Model as a component 
  that is stereotyped <<Database>>. 
  
Database  Example 
The UML profile for database modeling defines the valid relationships between 
  the major elements of the Data Model. The following sections provide examples 
  of the different relationship types.   
Non-Identifying
A non-identifying relationship is a relationship between two tables that 
independently exist within the database. A non-identifying relationship is 
documented by using an association between the tables. The association is 
stereotyped <<Non-Identifying>>.  The example below depicts a non-identifying 
relationship between the Item table and the AuctionCategory table.  
  
Non-Identifying Relationship Example 
Identifying
An identifying relationship is a relationship between two tables in which the 
  child table must coexist with the parent table. An identifying relationship 
  is documented by using a composite aggregation between two tables. The composite 
  aggregation is stereotyped as <<Identifying>>. The figure below is an example 
  of an identifying relationship. This example shows that instances of the child 
  table (CreditCard) must have an associated entry in the parent table (UserAccount). 
 
  
Identifying Relationship Example 
  
For both the association and composite aggregation, 
multiplicity should be defined to document the number of rows in the 
relationship. In the example above, for each row in the UserAccount table, there 
can be 0 or more CreditCard rows in the CreditCard table. For each row in the 
CreditCard table, there is exactly one row in the UserAccount table. 
Multiplicity is also known as cardinality. 
Database Views
When defining a database view's relationship with a table, a dependency relationship 
  is used, drawn from the view to the table. The stereotype of the dependency 
  is <<Derive>>. Typically, the view dependency is named, and the name of the dependency 
  is the same as the name of the table that is defined in the dependency relationship 
  with the database view. 
  
  
View and Table Dependency Relationship Example 
Tablespace
A dependency relationship is used to link a tablespace to a specific database. 
  As shown in the figure below, the relationship is drawn to show that the database 
  has the dependency on the tablespace. Multiple tablespaces can be related 
  to a single database in the model.  
  
Tablespace and Database Dependency Relationship Example 
A dependency relationship is used to document the relationships between tablespaces 
  and the tables within a tablespace.  One or many tables can be related 
  to a single tablespace, and a single table can be related to multiple tablespaces. 
  The example below shows that the table Auction is assigned to a single tablespace 
  named PRIMARY. 
  
Table and Tablespace Dependency Relationship Example 
Realizations
Realizations are used to establish the relationship between a database and 
  the tables that exist within it. A table can be realized by multiple databases 
  in the Data Model.  
  
Table and Database Realization Relationship Example 
Stored Procedures
A dependency relationship is used to document the relationship between the 
  stored procedure container and the tables that the stored procedures within 
  the stored procedure containers act upon. The example below depicts this type 
  of relationship by showing that the stored procedure SP_Auction will be used 
  to access information in the Auction table.  
  
Stored Procedure Container and Table Dependency Relationship 
Example 
In the inception phase, initial data 
  modeling activities might be performed in conjunction with the development of 
  any proof-of-concept prototypes as part of the"Perform architectural synthesis 
  workflow detail" activities. On projects in which a database already 
  exists, the database designer might reverse-engineer the existing database to 
  develop an initial Physical Data Model based on the structure of the existing 
  database. See Guidelines: Reverse-Engineering Relational 
  Databases for more information. Elements of the Physical Data Model might 
  be transformed into Design Model elements as needed to support any proof-of-concept 
  prototyping activities. 
The goal of the elaboration phase is 
  to eliminate technical risk and to produce a stable (baselined) architecture 
  for the system. In large-scale systems, poor performance resulting from a badly 
  designed Data Model is a major architectural concern. As a result, both data 
  modeling and the development of an architectural prototype that allows the performance 
  of the database to be evaluated are essential to achieving a stable architecture. As 
  the architecturally significant use cases are detailed and analyzed in each 
  iteration, Data Model elements are defined based on the development of the persistent 
  class designs from the use cases. As the class designs stabilize, the database 
  designer might periodically transform the class designs into tables in the Data 
  Model and define the appropriate data storage model elements. 
By the end of the elaboration phase, the major database structures (tables, 
  indexes, and primary and foreign key columns) must be put in place to support 
  the execution of the defined architecturally significant scenarios for the application. 
  In addition, representative data volumes must be loaded into the database to 
  support architectural performance testing. Based on the results of performance 
  testing, the Data Model might need to be adjusted with optimization techniques, 
  including but not limited to de-normalizing, optimizing physical storage attributes 
  or distribution, and indexing. 
Major restructuring of the Data Model must not occur during the construction 
  phase. Additional tables and data storage elements might be defined during 
  the construction phase iterations based on the detailed design of the set of 
  use cases and approved change requests allocated to the iteration. A primary 
  focus of database design during the construction phase is to continually monitor 
  the performance of the database and optimize the database design as needed through 
  de-normalizing, defining indexes, creating database views, and other optimization 
  techniques.  
The Physical Data Model is the design artifact that the database designer maintains 
  during the construction phase. It can be maintained by either making direct 
  updates in the model or as a result of a tool reading updates that have been 
  made directly on the database. 
The Data Model, like the Design Model, is maintained during the transition 
  phase in response to approved change requests. The database designer must 
  keep the Data Model synchronized with the database as the application goes through 
  final acceptance test and is deployed into production.  
If a development team is using modern visual modeling tools that have the ability 
  to convert classes to tables (and vice versa) and/or has the ability to reverse 
  and forward engineer databases, then the team needs to establish guidelines 
  for managing the transformation and engineering processes. The guidelines 
  are primarily needed for large projects in which a team is working in parallel 
  on the database and application design. The development team must define 
  the points in the development of the application (build/release cycle) at which 
  it will be appropriate to perform the class-to-table transformations and to 
  forward-engineer the database. Once the initial database is created, the 
  development team must define guidelines for the team to manage the synchronization 
  of the Data Model and database as the design and code of the system evolve throughout 
  the project. 
  
 
 
 |