This is Chapter 1 from the O'Reilly book "Oracle design" by Dave Ensor & Ian Stevenson. You can purchase the book here.
This book is about database design--in particular, the design of Oracle databases--and some elements of the code that accesses them. How much does design matter? A lot. Performance usually makes or breaks a computer system, and design is central to good performance. If your database isn't well-designed from the start, it will be almost impossible for your applications to run efficiently. Even the most powerful hardware and software, and the most highly tuned code, won't make up for a poor or incomplete design. Design will also improve the chances that the system meets the original requirements and will ensure that what you intend to build is achievable, given the constraints that will inevitably be imposed upon you. Finally, good design will make it much easier for you to maintain your application, to change the way things work, and to add new features.
What is design and when do we perform it? Design encompasses three major areas:
- The design of the specific database objects that will be implemented in a database. For Oracle, these include such objects as tables, views, indexes, and stored procedures, functions, and packages.
- The design of the specific screens, reports, and programs that will maintain the data in the database and allow inquiries against that data.
- Under certain circumstances, the design must also be concerned with the specific environment or technology--for example, the network topology, the hardware configuration, and the use of a client/server, parallel processing, or distributed database architecture.
In an ideal world, you will be able to construct a database design that is utterly clear and logical and that achieves the best possible performance for your system. But design, like every other part of life, is all about tradeoffs and informed decision-making. Here is a real-world definition of design:
Design is the business of finding a way to meet the functional requirements within the specified constraints using the available technology.
What are these constraints? Every project has some absolute requirements--usually, the maximum amount of elapsed time that can be consumed by the project, and the maximum amount of money that can be spent on it. A project may also have a myriad of other inconvenient requirements and limitations. As a designer, your job is to do the best job you possibly can within this structure--and to point out where tradeoffs can be made and where they will seriously jeopardize the success or quality of the project.
Although design is a vitally important task, it hasn't always received the attention it deserves. One problem is that the design of a database isn't as structured a task as the analysis of a project's requirements or the building of the application. You'll find many books on project analysis and development, but very few on design. (We're out to remedy that!)
There is one very common misconception about design--that it is a single and discrete phase of a project, falling neatly between the analysis and build phases. Very often, in books about project development, you'll see a diagram such as the one in Figure 1-1 of the so-called "waterfall method."
The reality is that a project simply can't be pieced together in such an idealistic way. For one thing, there is a wide overlap between the phases of a project. For another, design doesn't begin and end in such a clear way; it often continues through the test and implementation phases of the project.
In the next section, we'll present a case study of an actual (but much simplified) project. We'll look briefly at each phase of the project (strategy, analysis, design, build), and introduce some important terms and concepts along the way. We'll examine these in a lot more detail in later chapters.
In this example, let's look at a car rental company that wants to implement a new pricing strategy using discounts and loyalty bonuses. The mainframe system is already bursting at the seams and can't be further enhanced to provide the new functions. A new project to replace the aging technology with an Oracle database and a graphical user interface (GUI) front-end seems like a good idea! The company appoints a project manager, and we're on our way.
The first thing the new project manager does is to commission a strategy study. The main purposes of this study are to determine the true scope and goals of the project, and to produce high-level entity and function definitions. To be successful, strategy studies require highly experienced business analysts with frequent access to senior management. It is senior management who will have to approve the funds for the later stages of the project, and it is vital that their expectations of the results are carefully recorded.
The strategy phase involves heavy interaction with key users and business experts, extracting information from them and presenting it back to them in a formalized way to ensure a thorough and unambiguous understanding of the requirements. This may be done in a number of different ways--for example, as a series of one-to-one interviews and/or as general workshops. In our example, the analysts speak to the CEO, the Marketing VP, the Customer Services Manager, right down to the clerk on the hire desk. They learn about the business need to introduce special discounts for frequent customers and the introduction of a fourth category of rental car (minibus). They also hear about flexible payment schemes and special offers. In short, they learn what the business is all about and where it should be going if its computer systems would only let it.
Once the main part of the strategy study is complete, a technical architect (or a team, though it usually only takes one person) can set out the likely technical approach and the approximate hardware, software, and development costs. With this additional information, the project manager and his team can go back to the project sponsors with the deliverable from the strategy phase.
This is a clear statement of what the sponsors will get if they agree to proceed (scope), when they will get it (timescale), and how much it will cost them. The presentation should quantify the benefits as well as the costs ("this project will pay for itself in 18 months"), and if this is not done, or worse is not possible, then it may well be that there are no tangible benefits and the project should not proceed. However, such projects are often approved.
In our car rental scenario, the senior management endorse the findings and says that they are willing to wait one year and spend $1.2 million to achieve the requirement. The report covers (among other items):
- Constraints, risks, and critical success factors. Service targets such as response times and system availability, once validated as being genuine requirements and not simply nice to have, are constraints. In the case of the rental company, the timescale to introducing new pricing plans is a critical success factor as any significant slippage in schedule may mean a loss of market share.
- The required functions, and possible future requirements, such as allowing rental cars to be booked via the World Wide Web.
- The entities required to support those functions.
This is also the time and place to state clearly what elements are not going to be included in the scope of the project. For example, in our car rental example, we state that management reports against our data are not in the scope of the project and will have to be dealt with as a separate project.
What is the outcome of this sample strategy phase? The conclusion is that we are going to go ahead with the project. The cost of not going ahead with the project is that the organization will not survive since all the competitors are introducing more flexible pricing schemes and thus are becoming more competitive.
Now that we have a clearly defined scope and a budget to play with, we move into the analysis phase.
The analysis phase is a detailed investigation of the required business processes (functions) and the information required to execute those functions (entities with their attributes and relationships). We'll give a quick overview of this phase in this section, and will provide more detail in Chapter 3, Data Modeling.
Even during the analysis phase it is important to articulate the requirements in general terms. By "general" we don't mean vague. What we mean is that no assumptions should be made about the computer system. This isn't easy to do. If (like us) you are a designer or programmer at heart, then you will find it extremely hard not to think in terms of "this screen" or "that table." Many programmers find it deeply mysterious that the most successful analysts are incapable of writing code (even though they usually claim that they could write some if they had to). The reason is simple: their inability to cut code means they don't waste time and effort in analysis trying to work out how to design the system.
Analysts capture and document two distinct but related forms of information:
- Functions--information about events and processes that occur in the business
- Entities--information about "things of significance to the organization about which something is known"
Two of the classical outputs from analysis are:
- The function hierarchy, which breaks down the processing into functions, or things that are done
- The entity relationship model, which captures all the entities, their attributes and the relationships between them
Figure 1-2 shows part of the functional decomposition and illustrates how we break down the billing process into a series of smaller steps until they cannot be sensibly split any further (at this point they are considered atomic). Each of the headings shown in this figure would normally be accompanied by a short description of the function. The atomic functions will ordinarily have a more detailed description of their purpose.
Figure 1-3 shows a simple entity model for our car rental system, in the form of an entity relationship diagram (ERD). Pictures such as this one help enormously with understanding and documenting the system. Modeling techniques and the conventions behind this diagram are the main subject of Chapter 3. In a complex system, the ERD of the entire model will almost certainly be about as easy to read as a single-page map showing every star visible from the Northern Hemisphere! We recommend that you break it down into a number of small diagrams, each one depicting a relatively self-contained subject area of the model.
WARNING: We keep seeing serious omissions in design and code which result from designers and programmers using diagrams that do not tell the whole story. So if you are going to use incomplete or partial models, please make sure that you show the off-page connectors and that you provide some idea of where they go.
Each entity should have a description of what it is, a set of attributes with properties and descriptions, and a unique identifier that tells us how to distinguish one occurrence from the others.
WARNING: Time, however, for another warning. It is dangerous to show entity relationships without naming them since readers then have to assume the nature of the relationship and can get it wrong. For simplicity, we have only named two of the relationships in Figure 1-3, those which relate rental car to site. In a different rental company it might also be necessary to show other relationships between these two entities, such as originally delivered to and nominally based at.
During analysis we also need to capture any business rules. These aren't functions as such; rather, they are immutable facts about the operation that must be obeyed at all times by the system. In our example, some sample business rules might be:
- The special discount rates may not be applied on vehicles less than one year old.
- The total discount (loyalty + volume) may not exceed 40% of the net invoice amount.
Other outputs in frequent use include entity life histories (ELHs) and data flow diagrams (DFDs), both of which introduce the concept of sequence, which is impossible to impose on either a function hierarchy or an entity relationship model. Indeed, trying to show life history on an entity model is a common cause of error. Where ELHs are not being used, business rules may be used to supply the missing information--for example:
- Only a vehicle that has been checked and certified by a service attendant since last being driven on the public highway can be allocated to a rental contract. (This definition carefully includes vehicles that have been used or transferred between locations by company employees or contractors.)
As a final step in the analysis phase, the team draws up a plan for system testing. Figure 1-4 shows an entity life history diagram for a rental car.
The designer (or design team) takes the output of the analysis and produces:
- A database definition (or schema)--based on the entity model developed during the analysis phase
- A set of module specifications--built from the function models
This section gives a quick overview of the design phase. The later sections called "Planning the Design Phase" and "Design Tasks" provide more detail about what goes on during this phase.
NOTE: On the subject of the handover of deliverables note that in a very small project, it is possible that the analysts, designers, and builders are the same people (or person). If so, then you should aim to be disciplined about the transition and hand over the deliverable to yourself (wearing a different hat). Treat the process as if you were handing over the deliverables to someone else who has little knowledge of the system.
The database schema will contain all the database objects: tables, views, columns, indexes, clusters, constraints, triggers, etc. The design will not only produce the definition of the objects, but also the physical objects themselves that the developers will build against.
The specifications must be detailed enough that a developer can take them and turn them into a working program that meets the requirements. Much as we may all hate to think this far in advance, we must also turn our thoughts to testing and implementation. This means that we must deliver plans that ensure a thorough test of the application and a smooth transition into the production environment.
Many projects package most of their design deliverables into a single definitive document, called something akin to a technical specification. This document also outlines the approach taken to any complex technical issues.
The designers need to ensure that the database schema and the modules are mutually consistent in addition to being consistent with the analysis. For instance, the RENTAL_CARS table that is derived from the RENTAL CAR entity must have a module somewhere that allows the insertion of new cars. One of the key elements of design is to ensure that the system will perform well when built, given whatever constraints are imposed by the target system architecture. For this reason, the designers need to scrutinize the architecture itself. For example, the required throughput on the database server may lead us to consider using some form of parallel processing. Or we may look at alternative approaches to dividing the processing between the client and the server. We look at these architectures in detail in Chapter 14, Designing for Parallel Processing, and Chapter 11, Designing for Client/Server.
As we've said, performance is critical to any system. To get the best possible performance out of this system, we may need to make some compromises and tradeoffs. To our Car Rental Corporation, the most critical area is the ability to find the lowest price for a contract quickly and accurately. We may have several thousand possible combinations of price plan and discount plan for a single hire; the SQL to find the best combination is starting to look both tricky and potentially expensive in terms of CPU time. So we decide that, although we are going to hold all the price and discount plans in the database for maintenance and reporting purposes, for pricing we will load them into memory once at start of day (into an application-managed cache). We also decide that the pricing algorithm will be completely written in C against the cache (we won't use SQL).
During the design process, we seek out common or similar functionality in the functional definitions and design it as a single module. We decide if it makes sense to implement common modules and modules which enforce our business rules as database-resident code (triggers and stored functions, procedures, and packages). Where the processing is highly database-intensive, this makes good sense. This part of the code is best produced and tested during design and put in place before the rest of the modules are built. In our case, we create a trigger on the RENTAL_CARS table to prevent the creation of an agreement with a special billing rate on a car that was registered less than a year ago. This implements one of the business rules we mentioned earlier.
In our case study, we are replacing a legacy system and we therefore have to deal with the issue of data migration. Much as we would like to start with a blank database, this is not an option; the existing system has lots of useful data, both current and historic, that we need to retain. We decide that the most sensible option is for the old system to produce a set of flat data files that are brought across to the new system and loaded in. We draw up specifications of these file formats in agreement with the system programmers on the mainframe who will have to produce them. Chapter 8, Loading and Unloading Data, has much more to say about data migration.
There are many deliverables from design; these will vary from project to project. However, every Oracle project will have to deliver a database schema (or schemas if the data is to be distributed) and almost all will deliver some formal module specifications. Samples are illustrated in Table 1-1 and Figure 1-5. The deliverables to and from design are summarized in Figure 1-6.
Volume Discount Calculation (B23478)
Command Line : B23478 [Invoice Number]
Compute the total gross invoice amount for the customer over the last six months. If it is a parent company, then include all subsidiaries.
Build and Beyond
After the initial design is complete, we begin the build phase. During this phase, the program modules specified during design are coded and tested. The major deliverables from the build phase are the source and object modules and the unit tests. After the build phase, the project moves into system test and finally into implementation when the new system is commissioned. We won't discuss these phases in any detail here because they are beyond the scope of our emphasis on design. It is, however, important to realize that we should expect issues to be referred back to design (and to analysis) right up to production. If the testing is anything other than perfect (and it rarely is anything like perfect), then we must also expect design-related issues to crop up frequently during early production running.
Successful Project Completion?
It is a sad fact of life that not every project gets to the implementation stage. One especially common cause of failure, simple failure to provide what was expected, is discussed in the next section. Figures vary depending on who you ask, but some believe that as many as 40% of projects are either abandoned completely or are severely reduced in scope along the way. An essential facet of design that we have mentioned only briefly above is risk identification and management. Every project has risks; as designers, we must focus in on technical risks and confront them as early as possible. Once we have identified the risks, we need to articulate them, along with a plan to eradicate or minimize them.
Keeping a Design Log
It is important that you record all of the issues that crop up during your design of a system. You need to record (somewhere...it doesn't matter where) all options that were discussed (even casually), options considered, and final decisions made. You may hold this information in a CASE repository, in a series of well-indexed word processing documents or text files, or even on paper. The important thing is to make absolutely certain that you keep it somewhere.
Why are we so emphatic? Because both of us have been burnt in the past when our project teams changed design decisions--in part, because nobody on the project could justify the current state and because all of the arguments were long forgotten. In more than one case, the reason for the original design decision became all too clear a few months later when a serious problem was found with our new improved approach.
The design log also makes excellent background reading for new project members when they join the team; whether they are designers, developers, testers or management, the log will give them an insight into the technical background of the project.
Here is a small extract from the design log for the car rental system:Design LogSection: Data DistributionInitial Thoughts (02/07/95):There is a general feeling that the corporation should move away from the heavy reliance on a central mainframe and go distributed. The main reasons are performance and resilience. These could be addressed by upgrading the mainframe hardware and providing a hot standby machine. The introduction of distributed architecture will not completely remove the reliance on the central server, and the design team remains unconvinced by the theory that in the event of failure of the central host, one of the other servers could temporarily assume its role. We are also concerned that the regional offices do not have the necessary skilled staff to manage a server so all maintenance and administration will have to be done remotely.
Meeting with Operations (09/07/95):The operations staff are happy that remote maintenance will not be a problem. However, they are concerned about the effect of a distributed database on the lines between offices; they may have to provide more bandwidth. A benchmark is to be set up ASAP.
Meeting with Oracle (09/27/95):We discussed all the replication options that Oracle can supply and determined that we will predominantly be using read-only snapshot. Oracle has arranged a visit to a reference site so we can see it in operation.
Returning to our case study, we note that the accounts system with which we have to integrate is an aging system written by a third party who has unfortunately gone out of business. There is no documentation for the interface and all we have to go on is what we can observe from the system we are replacing. We document this risk and consider certain alternative strategies:
- Buy a new accounts package. Unfortunately other in-house systems use it as well, so this would be a major project in itself.
- Take the accounts interface modules directly from the old system and use them as much as possible.
- Develop a layer of software ("middleware") that lies between the application and the accounts package. The application software make simple and well-defined calls to this generic software layer, such as "PostDebit <account> <amount>." The middleware turns this into a call to the current accounting system. The middleware can be developed in isolation of the main application and, when a new accounts package is acquired, we only have to rewrite the middle layer; the application programs are nicely isolated from the internals of the accounting software.
We decide that option 3 is the best alternative and start designing the middleware approach.
What we've presented in the preceding sections is a rather traditional approach to database design. That is, even though we've previously said that design usually isn't done all at once, in fact we have shown it happening that way in the car rental example. But when does analysis actually end and design actually begin ? Likewise, when does design end and build begin? In most projects, these transitions are not "all or nothing" issues. As parts of the analysis become stable, the designers are already at work familiarizing themselves with the analysis and starting the early design tasks. The periods of overlap are often quite significant. Similarly, as parts of the design stabilize, the programmers can start building modules; this makes particular sense when there are common modules that must be widely called within the application.
During design, there is often an end user and management perception that nothing is being achieved because there are no finished applications components for them to run. Although education may help, this reaction may have serious implications for an overall project. Let's take a brief look at a few alternative methods that may help to allay this perception.
What About RAD/JAD?
One reason that so many projects fail is that they simply don't deliver what the users want or what the business requires. Often what the users want is not exactly what the business requires, and the project guarantees failure by delivering neither. Somewhere along the path through the analysis, design, and build phases, communication has broken down and the requirements have been lost or misinterpreted.
There may be pressure from project sponsors or budget holders to build early to get a subset of the project built and able to be demonstrated as being "on the shelf" as soon as possible. Often when analysis and design are skimped it is mistakenly done so in the name of rapid application development (RAD) or joint application development (JAD) methods. We are skeptical about these methods. Would you start building a bridge or manufacturing a car before you had done a full design? Probably not. So why are such methods deemed acceptable in an information technology (IT) development? The answer is that they should not be.
With RAD and JAD methods, the designer/builders of the system sit down with users to develop the system in an iterative process. The designer develops a working prototype and demonstrates it to the users in a workshop session. The users provide input about what they like and what they don't like. The designer takes the input under consideration; he or she goes away and enhances the prototype for a second workshop. The process may go through several iterations until finally the users like what they see and the prototype "becomes" the live application. Usually, there is a time limit or a restricted number of iterations built into the process; otherwise, the users could go on enhancing it forever!
In theory, the RAD/JAD approach should deliver to users exactly the system they want to see. However, there are some extreme dangers in this approach:
- Everyone becomes very focused on the content of screens because they are so visible. Unfortunately, the processing behind the screens and the database or schema design are often neglected.
- There is often a misconception that once the final prototype is agreed upon, the module is complete. The truth is often quite different: the module has not necessarily had to interact with other modules or even the database; it may be nothing more than a pretty screen with a few dummy routines behind it.
- Designing modules in complete isolation of each other leads to inconsistencies and conflicts when they eventually have to be put together for a system test.
- As functionality is evolving on several different fronts simultaneously, someone needs to keep a firm hand on the database structure that is supposed to support the functionality. RAD can become a free for all where tables are "knocked up" and columns are "slapped on" with no regard to the overall effect. The consequence will often be a database that holds redundant or inconsistent data and that performs very badly.
- It is easy to forget to produce documentation when developing in RAD.
RAD or JAD can work, but in our opinion should only be used in cases where all of the following criteria apply:
- The scope of the project and the requirements in business terms are very well defined.
- The project is relatively small and self-contained. By this, we mean that it does not have a lot of external interfaces to deal with.
- The system is very screen-oriented and the usability of the screens is among the top five critical success factors of the project.
- The users are already computer literate, have an understanding of IT development, and are very positive about the idea of the new system.
Where these conditions do not all apply, it may still be possible to use a RAD approach, but to restrict it to a subset of the application.
Whatever methodology you employ on a project, it is always good to have some kind of contingency to fall back on when you are up against the wire. As part of the analysis and design phases, you should always grade the intended functions of the system by importance. One format we like to use is MoSCoW, a term borrowed from Clegg and Barker. This acronym is derived as follows:
The final category is perhaps the most important: It is essential that we be honest and up front about what we are not going to deliver.
The items in must have are the critical deliverables and success factors of the system (must process rentals, must respond to price inquiry in under 5 seconds in 95% of cases, must run 20 hours a day, and so on).
The middle categories (should have and could have) fit into a timeboxed or cashboxed structure--in other words, the life of the project is restricted by timescales or budget. According to this scheme, we will develop the must have items and as many of the should have and could have items as we can, in order of priority, until we run out of either money (cashboxed) or time (timeboxed).
Let's apply Moscow to our hypothetical car rental example. The sample list is deliberately controversial.
- Must have:
Price inquiries take less than 5 seconds in 95% of cases
Seamless integration with our third-party accounting software
Support for a fourth category of rental car
Support for loyalty and volume discounting
Screens to support all data entry
Take-on of all reference and transactional data from legacy system
Complete audit trail of all transactions
- Should have:
Support for varying levels of user and security access
Support for dial-in home-based users
Ability to introduce new discounting schemes in the future
Support for ad hoc invoice runs for a single customer
Invoices customized to customer requirements
Support for archive of old data
Ad hoc reporting capabilities
Earlier in this chapter we provided a quick summary of what goes on during the strategy, analysis, design, and build phases. In this section we provide additional detail about the design phase.
Careful planning is essential to any project or project stage. The planning of the design phase is usually the joint responsibility of the project management and the chief or lead designer (if you have one). The benefits of planning should be obvious to everyone, but we'll examine them anyway. Planning:
- Breaks down a momentous task into small, self-contained, manageable and (above all) achievable steps.
- Provides short- and medium-term goals and milestones, which can act as a yardstick for measuring "actuals" against the plan. This provides early warning of any likely slippage in the plan.
- Identifies the dependencies between tasks (i.e., which tasks must be complete before other tasks can start).
- Determines the critical path, those resources (usually an individual person or a team) on whom the plan depends most heavily.
- Allows management to forecast staffing requirements over the life of the project.
- Gives project management a good indication of when the build stage can start.
The last two points are inextricably linked. If you add staff during the design stage, you may possibly be able to move the build date forward. In practice, you can rarely juggle these two parameters since at least one is a "given." For instance, the budget may determine the staffing or, alternatively, the delivery date for the project might dictate the latest date that the build stage must commence. If budget/staffing and delivery are both constrained, then the planning at this stage is essential to ensure that the project can be achieved within the given constraints.
Although project sponsors always want plans to be set in stone, experience tells us that such plans simply will not work. As design progresses, ideas and techniques mature and we learn more about the system. Periodic review and re-planning sessions are essential to any planning process. Working to an out-of-date and now unachievable plan is much worse than working to no plan at all. At least when there is no plan, everybody knows the position!
In the hypothetical car rental system, the project manager has an outline plan for a year's project. The design phase is scheduled to take three months using three full-time designers. One designer will concentrate primarily on the database; the second will assume responsibility for screens, reports, and processes; the third will tackle interfaces. The plan currently shows that the person responsible for code (screens, reports, and processes) is on the critical path and may share the work on processes with the interface designer because there is a strong synergy between the two.
Two Phases of Design
In the more traditional models of project development, design is sometimes broken down into two phases high-level design and low-level or detailed design. This is more relevant for a large project, where an intermediate stage breaks up what would otherwise be an onerous task. High-level design is geared toward solving generic issues and providing a framework and may be replaced by the technical architecture component of a strategy study. Detailed design is where the first physical database cuts are made and individual module specifications are written. In this book, we generally treat design as a single phase for the purpose of semantics. The techniques outlined in this book will apply equally whatever the underlying methodology.
Splitting the work in this way is one sensible staffing option. Another is to section it up by (business) functional areas. Both approaches have their pros and cons. However the work is divided, it is essential that strong communication be maintained between the designers. To this end, our project manager arranges short weekly design reviews where any issues can be brainstormed and resolved.
Let's now draw up a draft list of design tasks that will collectively produce the deliverables that we have identified for that mythical animal, the "typical Oracle-based project." In this section, we present them in roughly chronological order, although it is impossible to specify a strict order since not all tasks assume that others are complete. Some of the material in the section is expanded later in the book, but we feel it is useful to present it here to provide the "big picture" of design that can be used as a checklist.
Obviously, the time spent on each task will vary from project to project, and some tasks may not be relevant at all on very small-scale projects. Most of these tasks will appear as separate items in the project plan and will warrant at least a small section in the technical specification (which should set out the decisions that you have made). Again, we'll refer to our sample car rental application for meaningful examples.
Design: The Early Days
There are certain decisions and tasks that we must confront at the very start of the design phase. Doing so is very important because deferring them could impact some of the design that has already been completed. These tasks are largely about agreeing upon a strategy and an approach to design, as well as making sure that the necessary infrastructure is in place.
Review and accept the analysis deliverables
This is a hand-over process. Here, we make sure that the analysis is complete and in a fit state from which to design. In practice, this is often an iterative process with the designers questioning the analysis as they begin to understand the requirements. The key check to make here is that the analysis output tells the design team what the application is supposed to achieve and not how to achieve it (otherwise they are stepping on our toes as designers!).
As designers, it is impossible for us to verify that the analysts have covered the entire business areas for the system. We just have to assume that this is so. What we can check are some common-sense concepts and axioms. Here are some examples:
- Does every entity have a unique identifier and at least one non-key attribute?
- Does every entity have at least one function that creates occurrences of it, and at least one function that references it?
- Does every attribute have both a creator and a reader?
If the analysis is held in a CASE repository, there are usually automated checks of this nature that we can run.
In our sample application, we have captured the analysis in Designer/2000. We run some of the completion reports and find that there is no function to create new instances of the RENTAL CAR entity. We notify the analyst, and he creates a new function for us. We also note that some of the functions come with screen designs already sketched out. We do not reject these, since there might well be some useful information captured on them, but we make it clear that we can't guarantee that the screens we'll ultimately design will bear much resemblance to the originals.
Walk through and test designers' understanding of analysis
To put it very bluntly--if you jump in and start designing before you fully understand the requirements, then you will fail, and you will fail badly.
Reading through reams of paperwork or paging through screens of definitions in a CASE product can be a boring and unproductive way to understand the requirements of the system. There is a limit to the amount of information that can be absorbed in this way. Walkthroughs conducted by the analysts are a much better way of getting the designers up to speed with the requirements. It is much more interesting to hear someone present an overview of some area of the business and describe orally how the analysis represents the requirement. These walkthroughs should describe one area of the system, should not be more than an hour's duration, and should be spread out with at least a day between each one. Walkthroughs are a very effective means of gaining a rapid understanding of what the system is (or will be) all about, and they often provoke very informative discussions.
The early days of design are invariably a hard slog. Reading documentation, even broken up with some walkthroughs, will still leave us unsure about many aspects of the required functionality. As designers, we will tend to prematurely start to form pictures of how the system will all be put together. This can lead to misconceptions that are hard to put right.
We need to stop, take a step back, and test our understanding. One good way of doing this is to take a small number of scenarios that we believe are common to the business and to run them through our current understanding of the system. This can take the form of a simple data flow diagram (DFD), flow chart, or a basic writeup of the steps. We should indicate manual procedures and working practices as well as system functions. Take at least one common scenario from conception to completion.
Let's look again at the car rental scenario. We hold a series of mini-workshops with our analysts on the complex discounting process that is a part of the system. We are led through several examples of varying complexity. We test these scenarios against our entity model and function definitions.
Here is a simple scenario that we use to test our understanding:
- Customer comes to the desk and asks if we have any stick shift cars for rent.
- The clerk uses CRTS0310--Query cars in stock, using a simple query by example to specify "stick shift." The query is against the RENTAL CAR entity, restricted to our SITE.
- The search reveals that there are none currently at this site. Customer indicates a willingness to wait up to 30 minutes if we can get a stick-shift car delivered.
- The search is widened by selecting a nearby site in CRTS0310 and rerunning the query. This has to be run for a single site in turn; there is no "proximity" searching.
- The search identifies a stick-shift car at a nearby site and the customer decides to take it.
- A car may not be reserved unless it is either at the site of origin of the reservation or in transit to that site. The clerk presses the "Request Transit" button on the screen. This removes the "currently at" link between the RENTAL CAR and the old SITE and replaces it with an "in transit" link to our site. An alert (in the form of an urgent email) is sent to the dispatch office at the other site so they know to send the car.
- Having reserved the car and requested the transit, the clerk can now process the rental and payment details.
- This is an existing customer so the clerk checks the driver's license details against those held in the system, using CTRS0430; this also verifies that this customer is entitled to a loyalty discount.
Identify critical areas
What are the critical success factors (CSFs) for the project. For example, is good performance essential? (And if so, is it defined in functional terms such as the length of time that it takes a user to create an order, rather than as the response time at the workstation?)
Let's look particularly at what we mean by "critical." It is quite common for the critical areas to be covered in the initial strategy study, but the emphasis tends to be lost in the detail of the analysis. The term "critical" can mean either vital to the acceptance and success of the project or critical in business or functional terms. Areas of the system that have been identified as critical are good candidates for prototyping. Prototypes can help the designer gain a better understanding; they may also help prove that the demands of the critical area can be met. It is often helpful to prototype more than one approach to the problem, and compare the outcomes.
It is fairly common for new areas to emerge as critical during the design phase. Some functions or areas of the model that are easy to specify in logical terms during analysis can become frighteningly difficult to support when they're viewed from a more physical perspective during the design.
There is a strong affinity between a project's CSFs and its risks. Obviously, if we fail to meet a CSF, then we run the risk that the project will be a failure. We also need to identify other risks which aren't necessarily directly connected to CSFs. Risks can be anything from pushing the leading edge of technology (developing on beta software, for example) to staffing and human risks (such as the risk of not being able to recruit a sufficient number of people with the required skills).
One of the CSFs of the car rental system is that our users embrace the GUI technology that will be new to many of them. We plan to minimize the risk of acceptance by holding a series of workshops at which we demonstrate screens with a series of GUI controls (drop down list boxes, radio buttons, tabbed dialogue boxes, etc.) and a series of color schemes and navigational methods (icons, function keys, menus, etc.) We will ask users for their opinions and preferences and will aim to get the users to sign off on a document of common look- and-feel standards that we'll use throughout development.
Another area of risk is the short(ish) time window for the billing/invoice run. We tackle this by developing the run early in the project to allow plenty of time for testing, optimization, and tuning. We know that online application response time is crucial--usually a customer is waiting, and quite often there is a queue. We agree with the users that a subsecond response is required on 80% of all screen transactions. Throughput is also an issue since the invoice production run has to complete in a 6-hour window, and it may be processing up to 500,000 billing items per run. We decide that using Oracle's parallel option in conjunction with a symmetric multiprocessing system is appropriate. We will also evaluate the feasibility of a distributed database covering our remote locations.
Evaluate system constraints
All systems being developed are constrained in some way--there is no such thing as a bottomless pit! We have already mentioned that the budget and/or implementation date may be fixed. There may be further business constraints, such as the need to check security clearances on staff before they are allowed to start work on a sensitive military project. Other decisions that may have been made (and are irreversible) include hardware platform, software choice, and compatibility with legacy systems, to name but a few. If the project is not completely new, but is an add-on to an existing system, then the number of inherited constraints is potentially quite large. You must check the feasibility of analysis requirements in the light of all of these constraints to see if you are about to attempt Mission Impossible!
In the car rental system, we are constrained by tight timescales for the project. We also have constraints on the response times of the screens and on the throughput that must be achieved by the invoice and billing run. We consider that all of these are achievable and will address them individually during the design.
Determine the target architecture
What do we mean by architecture? Of course, we have to decide on the hardware platform or platforms. But there is something even more fundamental: note that we said "platforms." More than one computer may be involved; in fact, the chances are good this will be the case. We have some additional questions to ask:
- Is the system to be client/server?
- Is the database to be distributed or replicated and, if so, is every participating database Oracle or is it a heterogeneous network?
- Might we need the Oracle Parallel Server to handle the required load or throughput?
We often find that these decisions have been made prior to the design stage. If so, during design it pays to review the basis on which they were made. Quite often there are implications that have not been anticipated. For instance, suppose that you are brought in to design a user-friendly GUI client/server system. Then you learn that access is required to the system by another group of users who have only VT100 character-mode terminals. Clearly, you have to consider your options. You may need to come up with a compromise between, on the one hand, having common applications and ignoring some of the GUI and client/server features and, on the other hand, developing two sets of application software.
If a network is involved in the target environment, you'll have to incorporate some additional tasks into the plan. You will need to define the required service levels of the network and to review or design the network topology. This is the time to formulate contingency procedures and make plans in anticipation of network down time and reduced service. If you plan to utilize an existing network, then you should monitor it to ensure that it has sufficient spare bandwidth to support the new application.
The car rental system will be employing client/server technology. The front-end will run on Microsoft Windows 95 and the back-end will be large UNIX servers. The servers will be symmetric multiprocessors so that throughput of large batch jobs can be optimized. We will be using a distributed database, with each rental site having a local database.
Identify potential bottlenecks in the proposed system
If you are ever told that performance doesn't matter in a computer system, take it with a pinch of salt, or temper the remark to mean that response time is not a critical success factor of the system. Better yet, ask for an explanation as to exactly what the remark means and why response time is not important. Quantify it if possible: Is a response time to an inquiry of an hour acceptable?
As we all know, the bottom line is that performance is always an important factor in any system. Every system will have a bottleneck somewhere. It is a fact of life that once you remove one bottleneck, you will just hit the next one. If you keep removing them, you will--with luck--eventually hit one that is acceptable.
Bottlenecks come in many forms. They can range from unacceptable load on a server due to too many online users running complex inquiries to a limit on a number of batch threads restricting the nightly throughput of jobs. The process of identifying potential bottlenecks can be made easier by drawing a graph of loads during a given time period--a day, a week, or a month.
In the car rental system, early indications are that transactions which need to be distributed across databases will be a major source of bottlenecks for us. We aim to partition and fragment our data across the network in such a way as to keep the number of distributed transactions to a minimum.
Review third-party products
Sometimes, developers and designers jump into the design and build stages of a system with both feet because they are so keen to get on and deliver something. We don't always pay enough attention to code that is already out there. There is an increasingly large number of third-party products on the market that run on an Oracle database. These range from entire suites of applications such as Oracle Financials and SAP, to small utilities such as DBA tools, batch schedulers, security add-ons, etc.
Always take the time to objectively evaluate off-the-shelf products against each other and against a home-grown solution. Appendix A, Off-the-Shelf Packages, examines in detail the types of challenges and problems that you may encounter with a purchased solution. If timescales and budget are tight, then a purchased solution may alleviate some of our worries. (Of course, it may also introduce new ones.)
In our car rental system, we decide to evaluate batch scheduling packages which provide more flexibility than the standard UNIX utilities cron and at. We look specifically at products which record the jobs and runs in Oracle tables. These will allow us to write our own reports against them and programmatically submit requests by simply inserting into a table.
Agree on design and build standards
You will need a set of standards covering such things as naming conventions for database objects, design documentation standards, mechanisms for raising change requests, and so on. You will also need standards that govern how you set out your SQL and PL/SQL and a specific standard for each tool or product you are using for development.
Before you take the time to invent your own standards, find out if your organization has its own set of standards and ask what standards previous projects have used. Many wheels are being reinvented in the world of standards. If you do adopt a set of preexisting standards, be sure to review them and adapt them to the specific needs of your project. You are allowed to reinvent (or at least to refine) a small part of the wheel!
In our scenario, Oracle is new to the organization so we have to develop standards from scratch. However, we find some standards on the Internet that are shareware, and using these as a starting point eliminates a lot of the hard work.
Consider use of CASE tools
You may consider the installation of a CASE tool at this stage. However, if the analysis of the system is not already in a CASE tool, then you will need to weigh the advantages of using CASE against the odious task of retrofitting the current analysis into the CASE product. Some CASE products will allow you to jump straight into design and "reverse engineer" back into analysis.
Despite the popularity of CASE, both of us have severe practical and theoretical reservations about the benefits of reverse engineering a physical database definition. It creates the unfortunate illusion that analysis has been recorded when in reality it almost certainly has not since the information contained within a design is different from that contained within an analysis. It is a little bit like claiming to be able to describe someone's personality by inspecting their wardrobe; you'll get some useful pointers but are unlikely to uncover the whole story.
In our example system, the analysis was captured in Oracle Designer/2000, and we will continue to use this product for the design.
Provide an infrastructure for design and a complete build environment
The design and build processes need hardware and software resources. They also need a mechanism to control all the documentation and code that is produced. We recommend that you deal with these issues in an early stage of the project so the facilities will be available as soon as they are required. Source code control is examined in Chapter 15, Introduction to Code Design.
Perform Database Design
The tasks described in this section focus on the actual database design. This is where we design the physical table and column definitions and indexes or clusters to support them, the constraints and triggers that enforce certain rules and integrity controls, and some of the database-resident code.
Ensure a consistent, implementable, and normalized information model
As database designers, we rely heavily on a good information model. The model produced in analysis should not contain any abstruse constructs that cannot be implemented in an Oracle7 database (more on this in Chapter 3). The model should be fully normalized to third normal form (3NF) or to Boyce-Codd normal form (BCNF). You should not need fourth or fifth normal forms (also covered in Chapter 3).
Produce a logical and physical data model
This is the major part of the database side of design. The conceptual model from analysis is translated into a logical design, then into a physical one. Next, a first-cut database is created for development purposes. The logical model will evolve throughout the entire duration of the design as decisions are reached and changes are made. As design progresses into build, the data model should become more stable as physical aspects such as file placement, indexes, and sizing are resolved.
Remember that database design cannot take place in isolation of application or module design. This is particularly true of Oracle7 where business rules may be created as constraints or other database-resident objects such as stored procedures.
The following list identifies the main tasks that need to be accomplished to deliver the working database; for each item in the list, we indicate the chapter in this book where the subject is discussed in detail:
- Identify unimplementable and unusual data constructs in the entity relationship model (ERD) or the entity definitions. (Chapter 3).
- Resolve all arcs and supertypes/subtypes (Chapter 3).
- Review all primary and foreign keys (Chapters 3 and 6).
- Design and implement database denormalizations to speed up query processing of any time-critical applications (Chapter 4).
- Determine which application processes are to be implemented within the database schema as stored procedures (Chapter 16).
- Identify and implement constraints to enforce referential integrity and selected data rules (Chapters 15 and 16).
- Design and build triggers to implement any centrally defined data rules or referential integrity rules that could not be specified as constraints (Chapter 15).
- Develop an indexing and clustering strategy (Chapter 6).
- Perform sizing estimates on all tables, clusters, and indexes (Chapter 9).
- Determine levels of users, and compose and implement a policy for security and audit. Create roles and synonyms to facilitate multiuser access with agreed-upon levels of access privilege (Chapter 10).
- Design the network topology of the database and a mechanism for allowing seamless access to remotely stored data (replicated or distributed database) (Chapter 12).
Create a development database
This is often the job of the development DBA, if you have one. If not, it usually falls into the designer's lap. The developers must have a physical database to develop against; designers need one to test out their ideas and theories. There will normally be several iterations of this database throughout the project and it needs to be strictly controlled. Chapter 15 looks at the issues surrounding source and version control.
For the car rental example, we create instances called DEV1, DEV2, and DEV3. These will be used on a circular basis for development database releases. This round-robin effect ensures that the developers who are in the middle of developing a module are not unduly disrupted by a new database release. We create a further instance called SYS_TEST.
Perform Process/Code Design
In parallel with the data model design, you will need to perform the process design (and ultimately develop your module specifications.) The two are heavily intertwined. The dividing line is especially thin with Oracle7, since this version of Oracle allows processing logic to be contained in the database.
The primary goal of the process design is to take the function definitions from the analysis and map them into modules that will be implemented as a single unit of source code (possibly including shared files such as C header files). These module definitions are then expanded into program specifications. This section describes the main tasks that we have to perform to achieve this goal.
Before you look at the list of tasks, recognize that some of the atomic functions from analysis will not map to a module at all. These will simply be translated into manual procedures or working practices, in other words, something outside of the computer system. Such procedures will have to be documented in user guides and it is important that you don't lose or overlook them during design. In most cases, these procedures are tasks that couldn't be implemented by a computer system. In some cases, however, we will decide during design to utilize a working practice to implement something that could be achieved on the system. This may be for convenience or simply to save development time.
Determine which build tools to use
You may come into design with no idea of what build tools are to be used. If so, draw up a short list of possible products, and put each one through evaluation. Delaying this decision can have a serious impact on module design and even data model design. Chapter 19, Selecting the Toolset, examines some of the currently popular tools for GUIs, screens, reporting, and ad hoc query.
Map functions to modules
We have to develop a list of functions that we will implement from the modules. This is by no means a one-to-one correspondence since the functions are organized by business functionality and we reorganize for ease of development. This means merging some functions that have common or similar functionality and in some cases splitting up a function which is overly complex or turns out to have areas in common with another function when we start designing it. Chapter 15 examines the mapping process in some detail.
In our ubiquitous car rental system, we have a function for working out an ad hoc bill and invoice for a single customer and a function for monthly billing. There is a lot of commonality between these tasks, so we will create a module named "Calculate bill for Customer" that can be called directly from a screen or from the monthly invoicing run. It is passed a customer ID and a pair of dates, and it passes back the amount that the customer is liable for over the given period.
We also have several interface functions that take on data from external systems by reading flat files. Each function has a requirement to open, manage, and read data files and to log errors. We decide to write generic modules to handle these common parts.
Determine program navigation
As part of module design, we map out menus, together with any other "hot key" mechanisms for navigating directly from one application to another. There are two basic types of direct navigation: with context and without context. Navigation with context means that the target screen is automatically populated with data related to that on the source screen. Navigating without context means that the target screen may not contain any data or may contain unrelated data.
In our car rental system, we devise a flexible navigation system mechanism. For instance, we allow the user to search for a customer record in one window and an available rental car in another; both can be copied into a "context area." With one approach, when the user brings up the "Rent Car" screen, the car and customer details are automatically populated from the context area. Alternatively, the user can go directly to the "Rent Car" screen with no context and perform searches to pull the details in.
Produce module definitions or specification
This is the major part of the functional side of design. During this step, you:
- Loosely translate the functional definitions of the analysis into implementable modules.
- Write a specification which articulates the functionality in physical terms.
- Identify the toolset for coding the applications, and map tools to individual functions.
For each module or code unit, you must develop a specification from which it can be built. These vary considerably in their detail and content from project to project, and even within a project. A module specification for a database-oriented program should, at the very least, list the tables that are accessed, indicating the type(s) of access (insert, delete, query, update, reference). Chapter 15 discusses what should and should not be in a module specification.
For the car rental example, we elect to use a combination of Pro*C, PL/SQL, Oracle Forms, and Oracle Reports as the comprehensive set of tools. We map the business functions into modules. For instance, "Check for cars requiring service" becomes the module "CRB0110: Produce a report of cars requiring service," which is classified as a simple Oracle Report. Because this module is so simple, it has a minimal specification and we estimate that it can be designed, built, and unit tested in four days.
Develop metrics for design and build
Metrics are simply templates for estimating how long a module of code will take to produce. They are covered in detail in Chapter 17, Metrics, Prototypes, and Specifications.
Examine the use of generators
Code generators can save us time and enforce a level of consistency in the code modules. If you have a large number of relatively straightforward modules to develop, then a code generator could significantly reduce your timescales. This somewhat controversial subject is covered in Chapter 15.
Develop template code
Template code is another technique for cutting out some of the tedious donkey work from module development. We use a prebuilt module as a starting basis for all new code that comes with some of the common functionality already built in. This technique is also described in Chapter 15.
Later Design Tasks
In this section we identify some of the tasks that can often be legitimately deferred until the later stages of the design phase of the project. In particular, we turn our thought to the planning of system test, taking on of external data, and some of the housekeeping tasks that we may need to specify.
Design for testing
Formulate an overall test strategy that lays out what types of tests you will run and when you will run them. Typically a unit test will be run when a single module is completed, related modules will be subjected to link tests, and the entire suite of applications will undergo a rigorous system test followed by a user acceptance test. However, there are many variations on this theme.
In short, it is the designer's responsibility to ensure that the acceptance test criteria set during analysis are met and that the system undergoes sufficient testing so that it is robust and suitable for production environment. Chapter 15 expands on testing methods.
In our sample car rental system, we plan our system testing around some real-life scenarios. These range from the very simple (customer rents a car) to the very complex (customer with loyalty discount wants to rent two compact cars, but we have only intermediate and sports cars in stock at this site).
We are developing our screens in Oracle Forms and we develop a skeleton unit test plan with a list of standard check points that ensure consistency. For instance:
- Go to the last field in each block and press the next-field key. The cursor should navigate to the first field of the next block or (if this is the last block) to the first field of the first block.
- Press the help key (F1) from any enterable field on the screen to ensure that the appropriate context-sensitive help is invoked.
Examine external and legacy systems and data feeds
How many new Oracle databases start off with a completely clean sheet of data, and begin with data entry on day one of their implementation? The answer is almost certainly very close to zero. Most organizations have existing data that is vital to their business and that they want the new application either to reference in place or to take on. Data take-on may be a one-off process, that is, one in which the data is taken from a legacy system that is being decommissioned and loaded into the new one. Alternatively, it may be an ongoing feed with periodic data transfer taking place. Data feeds are one of the major components of a data warehouse application. Chapter 8, Loading and Unloading Data, looks carefully at this entire subject.
The car rental system is replacing a legacy mainframe system and has a large once-off take-on of data. This is to be achieved with a series of flat file data loads. There is also an ongoing requirement to interface with the accounts system. The accounts interface will be achieved via a set of interface tables which are loaded from and unloaded to external files for transfer between the system. This interface needs to be as automated as possible.
Specify security, access, and housekeeping requirements
Every system has its own special requirements for security, audit, backup and recovery, event logging, archival, and so on. These requirements are very often not captured during analysis since they are often considered to be "physical" attributes of a system. During design, you must clarify these requirements and identify the mechanisms needed to implement them. Requirements of these kinds (those that are general across the entire system) are ideal candidates for implementation as objects within the Oracle7 database (triggers, stored procedures, etc.).
As part of the definition of security for the system, you must be sure to identify classes of end users at this time.
These topics are examined in detail in Chapter 10, Safeguarding Your Data.
The car rental system has a requirement that every financial transaction is logged. We will achieve this by recording details into an audit table using triggers on the tables which hold financial data. Some transactions, such as the issuing of refunds, will have to be approved by a suitably authorized employee before they can be released to the accounting system. We will implement a policy that requires users to change their passwords every four weeks. This will be made part of the login process.
Backup represents a problem for us since we have servers at many of our sites, but no operational staff to operate them. We decide to invest in some backup software that will automate the process so the only human intervention required will be the daily replacing of the tape.
Write functional specification and technical specification
You will package together many of the deliverables from design into a document called a functional specification. This document essentially gives the designers the interpretation of the requirements and the proposal of what the system will do. The main purpose of the document is to obtain a user sign-off on the design. For this reason, don't include in this document volumes of technical detail that the user cannot digest or even understand. Rather, make sure that this spec documents all the major design decisions and provides an overview of the database and modules.
Whereas the functional specification is for user consumption, the technical specification of the system provides an equivalent document for developers. This document is also a bundle of design deliverables, but it contains more detail. When CASE technology is employed, the technical specification may consist of a series of reports from the repository.
Ensure completeness of design
Before we begin the build operation, we need to make sure that the design is complete and of the quality required by the project. There are various checks that should be run at this point. A matrix of table usage against module is a useful starting point.
Into build and beyond
As soon as the build starts, the designers can tidy their desks, log off the system, and head off into the sunset confident in the knowledge of a job well done. No, unfortunately, they cannot. The role of the designer usually extends into build and often beyond. What do the designers do during these later stages?
Earlier in this book we pointed out that design is about making the best use of the technology. For this reason, designers need hands-on experience with current hardware and software. Most good designers like to keep their hands in as developers, and they will often form a part of the build team. This may frequently be in a more senior role, such as a team leader.
The following are some of the important ongoing input that the designers can have to a project during the build phase.
- When a module is handed over to a developer for coding, it is useful for the designer who wrote the specification to give a "paper" walk-through of the module and demonstrate any relevant prototypes.
- After the code is written, the designer can participate in code reviews, acting as the standards police and the design decision enforcement agency!
- Most projects are split into phases to make them more manageable, so large projects don't run for years without delivering anything. As development of one phase begins, there are likely to be further phases of analysis coming along and the cycle begins again. Now there are new constraints to ensure integration with the current phase and you will need to provide additional planning for migration from this phase to the next....Oh well, one day you might be out of there!
For More Information
- Feedback: E-mail the editor with your thoughts about this tip.
- More tips: Hundreds of free Oracle tips and scripts.
- Tip contest: Have an Oracle tip to offer your fellow DBAs and developers? The best tips submitted will receive a cool prize -- submit your tip today!
- Ask the Experts: Our SQL, database design, Oracle, SQL Server, DB2, metadata, and data warehousing gurus are waiting to answer your toughest questions.
- Forums: Ask your technical Oracle questions--or help out your peers by answering them--in our active forums.
- Best Web Links: Oracle tips, tutorials, and scripts from around the Web.