"With Oracle9i anything you can model in Java, you can model in Oracle. If just a few good programmers see the value of using this existing reliable infrastructure rather than recreating it, we can get on with the next paradigm shift, which is to model our business problems as objects instead of relationally mapped to Java objects. We're still stuck in the relational rut just as the hierarchical and network databasers of the past were stuck in a rut when relational technology first came out." --Quote of the week, Database Debunkings
"The only rules that should reside in a database are referential integrity (and sometimes [even] that isn't really necessary). It is also best to keep rules out of your data access code (hardcoding WHERE values). Business rules should be centralized in Java business objects for better manageability, scalability, etc. Don't let pushy DBAs tell you otherwise. Rules in a database slow down development as well as data access time." --Quote of the week, Database Debunkings
"Is there a standard Java implementation of the "relational model" (as described, e.g. by C. J. Date in INTRODUCTION TO DATABASE SYSTEMS), with classes representing the usual relational entities (tables, records, fields), and methods corresponding to the usual relational operations? We have data stored in flat files, and we would like to perform standard relational operations, such as joins, sorts, and generating views. I suppose that we could set up a database server (e.g. MySQL, Oracle), and use the Java database access API, but this goes against our goal to supply a simple stand-alone application. After all, our application does not need to access huge amounts of data, so accessing it via a traditional DB server is overkill anyway. Using flat files to store the data allows us to easily bundle the necessary data with the rest of the application. But still we would like to regularize the way the program manipulates the data internally, along the lines of the standard relational model." --alt.comp.lang.java.programmer
Among the regressive trends plaguing database practice, one gaining prominence is a return to the bad old days when data management was performed not by DBMSs and databases, but by application programs and files. A new generation of HTML/Java developers, who were not around to experience the plethora of problems caused by that approach, and who lack exposure to the concepts of database and data independence, are extending the object approach from programming to data management, for which it was not intended and is ill-suited.
Consider a representative article called (quite revealingly, as we shall shortly see) "How to Store Java Objects," by a team of academics from the University of Maribor in Slovenia. (Academia is no less likely than business to succumb to industry fads; see my editorials and Denormalization for Performance -- Et Tu Academia?)
"Java is becoming increasingly important in mission-critical system development. Advantages accruing from the development process compared to conventional programming languages are well known: Java is platform independent, allowing simple multi-platform system development with the opportunity to provide efficient and powerful data persistence."
"Persistence" is the revelation, to programmers without a grasp of data fundamentals, that it is a good idea for data to outlive the programs that use it. Wow!
"Implementing a robust persistent architecture is one of the most challenging tasks in designing a component-based application. Not only does persistence have a pervasive impact on the overall system structure, it also has a significant effect on development schedules and runtime performance of the resulting applications. Implementing persistence in object-oriented programs is especially challenging, because object-oriented programs typically consist of a large number of complex objects."
No kidding! You betcha it is mighty challenging! But not just because "programs consist of a large number of complex objects". Experience has taught that it is neither cost-effective nor reliable to implement the gamut of data management functions -- data definition, integrity and manipulation, concurrency control, physical storage and access, query optimization, security -- in each and every application program, let alone to modify every program whenever the data change. That is why we migrated to databases, whereby those functions are implemented centrally in a DBMS for all application programs to share, leaving only application functions -- communication with users and presentation of results -- in programs. This is exactly what data independence is all about, and object-orientation does nothing to obviate its necessity.
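To make the point concrete, here is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a DBMS. The reservation table, its columns, and the two "application programs" are hypothetical names invented for illustration; they do not come from the article under discussion. The integrity rule is declared once, in the database, and neither program carries any validation code of its own.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE reservation (
        room_no INTEGER NOT NULL,
        nights  INTEGER NOT NULL CHECK (nights > 0)  -- rule declared once, centrally
    )
    """
)

def booking_app(db, room_no, nights):
    """One application program; it contains no validation logic."""
    db.execute("INSERT INTO reservation VALUES (?, ?)", (room_no, nights))

def import_app(db, rows):
    """A second, independently written program sharing the same database."""
    db.executemany("INSERT INTO reservation VALUES (?, ?)", rows)

booking_app(conn, 101, 2)  # valid row, accepted

# Both programs attempt to store invalid data; the DBMS rejects both.
rejected = []
for app, args in ((booking_app, (102, 0)), (import_app, ([(103, -1)],))):
    try:
        app(conn, *args)
    except sqlite3.IntegrityError:
        rejected.append(app.__name__)

stored = conn.execute("SELECT room_no, nights FROM reservation").fetchall()
print(stored)    # only the valid reservation survives
print(rejected)  # names of the programs whose invalid inserts were refused
```

Were the rule to change, only the table definition would need to be altered, not every program that touches the data.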
"During development of the Hotel Information System, we were faced with the problem of choosing an appropriate database management system (DBMS). We had to evaluate various database management systems from the perspective of storing complex data. At the time, we had little reliable knowledge for choosing appropriate DBMS for storing objects. We had to devise an approach that would point out the most significant factors for choosing a suitable database."
The above-mentioned database functions are the raison d'être of a DBMS, so they should be at the core of product evaluations. But object, rather than data, thinking leads practitioners astray. The focus on "storage of complex data" betrays misconceptions common in the industry. Foremost is a confusion between logical and physical levels of representation (see Comments on an Interview with Jim Gray and below). As I explain in Chapter 1 of Practical issues in database management, when it comes to "complex" data, the first and foremost issue is not how it is stored -- a physical implementation aspect -- but rather DBMS support of user-defined data types of arbitrary complexity -- a logical model aspect -- specifically, the representations, integrity constraints and operators made available to users for such types.
Note that the terms 'complex data' and 'objects' are used interchangeably. Implicit in this is another misconception: that relational DBMSs (RDBMSs) cannot support such data, and that only object DBMSs (ODBMSs) can (see below).
"Performance evaluation is an important component in the process of selecting an appropriate database management system for a project. It is very difficult to compare different database systems because the optimization of the tasks depends on understanding the requirements of the application as well as the DBMS?s recommended sequence of proprietary API procedures. Factors that affect application performance range from the data model design and code construction, all the way to the frequency and the size of the commits. These aspects contribute to the complexity of the determining process for choosing an appropriate DBMS for a particular application.
Performance and API procedures are important aspects, but they are not substitutes for, and certainly should not take precedence over, fundamental database functions. Indeed, it's the almost complete lack of attention to those functions that makes it "very difficult to compare different database systems".
The notion that "data model design" is a performance factor is doubly fallacious. First, it confuses enterprise-specific models -- business and logical -- which are designed by users, with the data model -- e.g. relational model -- which is neither enterprise-specific nor designed by users. The latter is a formal translation mechanism to map business models to their logical representations in the database (see Something to Call One's Own and The Name Game). It is because object orientation does not provide a data model analogous to the relational model, that object proponents and programmers don't distinguish between it and the business and logical models designed by users (see below).
Second, since by definition neither business nor logical models are at the physical level, it is impossible for them to affect performance; the claim that they do is another example of the logical-physical confusion (see The Logical-Physical Confusion and the articles on normalization in this series).
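The distinction can be illustrated with a small sketch, again assuming Python's built-in sqlite3 module; the guest table, its rows, and the index name are hypothetical and chosen only for illustration. The same logical query is run before and after a purely physical change (the creation of an index). The result is the same in both cases; only the access path chosen by the DBMS differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical logical design: names are illustrative only.
conn.execute("CREATE TABLE guest (guest_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO guest (name, city) VALUES (?, ?)",
    [("Ana", "Maribor"), ("Bor", "Ljubljana"), ("Cene", "Maribor")],
)

QUERY = "SELECT name FROM guest WHERE city = 'Maribor' ORDER BY name"

def run(db):
    """Return the query result and the access plan SQLite reports for it."""
    rows = db.execute(QUERY).fetchall()
    # The last column of each EXPLAIN QUERY PLAN row is the textual detail.
    plan = [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + QUERY)]
    return rows, plan

rows_before, plan_before = run(conn)

# A physical-level change only: the table definition and the query are untouched.
conn.execute("CREATE INDEX idx_guest_city ON guest (city)")

rows_after, plan_after = run(conn)

print(rows_before == rows_after)  # the logical result is unaffected
print(plan_before)
print(plan_after)                 # the physical access path has changed
```

Whatever performance difference there is comes from the access path, which is determined at the physical level, not from the logical design of the table.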
[Continued in Part II]
About the author
Fabian Pascal has an international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for more than 15 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on database technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, IRS. He is founder and editor of Database Debunkings, a web site dedicated to dispelling prevailing fallacies and misconceptions in the database industry, where C.J. Date is a senior contributor. He has contributed extensively to most trade publications, including Database Programming and Design, DBMS, DataBased Advisor, Byte, Infoworld and Computerworld. His third book, "Practical issues in database management" (Addison Wesley, June 2000), serves as text for a seminar bearing the same name. He can be contacted at firstname.lastname@example.org.