
Wide or deep?

I'm going to develop a database application with MS SQL Server 2000. The data comes from an Access 97 application and consists of a series of MDBs designed with a large number of fields and tables (each MDB has 250-300 fields divided into 50-60 tables). All of the data refers to, and is related to, a main table. It essentially contains several specifications of a main photographic object (author, date, photographer, copyright, format, analytic description and so on).

I tried to normalize the data, but in every solution the number of fields and tables remained unacceptable in terms of size and maintenance. So I changed perspective to a "vertical" solution with just four tables (necessary to maintain the many-to-many relations), with the fields transformed into records.

Is this solution the best for my problem? What are the pros and cons of such design?

You've run across one of the classic problems in database design. The question boils down to whether it is better to design the schema "wide" (many distinct columns) or "deep" (reusing generic columns as much as possible). The general consensus among database experts seems to have settled on the wide approach, as long as it can work within the limits of the database engine. The wide approach generates many more columns, some of which are rarely used, but it is almost always better documented and easier to understand.
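The contrast is easier to see side by side. Below is a minimal sketch of the two designs, using Python's built-in sqlite3 as a stand-in for SQL Server; all table and column names here are hypothetical examples, not taken from the questioner's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "Wide" design: one distinct, typed column per attribute.
conn.execute("""
    CREATE TABLE photo_wide (
        photo_id     INTEGER PRIMARY KEY,
        author       TEXT,
        photographer TEXT,
        copyright    TEXT,
        format       TEXT
    )
""")
conn.execute(
    "INSERT INTO photo_wide VALUES (1, 'Smith', 'Jones', '1999', '35mm')"
)

# "Deep"/"vertical" design: fields become rows in one generic table.
conn.execute("""
    CREATE TABLE photo_vertical (
        photo_id   INTEGER,
        attr_name  TEXT,
        attr_value TEXT
    )
""")
conn.executemany(
    "INSERT INTO photo_vertical VALUES (?, ?, ?)",
    [(1, 'author', 'Smith'), (1, 'photographer', 'Jones'),
     (1, 'copyright', '1999'), (1, 'format', '35mm')],
)

# The wide query documents itself; the vertical one needs a filter (or a
# join) per attribute, and nothing in the schema says which attr_names exist.
wide = conn.execute(
    "SELECT format FROM photo_wide WHERE photo_id = 1"
).fetchone()[0]
vertical = conn.execute(
    "SELECT attr_value FROM photo_vertical "
    "WHERE photo_id = 1 AND attr_name = 'format'"
).fetchone()[0]
print(wide, vertical)  # both return '35mm'
```

Both queries return the same value, but only the wide schema tells a newcomer (or a new application) what attributes exist and what type each one has.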

One of the most important things that a database engine brings to application design is the separation of the data and the rules that manage it (tables, columns, keys, constraints, etc.) from the applications that use the data. This means that new applications can be written to share and/or manipulate the data without having to study every application that already uses it. It separates the data integrity rules (at least for the most part) from the application by allowing those rules to be expressed as constraints, so that everybody plays by the same rules. This separation has done wonderful things for the applications and users that have made the most of it: the applications become simpler, and so do the lives of the users!
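As a hypothetical illustration of rules living in the schema rather than the application (again using sqlite3 as a stand-in, with made-up table names and an assumed valid-year range in the CHECK constraint), the engine itself rejects data that breaks the declared rules, no matter which application submits it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs FK checks enabled

# The integrity rules are declared once, in the schema, as constraints.
conn.execute("""
    CREATE TABLE photographer (
        photographer_id INTEGER PRIMARY KEY,
        name            TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE photo (
        photo_id        INTEGER PRIMARY KEY,
        photographer_id INTEGER NOT NULL REFERENCES photographer,
        year_taken      INTEGER CHECK (year_taken BETWEEN 1826 AND 2100)
    )
""")
conn.execute("INSERT INTO photographer VALUES (1, 'Jones')")
conn.execute("INSERT INTO photo VALUES (1, 1, 1999)")  # valid row: accepted

# Any application that tries to break the rules is stopped by the engine:
rejected = ""
try:
    conn.execute("INSERT INTO photo VALUES (2, 99, 1999)")  # no photographer 99
except sqlite3.IntegrityError as e:
    rejected = str(e)
print(rejected)
```

No application code had to check that photographer 99 exists; the constraint did, and every present and future application plays by that same rule.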

You described the number of columns and tables in your schema as unacceptable, but you never clearly spelled out why. Depending on the level of detail you are capturing, 50 to 60 tables could be fine. If each table contains information that describes only one class of things (cameras, photographers, sessions, etc.), and each row in each table describes exactly one thing: one camera, one photographer, one shooting session (which might involve several cameras and/or photographers!), then the schema is as simple as it can be. Even if the schema tracks 1,000 individual attributes (columns), that is OK.

Converting to the "vertical" solution that you described essentially goes back to the days when each application defined for itself how it would use its files. Without the application, the files alone were useless: too much information about the data was buried in the application, making the data stored in the files inaccessible. Because the application "knows" what the rows mean and in which order, it can manipulate them, but only that application can safely manipulate them!
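The hidden cost shows up in what the engine can no longer check for you. In this hypothetical sketch (sqlite3 again standing in, with invented names), a generic name/value table accepts a misspelled attribute name and a nonsense value without complaint, because only the application "knows" what is valid:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Generic attribute table: everything is text, no per-attribute rules.
conn.execute("""
    CREATE TABLE photo_attr (
        photo_id   INTEGER,
        attr_name  TEXT,
        attr_value TEXT
    )
""")

# A misspelled attribute name and a nonsense date are both accepted silently:
conn.executemany(
    "INSERT INTO photo_attr VALUES (?, ?, ?)",
    [(1, 'date_taken', '1999-06-01'),
     (2, 'date_takne', 'next Tuesday')],  # typo and bad value, no error
)
accepted = conn.execute("SELECT COUNT(*) FROM photo_attr").fetchone()[0]
print(accepted)  # 2 rows stored, including the bad one
```

In a wide schema, `date_takne` would fail immediately as an unknown column, and a typed `date_taken` column with a CHECK constraint would reject the bad value; here both errors wait silently for some application to trip over them.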

Most of the applications that I work with on a regular basis have over 100 tables and 1,200 columns. Several of the purchased applications carry that to an extreme and have over 1,200 tables and 18,500 columns! If you need 60 tables and 300 columns to describe your photography to the level of detail that you need, I certainly would not worry about it!
