Most middle-tier applications today rely on database management systems for data storage. Databases provide a rich set of functionality, such as transaction management, security management and data warehousing features, to ensure data persistence, integrity and usefulness. Java 2 Platform, Enterprise Edition (J2EE) applications in particular have an abstraction layer, namely the entity Enterprise JavaBeans (EJB) layer, that specifically addresses the mapping of application objects to database tables.
What is a data source? In the generic sense, a data source is simply a place where information is being stored. For example, a database is a data source. Other systems can be data sources, too, such as ERP applications or legacy systems like CICS. This paper primarily focuses on the data source concept in the database world. However, for the other types of systems mentioned above, the concepts are similar.
In the J2EE world, data sources are actual Java objects that represent the physical data storage systems, such as databases. From these objects a J2EE application can retrieve JDBC connections to the databases being represented. This layer of abstraction provides benefits such as connection pooling, configuration flexibility and application portability.
This paper defines in detail what data sources are within the scope of J2EE, discusses the benefits of using data sources, describes how to set up and use a data source, explains connection retrieval behavior from data sources in the context of transactions, and illustrates the different ways security principals can be configured for the data sources.
Types and benefits
In the J2EE world, a data source is a resource manager connection factory for java.sql.Connection objects, which implement connections to a database management system. In other words, a data source abstractly represents a physical database, and one can obtain a connection object to the physical database from this data source object. A data source implements the javax.sql.DataSource interface and is part of the J2EE specification. All J2EE-compliant application servers must provide a compliant implementation of data sources.
Data sources allow application components to access physical connections to the underlying database management systems through logical mappings. These mappings provide resource manager connection factory references, that is, logical references to connection factories for databases. These references are configured in XML files. They provide an abstraction layer between the application components and the databases these components connect to. Specific properties of these databases (physical location, port number, instance name, etc.) are configured in this abstraction layer, rather than being embedded in the application code.
This abstraction layer allows the application code to be independent of the setup of the databases. Without data sources, the JDBC driver class, JDBC URL, username and password for the database instance would need to be either hard coded in the application code or stored in proprietary initialization files or environment variables, making the application inflexible and non-standard. If the RDBMS is relocated to another machine or if the password changes, the application code may need to be changed and recompiled, which is very disruptive for the production environment. Data sources remove this dependency and provide a standard way of configuring and initializing JDBC connection parameters. One simply changes the data source entry in the XML configuration file. No recompilation of code is necessary in this case.
There are several kinds of data sources, each implemented differently, providing a distinctive set of benefits. The following sections describe in detail these different kinds of data sources, including basic, pooled, XA-enabled and EJB-enabled.
The basic data source is the simplest type of data source implementation. Besides providing the abstraction layer between application code and the physical database, it does not provide any added functionality. Connections obtained from this type of data source do not come from a connection pool; they are simply created on demand and destroyed once they are closed. Because it is so simple, this type of data source is well suited to development. In a production environment, basic data sources should be avoided.
The pooled data source implementation is connection-pooling aware. Database connections obtained from this type of data source are automatically pooled. They are retrieved from a pool and are returned to the pool when closed, rather than being created and destroyed on demand.
The pool size and behavior can be configured so that system resource usage is controlled by the setup of the pool rather than being dependent on the runtime behavior of the application itself. For example, Oracle9i Application Server allows the following attributes to be configured:
- Maximum pool size
- Minimum pool size
- Maximum number of attempts to get a JDBC connection before time out
- Maximum period of time of inactivity before the JDBC connection times out
- Maximum period of time to wait for a JDBC connection before time out
Connection pooling improves application performance drastically by reusing connections that have already been created. By doing so, the overhead of creating the connection and its associated session is amortized across multiple uses. This is a significant saving in time and resource usage, since it typically takes more than one second to create a new database connection.
This pooling mechanism enables the usage of system resources, such as memory, to be predetermined. Without connection pooling, system resource usage is completely dependent on how many users are accessing the application at runtime. JDBC connections are created and destroyed on demand, making the behavior of the application highly unpredictable. If too many users try to access the application at the same time, the result may be a server crash due to memory shortage. With connection pooling, a timeout is typically returned instead, informing the user to try again later.
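The pooling behavior described above can be illustrated with a minimal sketch. The class below is a toy, generic bounded pool written for this paper's discussion; the names BoundedPool, borrow and giveBack are hypothetical, and the sketch does not reflect how Oracle9i Application Server or any other container implements its pool. It only shows the two ideas from the text: a maximum pool size caps resource usage, and a caller that cannot be served within the wait period gets a timeout instead of exhausting the server.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Toy bounded pool (hypothetical); not the implementation of any application server. */
class BoundedPool<T> {
    private final BlockingQueue<T> idle;                  // resources waiting to be reused
    private final AtomicInteger created = new AtomicInteger();
    private final int maxSize;                            // compare: maximum pool size
    private final long waitTimeoutMillis;                 // compare: maximum time to wait
    private final Supplier<T> factory;                    // creates a resource when none is idle

    BoundedPool(int maxSize, long waitTimeoutMillis, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(maxSize);
        this.maxSize = maxSize;
        this.waitTimeoutMillis = waitTimeoutMillis;
        this.factory = factory;
    }

    /** Reuses an idle resource, creates one if under the cap, otherwise waits. */
    T borrow() {
        T resource = idle.poll();
        if (resource != null) {
            return resource;
        }
        // Reserve a creation slot atomically so the cap is never exceeded.
        while (true) {
            int n = created.get();
            if (n >= maxSize) {
                break;
            }
            if (created.compareAndSet(n, n + 1)) {
                try {
                    return factory.get();
                } catch (RuntimeException e) {
                    created.decrementAndGet();            // release the slot on failure
                    throw e;
                }
            }
        }
        try {
            resource = idle.poll(waitTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled resource", e);
        }
        if (resource == null) {
            throw new IllegalStateException("timed out waiting for a pooled resource");
        }
        return resource;
    }

    /** Returns a resource to the pool instead of destroying it. */
    void giveBack(T resource) {
        idle.offer(resource);
    }

    int createdCount() {
        return created.get();
    }
}
```

With a maximum size of 2, a third concurrent borrow waits and then fails with a timeout, while a resource that has been given back is handed out again rather than re-created.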
The XA-aware data source implementation allows connections to participate in distributed transactions. This gives the middle-tier application the ability to commit a single transaction across multiple databases. If an application needs to participate in a two-phase commit, for example, this type of data source must be used.
When using JDBC connections obtained from an XA-enabled data source, one should never call the commit and rollback methods on the connection objects directly, since the transaction manager should always be the coordinator of global transactions. These methods should be called on the transaction object instead.
Since the XAConnection interface is derived from the PooledConnection interface, XA-enabled connections are pooled by default. So connections obtained from XA-aware data sources have all the connection pooling benefits that were described in the section above.
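The interface relationship mentioned above can be checked directly against the javax.sql types that ship with the JDK. The following is a small sketch; the class name XaInterfaceCheck is hypothetical and exists only for illustration.

```java
import javax.sql.PooledConnection;
import javax.sql.XAConnection;

class XaInterfaceCheck {
    /** Returns true when every XAConnection is also a PooledConnection. */
    static boolean xaIsPooled() {
        // isAssignableFrom is true because XAConnection extends PooledConnection.
        return PooledConnection.class.isAssignableFrom(XAConnection.class);
    }
}
```

The check only confirms the type hierarchy; whether and how a given vendor's XA data source actually pools its connections is decided by that vendor's implementation.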
The EJB-aware data source implementation is typically both connection-pooling aware and XA aware. The EJB container within the application server uses this data source implementation for container-managed persistence (CMP) entity EJBs. For CMP entity EJBs, the application developer does not explicitly open and close JDBC connections or execute SQL statements within the application code. Instead, the object-relational mapping information is configured in deployment descriptor files, leaving the responsibility of creating and executing the appropriate SQL statements to the J2EE container. The J2EE container uses the EJB-aware data source type to obtain JDBC connections to execute these generated SQL statements.
JDBC connections obtained through the EJB-aware data source type are both pooled and XA enabled, so they have the complete set of benefits that data sources have to offer. In a deployment environment, this type of data source should generally be used.
Data sources are configured in application-server-specific configuration files, typically in XML format. The J2EE specification requires XML for application deployment descriptors, and application server vendors generally use XML for server configuration as well. For example, in the Oracle9i Application Server, data sources are declared and configured in the data-sources.xml file. In BEA WebLogic, they are configured in the config.xml file. Here is an example of a data source entry in the data-sources.xml file of Oracle9i Application Server:
<data-source
    class="com.evermind.sql.DriverManagerDataSource"
    name="OracleDS"
    location="jdbc/OracleCoreDS"
    pooled-location="jdbc/OraclePooledDS"
    xa-location="jdbc/xa/OracleXADS"
    ejb-location="jdbc/OracleDS"
    connection-driver="oracle.jdbc.driver.OracleDriver"
    username="scott"
    password="tiger"
    url="jdbc:oracle:thin:@hostname:1521:oracle"
    inactivity-timeout="30"
    wait-timeout="30"
    max-connect-attempts="5"
    max-connections="100"
    min-connections="30"
/>
Notice that each data source is declared within the "data-source" tag. The data-sources.xml file can have multiple data source entries. Each entry specifies the properties of the data source: the JNDI locations of the different data source implementations; the JDBC connection properties, such as the URL, username and password; and other attributes, such as the JDBC driver class and the data source class to be used. Many of these attributes, such as the driver class, the JDBC URL, the username and the password, would have been embedded in the application code if data sources were not used.
In the previous section, we discussed the different types of data source implementations that are typically provided by the application server vendor. These different implementations can typically be declared in a configuration file. In the above example, each implementation is given a designated JNDI entry name, so it can be retrieved from the JNDI tree.
Application server vendors usually provide a GUI-based tool to configure data sources and other configuration parameters, so the deployer does not need to modify the XML file directly.
The data sources declared in the configuration file are each assigned an entry in the application server's naming environment. This assignment typically happens at application server startup time. Once the data sources are enlisted in the naming environment, they can be looked up and retrieved through the Java Naming and Directory Interface (JNDI). For example, the EJB data source implementation type in the previous example can be looked up in the following manner:
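// Requires javax.naming.Context, javax.naming.InitialContext and javax.sql.DataSource.
Context ic = new InitialContext();
// lookup() returns Object, so the result must be cast to DataSource.
DataSource dataSource = (DataSource) ic.lookup("jdbc/OracleDS");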
To retrieve a JDBC connection from the dataSource object obtained above, simply do the following:
Connection connection = dataSource.getConnection();
When the connection is no longer needed, simply close it:
connection.close();
It is then returned to the connection pool to be used by another application component.
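The lookup, retrieval and close steps above can be combined into one helper, sketched below. The class DataSourceClient, the interface ConnectionWork and the method withConnection are hypothetical names introduced for illustration; only Context.lookup, DataSource.getConnection and Connection.close are standard API. The sketch uses try-with-resources, which assumes Java 7 or later; on older platforms a try/finally block that calls close() serves the same purpose.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.sql.DataSource;

class DataSourceClient {
    /** Work to perform with a borrowed connection (hypothetical helper type). */
    interface ConnectionWork<R> {
        R apply(Connection connection) throws SQLException;
    }

    /**
     * Looks up the data source under jndiName, runs the work with one connection,
     * and always closes the connection so that it can go back to the pool.
     */
    static <R> R withConnection(Context context, String jndiName, ConnectionWork<R> work)
            throws NamingException, SQLException {
        // lookup() returns Object, so the result is cast to DataSource.
        DataSource dataSource = (DataSource) context.lookup(jndiName);
        // try-with-resources calls connection.close() even if the work throws.
        try (Connection connection = dataSource.getConnection()) {
            return work.apply(connection);
        }
    }
}
```

Inside an application server, a call might look like DataSourceClient.withConnection(new InitialContext(), "jdbc/OracleDS", c -> c.getMetaData().getDatabaseProductName()), assuming the data source from the earlier configuration example is bound under that name.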
We have covered the basic concepts of data sources in the previous sections. So far, we have discussed what data sources are, which implementations are typically provided by application server vendors, the benefits of each, and how data sources are typically configured in a J2EE application server environment.
In the following sections, we are going to delve into some advanced topics, namely connection shareability and security principal management. These two topics are important concepts for developers to understand for J2EE application design.
The J2EE 1.3 specification mandates that, by default, JDBC connections are shareable across application components that use the same resource and are in the same transaction context. In other words, within one transaction context, if DataSource.getConnection() is called multiple times, the same connection instance is returned each time.
This default behavior of connection sharing allows J2EE application servers to optimize the use of connections and enable local transaction optimization. If the connections retrieved in the same local transaction are unique instances (in other words, when connection sharing is turned off), multiple connections to the same database are opened within the same local transaction. This consumes significantly more system resources than creating a single connection, which is what happens in connection sharing mode. The end result in both cases is the same: when the local transaction commits or rolls back, everything that was done within the transaction commits or rolls back accordingly, whether it was done through a single connection or multiple connections to the RDBMS. Thus, in most cases, it makes sense to use connection sharing.
Application developers need to be cognizant of the shared connection retrieval behavior when writing code. For example, it is generally good practice to call the commit and rollback methods on the transaction objects, rather than on the connection objects, so information is not accidentally saved, updated or deleted from the database.
This default behavior of connection sharing can be changed by setting the res-sharing-scope element of the resource reference in the deployment descriptor to Unshareable.
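The difference between shareable and unshareable retrieval can be illustrated with a toy model. The class TransactionScope below is hypothetical and follows this paper's description of the behavior; it is not how any particular container implements sharing, and a real container manages this internally on behalf of the application.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model of connection sharing within one transaction context (hypothetical). */
class TransactionScope<C> {
    private final Map<String, C> sharedByResource = new HashMap<>();
    private int opened = 0;

    /**
     * Hands out a connection for the named resource.
     * shareable == true  : one connection per resource for the whole scope.
     * shareable == false : a newly opened connection on every call.
     */
    C getConnection(String resourceName, boolean shareable, Supplier<C> opener) {
        if (!shareable) {
            opened++;
            return opener.get();
        }
        C existing = sharedByResource.get(resourceName);
        if (existing == null) {
            existing = opener.get();
            opened++;
            sharedByResource.put(resourceName, existing);
        }
        return existing;
    }

    /** Number of connections opened so far in this scope. */
    int openedCount() {
        return opened;
    }
}
```

In this model, two shareable requests for the same resource open one connection, while two unshareable requests open two, which is the extra resource cost the text describes.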
Security principal management
The sign-on information needed to access a resource manager, including the username and password, can be provided in two ways:
- The most commonly used method is in the data source configuration file (as shown in the previous example of a data source setup) since it provides greater flexibility for the deployment environment.
- It can also be provided in the application component code. In this case, if the password and/or username needs to be changed, application code will need to be recompiled.
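The two options above correspond to the two getConnection methods of the standard javax.sql.DataSource interface. The sketch below shows both; the class SignOn and the method open are hypothetical names used only for illustration.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

class SignOn {
    /**
     * Opens a connection using the sign-on information configured for the data
     * source when username is null, or using credentials supplied by the
     * application code otherwise.
     */
    static Connection open(DataSource dataSource, String username, String password)
            throws SQLException {
        if (username == null) {
            // Credentials come from the data source configuration.
            return dataSource.getConnection();
        }
        // Credentials are passed by the component; changing them means changing code.
        return dataSource.getConnection(username, password);
    }
}
```

The no-argument form keeps usernames and passwords out of the application code, which is why the configuration-based approach is the more flexible of the two.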
In this paper, we first covered the basic concepts of data sources, including the different types of J2EE data source implementations and their associated benefits, as well as how data sources are typically configured within a J2EE application server. Essentially, data sources provide an abstraction layer between the application components and the databases they connect to. This layer allows the physical attributes of the production environment to be separated from the application code that is being deployed, making the J2EE application more portable and configurable. This layer also provides added functionality such as connection pooling, distributed transaction support and connection sharing within a single transaction.
Using data sources to obtain JDBC connections to databases is a better practice than creating JDBC connections directly in application code, as it allows application components to leverage the functionality provided through the data source layer. After reading this paper, you should be able to list the different kinds of data sources that are typically provided by an application server vendor, the benefits of each, and how and when they should be used.
About the author: Brooke Zhou is a product manager for Oracle9i Application Server. Her responsibilities include sales force training, supporting strategic customers to ensure successful product deployment, and driving product direction to continually improve the quality of Oracle9i Application Server. Prior to joining Oracle, Brooke worked at Oven Digital as a technical lead for J2EE application development and at Hewlett Packard as a hardware design engineer. Brooke Zhou holds a B.S. in Electrical Engineering and Computer Science from MIT.