Hi, I'd appreciate it if you gave an explanation to my questions below:
- Individual records in a sequential file are often stored in an order defined by the contents of a key field (which may be alphabetical or numerical). What are the advantages and disadvantages of ordering a sequential file in this way?
- How do indexed file and hashed file access methods locate a particular record in the file? What are the advantages of each of the methods (in which one method would be preferred to the other)?
- What are the characteristics required of a a good hashing algorithm and a key field in a hashed file?
The advantage of ordering records in a sequential file according to a key is that you can then search the file more quickly. If you know the key value that you want, you can use one of the relatively fast searches. The disadvantage is that when you insert, you need to rewrite at least everything after the insertion point, which makes inserts very expensive unless they are done at the end of the file.
An indexed file approach keeps a (hopefully) small part of each row, and some kind of "pointer" to the row's location within the data file. This allows a search to use the index, which is ordered by the index and (again hopefully) much smaller and therefore much faster than scanning the entire data file for the indexed data.
A hashing algorithm uses some of the data in the record to compute a "hash" value. This value is a unique or at least relatively unique value. The hash value determines where the record is stored in the file.
Hashes are generally very fast. A simple algorithm will immediately determine the hash value for the record. This is both their strength, and their weakness. The strength comes from the speed and the fact that they don't need disk I/O to find anything. The weakness comes from the limitation of having only one hash value for a record. If you need to order the list by employee number, last name, and first name, you've got quite a problem. Note that nothing prevents you from both hashing and indexing, but that adds a lot of extra I/O to the process of modifying the file.
A good hash algorithm needs to provide relatively unique hash values. If you get too many rows that produce the same hash value, the "collisions" will drastically slow down the performance of both inserts and searches. Another requirement is that while minimizing collisions, the algorithm needs to keep the hash range small, which means that the hash output values are relatively tightly clustered.
For More Information
- What do you think about this answer? E-mail us at editor@searchDatabase.com with your feedback.
- The Best Database Design Web Links: tips, tutorials, scripts, and more.
- Have a Database Design tip to offer your fellow DBA's and developers? The best tips submitted will receive a cool prize--submit your tip today!
- Ask your technical Database Design questions--or help out your peers by answering them--in our live discussion forums.
- Ask the Experts yourself: Our Database Design guru is waiting to answer your toughest questions.
This was first published in April 2001