Q

Advantages and disadvantages of ndexed and hashed file access methods

Hi, I'd appreciate it if you gave an explanation to my questions below:

  • Individual records in a sequential file are often stored in an order defined by the contents of a key field (which may be alphabetical or numerical). What are the advantages and disadvantages of ordering a sequential file in this way?
  • How do indexed file and hashed file access methods locate a particular record in the file? What are the advantages of each of the methods (in which one method would be preferred to the other)?
  • What are the characteristics required of a a good hashing algorithm and a key field in a hashed file?


    The advantage of ordering records in a sequential file according to a key is that you can then search the file more quickly. If you know the key value that you want, you can use one of the relatively fast searches. The disadvantage is that when you insert, you need to rewrite at least everything after the insertion point, which makes inserts very expensive unless they are done at the end of the file.

    An indexed file approach keeps a (hopefully) small part of each row, and some kind of "pointer" to the row's location within the data file. This allows a search to use the index, which is ordered by the index and (again hopefully) much smaller and therefore much faster than scanning the entire data file for the indexed data.

    A hashing algorithm uses some of the data in the record to compute a "hash" value. This value is a unique or at least relatively unique value. The hash value determines where the record is stored in the file.

    Hashes are generally very fast. A simple algorithm will immediately determine the hash value for the record. This is both their strength, and their weakness. The strength comes from the speed and the fact that they don't need disk I/O to find anything. The weakness comes from the limitation of having only one hash value for a record. If you need to order the list by employee number, last name, and first name, you've got quite a problem. Note that nothing prevents you from both hashing and indexing, but that adds a lot of extra I/O to the process of modifying the file.

    A good hash algorithm needs to provide relatively unique hash values. If you get too many rows that produce the same hash value, the "collisions" will drastically slow down the performance of both inserts and searches. Another requirement is that while minimizing collisions, the algorithm needs to keep the hash range small, which means that the hash output values are relatively tightly clustered.

    For More Information


This was first published in April 2001

Dig deeper on Oracle database design and architecture

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

SearchDataManagement

SearchBusinessAnalytics

SearchSAP

SearchSQLServer

TheServerSide

SearchDataCenter

SearchContentManagement

SearchFinancialApplications

Close