Hashing functions explained

What is meant by "hashing function"? How is it used in hash data clusters and hash joins?

A hashing function is typically a mathematical function that takes a key value and computes a position in an array. For instance, suppose we have a table of employee names. To find out whether a particular employee is in the table, we would otherwise have to search the entire table. It would be far better to know where that employee's record should be. A hash function gives us exactly that: based on the employee's name, it computes a position in the array, or table. We then check that position to see whether the record is there.
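As a minimal sketch of that lookup, the Python below uses a toy hash (sum the character codes of the name, then wrap to the array size). The names `position_for`, `store`, and `contains` and the table size of 11 are assumptions made for illustration; real systems use far better hash functions, and this version deliberately ignores the case where two names land on the same position.

```python
def position_for(name: str, table_size: int) -> int:
    """Toy hash function: sum the character codes, then wrap to the array size."""
    return sum(ord(ch) for ch in name) % table_size


def store(table: list, name: str) -> None:
    """Place the name at its hashed position (overwrites on a clash; illustration only)."""
    table[position_for(name, len(table))] = name


def contains(table: list, name: str) -> bool:
    """Check only the one position the hash points to, not the whole table."""
    return table[position_for(name, len(table))] == name


employees = [None] * 11   # hypothetical table with 11 positions
store(employees, "Smith")

print(contains(employees, "Smith"))  # found by checking a single position
print(contains(employees, "Jones"))  # absent, again after a single check
```

The point of the sketch is that `contains` inspects one array position no matter how many employees are stored.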

There is a good deal of science behind hash functions because, as you may already be guessing, two different keys can "hash" to the same position. When this occurs, a collision has happened, and the table needs some strategy for handling it. Hashing can also waste a lot of space, because some positions may never have a key hash to them. So this can be quite a topic.
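One common way to handle collisions is separate chaining, where each position holds a small list of keys rather than a single key. The sketch below is only one possible strategy (open addressing is another), and the class name `ChainedTable`, the toy hash, and the table size are all assumptions for illustration.

```python
def position_for(key: str, size: int) -> int:
    """Toy hash: sum of character codes, wrapped to the table size."""
    return sum(ord(ch) for ch in key) % size


class ChainedTable:
    """Hash table where each position holds a list, so colliding keys can coexist."""

    def __init__(self, size: int = 11):
        self.slots = [[] for _ in range(size)]

    def add(self, key: str) -> None:
        chain = self.slots[position_for(key, len(self.slots))]
        if key not in chain:
            chain.append(key)

    def __contains__(self, key: str) -> bool:
        # Only the one chain the key hashes to is searched.
        return key in self.slots[position_for(key, len(self.slots))]

    def empty_slots(self) -> int:
        """Count positions that no stored key has hashed to (the wasted space)."""
        return sum(1 for chain in self.slots if not chain)


names = ChainedTable()
names.add("Amy")
names.add("May")  # same letters, so the toy hash sends it to the same position

print(position_for("Amy", 11) == position_for("May", 11))  # a collision
print("Amy" in names, "May" in names)                      # both still retrievable
print(names.empty_slots())                                 # positions left unused
```

With chaining, a collision costs a short list search instead of losing a record, while the empty-slot count shows the unused space mentioned above.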

How does hashing relate to hash joins? A hash join joins two tables. We read the first table and apply a hash function to the join column of each record; the result determines which of a number of buckets the record is stored in. We then read the records from the second table and apply the same hash function, which tells us which bucket to look in. If there is a row from the first table that joins to this row from the second table, it will be in that bucket. This is the basic concept behind hash joins.
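The two phases described above can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not how any particular database implements it: the function name `hash_join`, the bucket count of 8, and the sample department and employee rows are all hypothetical.

```python
def hash_join(first_rows, second_rows, first_key, second_key, num_buckets=8):
    """Join two lists of dict rows on equal key values using hash buckets."""
    # Phase 1: hash each record of the first table into a bucket.
    buckets = [[] for _ in range(num_buckets)]
    for row in first_rows:
        buckets[hash(row[first_key]) % num_buckets].append(row)

    # Phase 2: hash each record of the second table with the same function,
    # then look only in the bucket it points to.
    joined = []
    for row in second_rows:
        for candidate in buckets[hash(row[second_key]) % num_buckets]:
            # Different keys can share a bucket, so compare the actual values.
            if candidate[first_key] == row[second_key]:
                joined.append({**candidate, **row})
    return joined


departments = [
    {"dept_id": 10, "dept_name": "Sales"},
    {"dept_id": 20, "dept_name": "IT"},
]
employees = [
    {"emp_name": "Smith", "dept_id": 10},
    {"emp_name": "Jones", "dept_id": 20},
    {"emp_name": "Lee", "dept_id": 30},  # no matching department
]

for row in hash_join(departments, employees, "dept_id", "dept_id"):
    print(row)
```

Building the buckets from the smaller table is the usual choice, since that is the side that has to be held in memory while the other table is scanned.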

When I was in graduate school, I wrote my master's thesis on an extension to the hash join. You can view a copy of this thesis on my old school's Web site (http://www.cs.ndsu.nodak.edu/~peasland/paper.doc). If you are interested in learning about hash joins, read the first few chapters, which include nice diagrams of how hash joins work.

This was last published in April 2003
