IT departments are usually organized in silos that include database administrators, network engineers, systems administrators, storage administrators and others. Oracle Exadata, because of its integrated nature, has posed a challenge for those enterprise IT operations.
It didn't take long for customers to learn that Exadata required a unified team that was responsible for the complete database machine. As companies start using Hadoop for big data projects, they will likely run into the same quandary.
With Exadata, traditional IT operations silos either couldn't work together or were too inefficient, which is the opposite of the simplified operations that Oracle was promising. Instead, most customers adopting Exadata achieve simplified operations by creating -- or outsourcing to -- a team of Exadata administrators or database machine administrators (if I can borrow the term coined by Arup Nanda). Being a preintegrated, engineered system with built-in management and automation software, Exadata required core DBA skills along with some network, systems and storage management knowledge.
Hadoop for big data also challenges the siloed skills of enterprise IT organizations and requires even more specialized skills than Exadata, including deep knowledge of hardware, networking, storage and Linux, as well as core data management skills to build and operate the cluster. Furthermore, successful big data project implementations require the operations team to work very closely with development, data scientists, data wranglers and business domain experts.
The reason these organizations need one cohesive team is that big data computing requires all components of the system to operate at maximum efficiency and, of course, be effective. We are talking about thousands of nodes and tens of thousands of processors and spinning rust. The cost of just 10% inefficiency is significant -- even if you're only accounting for excessive power consumption.
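To put a rough number on the power argument, here is a back-of-the-envelope sketch. Every input is a placeholder assumption of mine, not a measurement: the node count, the per-node power draw, the electricity price and the helper name `wasted_power_cost` are all hypothetical, so substitute figures from your own cluster.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def wasted_power_cost(nodes, watts_per_node, price_per_kwh, inefficiency=0.10):
    """Estimate the yearly cost of electricity spent on wasted work.

    All arguments are assumptions supplied by the caller:
    nodes           -- number of servers in the cluster
    watts_per_node  -- average power draw of one server, in watts
    price_per_kwh   -- electricity price per kilowatt-hour
    inefficiency    -- fraction of the cluster's work that is wasted
    """
    # Power burned on wasted work, converted from watts to kilowatts.
    wasted_kw = nodes * watts_per_node * inefficiency / 1000
    # Energy over a year (kWh) multiplied by the price per kWh.
    return wasted_kw * HOURS_PER_YEAR * price_per_kwh


if __name__ == "__main__":
    # Placeholder inputs: 2,000 nodes at 400 W each, $0.10 per kWh.
    print(f"${wasted_power_cost(2000, 400, 0.10):,.0f} per year")
```

With those placeholder inputs the estimate comes to about $70,000 a year for electricity alone, before counting cooling, hardware depreciation or the delayed results themselves.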
Think of your Hadoop cluster as a data supercomputer. Components of Hadoop for big data must work in unison to be effective. A single inefficient component quickly becomes a bottleneck and drags down the capabilities of the whole system. Moreover, team responsibilities go beyond just the Hadoop stack and include a variety of management components and integration points with external systems.
Big data is still in its infancy. Today, and over the years to come, the pace of change in the core technologies and applications will remain high. Data warehousing projects have phases of business analysis, data modeling, ETL development, report development, data mart development and operational support. But a project using Hadoop for big data has all these phases running simultaneously and continuously with team members in all skill areas fully engaged and closely collaborating. To succeed in their big data projects, enterprise IT will need to form comprehensive teams able to tackle all aspects that big data encompasses.
About the author:
Alex Gorbachev is CTO of Pythian, a global data management consultancy specializing in planning, deploying and managing data infrastructures. Previously he developed and administered Oracle databases and applications for Lukoil, the Russian petrochemicals giant.
This was first published in April 2013