Although Hadoop broke ground a decade ago, many organizations have done little more than think about a big data initiative, such as building a Hadoop enterprise data warehouse (EDW). If you’re one of those organizations, you’re not alone. A recent SD Times article reported that 40% of respondents say their Hadoop (big data) projects are still “under the desk.” With all the hype surrounding big data, you may feel pressure to jump in. Before you do, however, it’s important to consider some potential roadblocks and make sure you have a plan.
First, there is the understandable fear of change. You have hundreds of person-years and millions of dollars invested in relational databases, customized schemas, and data models for your domains. Your IT team has security protocols in place that must be translated to the new environment. And you’re worried that transferring data to an NFS (Network File System) or HDFS (Hadoop Distributed File System) store will compromise the integrity and security of your data.
Second, your organization may lack the personnel and skill sets. Any big data initiative can languish while you try to put together the capital to hire data scientists and cluster administrators.
Third, you, like many others, may wonder why you need a big data application at all (and may be afraid to admit that you just don’t know). You have the day-to-day running of your business to focus on, not the text mining, social media analysis, and web search that characterize many big data applications.
We’re here to offer assurances and help you plan.
Your fears about the cost and complexity of overhauling your data operations are valid, and the value of “going big” may not yet be apparent. The best strategy is to start small. Hadoop excels at storing and processing very large data sets, so the low-hanging fruit for virtually any organization is to use Hadoop for ETL (extract, transform, and load). If you can offload some of these tasks to a Hadoop EDW, you may start seeing returns earlier than you thought possible. For inspiration, look to the many case studies available online.
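To make the ETL idea concrete, here is a minimal sketch in plain Java of the kind of per-record cleanup an ETL job might perform before loading data into Hadoop. The three-column CSV layout and the class name are illustrative assumptions, not part of any particular product; in a real job, this logic would typically live inside a Hadoop mapper.

```java
import java.util.Optional;

// Sketch: the parse-and-clean step of an ETL job, written as a plain
// Java method so the same logic could later be called from a mapper.
// The CSV layout (id,name,amount) is a made-up example.
public class EtlRecordCleaner {
    /** Parses one raw CSV line; returns empty for malformed rows. */
    public static Optional<String> clean(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 3) return Optional.empty();
        String id = fields[0].trim();
        String name = fields[1].trim().toUpperCase();
        String amount = fields[2].trim();
        if (id.isEmpty() || amount.isEmpty()) return Optional.empty();
        try {
            double value = Double.parseDouble(amount);
            // Emit a normalized tab-separated record.
            return Optional.of(id + "\t" + name + "\t" + value);
        } catch (NumberFormatException e) {
            return Optional.empty();   // drop rows with bad numbers
        }
    }
}
```

Because malformed rows are simply filtered out rather than halting the job, this pattern scales naturally to the messy inputs that ETL workloads usually face.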
Hadoop has matured over the last decade, and a large support community has grown around it. Many resources are available to help you get a small proof-of-concept project off the ground without hiring additional staff at the outset. There is a learning curve, but it is not nearly as steep as the one outside Hadoop experts would face in learning, say, the banking business. Java programmers should have no trouble with Hadoop, which is object oriented and written in Java.
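For Java developers gauging that learning curve, the core MapReduce idea fits in a few lines of plain Java. This is a toy, single-process illustration of the programming model (map each line to (word, 1) pairs, then reduce by summing per key), not the actual Hadoop API:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A toy, single-JVM illustration of the MapReduce model:
// the "map" phase emits (word, 1) pairs, and the "reduce"
// phase sums the counts for each word.
public class MiniMapReduce {
    public static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {                          // map phase
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);     // reduce phase
                }
            }
        }
        return counts;
    }
}
```

In real Hadoop, the same two phases are split into `Mapper` and `Reducer` classes so the framework can distribute them across a cluster, but the mental model is no harder than this.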
While making the decision to “go Big” may seem difficult, if you understand the potential roadblocks and have a plan, the benefits to your organization could be great. As you start to gain experience, additional questions will be easier to answer.
Remember, there are plenty of resources available to help advance your Big Data journey. Rogue Wave’s JMSL Numerical Library, for example, embeds easily within Hadoop, turning MapReduce applications into advanced analytic applications with very little additional effort.
There are also newer projects built on top of Hadoop, such as Pig, Hive, and Impala, as well as other Apache open source projects like Spark, not to mention the successful commercial distributions from Cloudera, Hortonworks, and MapR, that can help bridge the gaps.
Are you ready to start your big data initiative?
– Read more about using JMSL in Hadoop MapReduce applications