
Big Data Big Five
In this week’s BDB5 roundup, HDFS takes a beating as companies and collaborations look for ways to work around the maligned file system. We start with Intel, which is looking to extend its recently acquired Lustre file system into its own Xeon-based Hadoop distro. Red Hat and Hortonworks get into the game with a collaboration aimed at broadening the file system ecosystem for Hadoop through the Apache Ambari project, and more…
Intel Extends Lustre File System Into Hadoop
Following its acquisition last year of Whamcloud, and the Lustre file system that came along with it, Intel announced this week that it will extend the Lustre parallel distributed file system to work with its distribution of Apache Hadoop.
According to Brent Gorda, founder and former CEO of Whamcloud, a new Java class has been developed that allows users of the Intel Hadoop distro to swap Lustre in as a full replacement for HDFS. Gorda says the move grew out of talks with users in the high performance computing space who were looking into Hadoop and wanted to minimize the storage costs of HDFS triple replication.
While the file system switch might not be widely adopted in lower-end implementations, it was reported that a company in the compute- and data-intensive oil and gas space did the math and found that replacing HDFS with Lustre would provide substantial savings in storage costs.
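To give a rough sense of the math involved, the sketch below compares the raw disk needed to hold the same usable data under triple replication with the raw disk needed under a parity-protected layout. The announcement gives no figures, so everything here is an assumption for illustration: the 1,000 TB data set is hypothetical, and the 8+2 parity layout (eight data disks plus two parity disks per stripe, as in a common RAID-6 arrangement) is simply one plausible way a Lustre back end might be protected, not a description of Intel's or any customer's configuration.

```python
# Illustrative only: raw capacity needed for the same usable data under
# block replication versus a parity-protected (RAID-6-like) layout.
# All inputs below are hypothetical and not taken from the announcement.


def raw_capacity_replicated(usable_tb: float, replicas: int = 3) -> float:
    """Raw capacity when every block is stored `replicas` times."""
    return usable_tb * replicas


def raw_capacity_parity(usable_tb: float, data_disks: int = 8,
                        parity_disks: int = 2) -> float:
    """Raw capacity when each stripe holds data_disks + parity_disks disks."""
    return usable_tb * (data_disks + parity_disks) / data_disks


usable_tb = 1000.0  # hypothetical data set size in terabytes

replicated_raw = raw_capacity_replicated(usable_tb)  # triple replication
parity_raw = raw_capacity_parity(usable_tb)          # assumed 8+2 layout
saved_tb = replicated_raw - parity_raw
saved_fraction = saved_tb / replicated_raw

print(f"Triple replication: {replicated_raw:.0f} TB raw")
print(f"8+2 parity layout:  {parity_raw:.0f} TB raw")
print(f"Difference:         {saved_tb:.0f} TB ({saved_fraction:.0%} less raw disk)")
```

Under these assumed inputs the parity layout needs well under half the raw disk of triple replication. The comparison covers capacity only; it says nothing about the data locality, throughput, or hardware pricing that a real evaluation would also have to weigh.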
The announcement comes as part of a larger announcement that Intel will be “mainstreaming” the high-end file system, which has gained popularity on HPC cluster systems. Intel says the system will be given ease-of-use enhancements that make it approachable for anyone who can manage a Linux box.
Cray Launches Hadoop on CS300 Iron
Supercomputer company Cray has announced that, starting later this month, it will distribute the Intel distribution of Apache Hadoop along with its Cray CS300 line of cluster supercomputers.
“We are combining the supercomputing technologies of the Cray CS300 series with the performance and security of the Intel Distribution to provide customers with a turnkey, reliable Hadoop solution that is purpose-built for high-value Hadoop environments,” said Bill Blake, Senior Vice President and CTO of Cray.
In the CS300 line, Cray offers air-cooled and liquid-cooled versions of its cluster systems, which include the Linux operating system, workload management software, the Cray Advanced Cluster Engine (ACE) management software, and now, the Intel Apache Hadoop distro.
Cray says that this move aims to address big data challenges in the newly forming market that IDC calls High Performance Data Analysis, which IDC classifies as simulation and analytics-based data analysis complex enough to require the use of high performance computing (HPC) methods or resources (on-premise or in the cloud).
Altiscale Launches Hadoop Full Service Offering with $12 Million Series A
Ex-Yahoo CTO Raymie Stata this week unveiled his new company, Altiscale, which aims to offer Hadoop as a service (HaaS) for organizations looking to get into the Hadoop framework without the overhead of hiring a team to install and manage the clusters. As reported by Derrick Harris, Altiscale recently closed a $12 million Series A round of funding from a group of investors including Sequoia Capital, General Catalyst Partners, and Accel Partners, as well as Jerry Yang’s AME Ventures and a collection of individual investors.
According to the report, the company will aim to recruit customers who are already Hadoop users but are looking for better ways to consume the framework. Stata says that there is a group of underserved users who got stuck in first or second gear with their Hadoop installations and can benefit from a HaaS offering.
The company also plans to differentiate itself through a billing plan that is more forgiving than current offerings on the market, and by taking on complete management responsibility for the Hadoop cluster, allowing its clients to focus on the questions they want answered rather than wrangling with the cluster.
Hortonworks, Red Hat Collaborate to Expand Hadoop Compatible File Systems
The Hadoop HDFS file system got another black eye this week as Hortonworks and Red Hat announced that they will be extending their partnership into an engineering collaboration aimed at expanding the file system ecosystem for the Apache Hadoop framework. Central to this partnership is the enhancement of the Apache Ambari project to increase the breadth of storage offerings and support multiple Hadoop-compatible file systems.
Through the partnership, the companies say that they will provide integration, provisioning, and deployment tools for alternative file systems through the Ambari tool, as well as provide a test suite to validate compatibility for these systems.
The partnership will also work to integrate the Hortonworks Data Platform with Red Hat’s POSIX-compliant storage offering.
VMware Aims at Machine Data Space With Log Analysis Tools
VMware this week launched a software suite aimed at carving out its piece of the “Internet of Things” log analysis pie. VMware says that the new suite, named vCenter™ Log Insight™, will allow organizations to simplify log management and improve operational efficiency in cloud environments.
The tool is said to be purpose-built for log analytics, providing automated log management and aggregation, analytics, and search for systems monitoring and other management tasks. It will crunch data from a broad range of IT infrastructure (applications, firewalls, operating systems, storage systems, virtual machines, etc.) and provide dashboards and reporting tools that give organizations insight into their operations.
The product is currently in public beta and is expected to be generally available in Q3 2013, priced at $200 per operating system instance with no log data size limits.