Blog: Big Infrastructure Everywhere

Big Data runs on Big Infrastructure. HPC runs on Big Infrastructure. In fact, most large enterprises today run some form of big infrastructure, mostly on turnkey hardware/software appliances from the likes of Teradata, EMC, and Oracle. It reminds me of the early days of High Performance Computing, when everyone used supercomputers from Cray, NEC, and IBM.

Today, enterprises are looking to follow the lead of hyperscale web properties, blazing a path away from specialized hardware towards commodity hardware and open source software in their search for increasingly scalable, less expensive ways to solve complex data and analytics problems.

Unfortunately, building and managing these systems isn’t easy. Enterprises are trying to emulate the success shown by Netflix with Cassandra in EC2, Facebook with Hadoop in-house, or Craigslist with MongoDB, but they run up against a significant problem: they simply don’t have the development resources and open architecture guidelines they need to make these solutions work.

Without the controlled environment that comes with tightly coupled, turnkey infrastructure purchased from a large IT solution vendor, today’s adventuresome IT users have to fend for themselves as they deploy and manage the lifecycle of a complex array of hardware and software. Hyperscale web properties often develop their own management tools that provide some level of automated deployment, configuration, and change management, but that takes software development resources many enterprise IT departments don’t have. Enterprises still need automation tools if they want to achieve the results they’re after: something that will harness that collection of commodity hardware and open source software and make it behave more like the turnkey environments they are used to.

What would such a tool set look like? What attributes would it have? What form should it take? I think these are the “must-have” capabilities to look for:

1. Support heterogeneous hardware, thereby allowing the use of truly commodity hardware
2. Allow provisioning from bare metal to ensure complete and consistent stack deployment
3. Provide parallel, rapid provisioning, enabling quick installation and changes
4. Be able to encapsulate and parameterize popular open source software solutions, allowing configuration-free deployment
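
To make the list above a little more concrete, here is a minimal sketch of how capabilities 1, 3, and 4 might fit together. It is an illustration only, not how any particular product works: the `Node` fields, the template keys, the sizing rules, and the `provision` placeholder are all hypothetical, and a real tool would image each machine from bare metal rather than just render a config string.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Node:
    """Hypothetical hardware description discovered for one machine."""
    hostname: str
    cpus: int
    memory_gb: int
    disks: int


# Hypothetical parameterized template for a worker node. The keys are
# invented for illustration and do not match any real software's settings.
WORKER_TEMPLATE = Template(
    "host=$hostname\n"
    "task_slots=$task_slots\n"
    "heap_mb=$heap_mb\n"
    "data_dirs=$data_dirs\n"
)


def render_config(node: Node) -> str:
    """Derive settings from the hardware so mixed machines need no hand tuning."""
    task_slots = max(1, node.cpus - 1)                 # leave one core for the OS
    heap_mb = node.memory_gb * 1024 // 2 // task_slots  # split half of RAM across slots
    data_dirs = ",".join(f"/data/{i}" for i in range(node.disks))
    return WORKER_TEMPLATE.substitute(
        hostname=node.hostname,
        task_slots=task_slots,
        heap_mb=heap_mb,
        data_dirs=data_dirs,
    )


def provision(node: Node) -> tuple[str, str]:
    """Placeholder for a bare-metal install; here it only renders the config."""
    return node.hostname, render_config(node)


def provision_all(nodes: list[Node], max_workers: int = 8) -> dict[str, str]:
    """Provision every node in parallel instead of one at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(provision, nodes))


if __name__ == "__main__":
    cluster = [Node("n1", 8, 32, 4), Node("n2", 4, 16, 2)]
    for host, config in provision_all(cluster).items():
        print(f"--- {host} ---\n{config}")
```

The point of the sketch is the shape, not the numbers: the hardware description drives the parameters, the template encapsulates the software's configuration, and the thread pool stands in for whatever parallel installer does the actual work.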

What do you think? Come chat with some of the StackIQ team at Hadoop World in New York on November 8th-9th (email info@StackIQ.com if you’d like to connect with one of us there), and at Supercomputing 2011 on November 14th-17th (visit us at booth #6209).

-Tim
@timmcintire
@StackIQ


One Response to Blog: Big Infrastructure Everywhere

  1. Jim Kaskade says:

    Nice Tim. “Big Infrastructure” also reminds me of the days of enterprise data warehouse and now big data, of course.

    HPC, EDW, Big Data…all need seamless support for all constituents….from IT all the way up to business users.

    A “turnkey environment” is not only the “infrastructure layer” but also the “developer layer” and “business user layer”.

    In cloud computing…we’re seeing this being defined as IaaS, PaaS, and SaaS.

    On the infrastructure side, I’d add things like:

    *Reliability / Availability (HA / DR)
    *Security (built in policy definition/management)
    *Utilization/Optimization (scale-up and scale-down)

    I like the story I recently heard about utilization….the IT staff said that they want alerts which not only notify them when machines have become “over-utilized”…say 99% utilization, but also “under-utilized”…say 90%. In an elastic environment this would be represented in parameters for automatic scale-up and scale-down.

    ;-)
