hack/reduce - Hadoop: Big Data Transformation and Analytics with Hadoop

When: Tue December 04, 2012 6:30 pm

Organization: hack/reduce

Location: hack/reduce, 275 3rd St., Cambridge, MA 02142

Website: http://bit.ly/YEgzCC

Excited about big data, and want to jump in and start coding? Tired of attending the “fluffy” big data talks, and want to learn how to build something concrete? It’s time to Code Big or Go Home!

In this session, we will teach you how to program and operate Hadoop, the poster-child technology of Big Data. Why should you care? Take a look at the explosive growth in Hadoop job trends. Even CEOs are starting to care about Hadoop. Oh, and by the way, it’s open source and free to use.


In this 90-minute session, we will cover the following ground:

  • Use HDFS, the Hadoop Distributed File System, to read and write data
  • Write and execute Java-based MapReduce (MR) jobs to analyze the data at hand
  • Program MR jobs in other languages (e.g. Python, Perl) via Hadoop Streaming
  • Perform basic monitoring and performance tuning for Hadoop
  • Process data declaratively on Hadoop via Hive
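
To give a taste of the Hadoop Streaming topic, here is a minimal word-count sketch in Python. It is not taken from the session materials. The function names (map_lines, reduce_lines, run_locally) are made up for illustration, and the sorted() call only stands in for the shuffle-and-sort step that Hadoop performs between the map and reduce phases.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one tab-separated 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each word.

    Assumes the records arrive sorted by key, so that all records
    for the same word are adjacent.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process, without a cluster."""
    mapped = sorted(map_lines(lines))  # stand-in for Hadoop's shuffle and sort
    return list(reduce_lines(mapped))


if __name__ == "__main__":
    for record in run_locally(["Big data big", "data Big"]):
        print(record)
```

In an actual streaming job, the mapper and the reducer would typically live in separate scripts that read records from standard input and print results to standard output, and would be handed to the Hadoop Streaming jar through its -mapper and -reducer options.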

By the end of this session, you will be able to set up, program, and run Hadoop jobs on your own computer or in the cloud.

Speaker: Mingsheng Hong

Mingsheng Hong is Chief Data Scientist at Hadapt, where he drives the product roadmap and incubates analytic use cases. Prior to this role, Mingsheng was Field CTO at Vertica, an HP company, and was instrumental in its product development and positioning. Mingsheng earned his Ph.D. in Computer Science from Cornell University, where he built Cayuga, the world’s first expressive and scalable CEP engine. Mingsheng also co-founded the Microsoft CEDR event processing project, which became the Microsoft StreamInsight technology shipped with SQL Server 2008 and 2012. In his spare time, Mingsheng actively contributes to community work that promotes technology and entrepreneurship, such as NECINA and hack/reduce.