Feb 23, 2012 |
7,466 views |

Book Description
With this digital Early Release edition of Hadoop: The Definitive Guide, you get the entire book bundle in its earliest form – the author’s raw and unedited content – so you can take advantage of this content long before the book’s official release. You’ll also receive updates when significant changes are made, as well as the final ebook version.
Ready to unleash the power of your massive dataset? With the latest edition of this comprehensive resource, you’ll learn how to use Apache Hadoop to build and maintain reliable, scalable, distributed systems. It’s ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.
This third edition covers recent changes to Hadoop, including new material on the new MapReduce API, as well as version 2 of the MapReduce runtime (YARN) and its more flexible execution model. You’ll also find illuminating case studies that demonstrate how Hadoop is used to solve specific problems.
- Store large datasets with the Hadoop Distributed File System (HDFS), then run distributed computations with MapReduce
- Use Hadoop’s data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence
- Discover common pitfalls and advanced features for writing real-world MapReduce programs
- Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
- Use Pig, a high-level query language for large-scale data processing
Dec 19, 2011 |
5,082 views |

Book Description
Clojure in Action is a hands-on tutorial for the working programmer who has written code in a language like Java or Ruby, but has no prior experience with Lisp. It teaches Clojure from the basics to advanced topics using practical, real-world application examples. Blow through the theory and dive into practical matters like unit testing and environment setup, all the way through building a scalable web application using domain-specific languages, Hadoop, HBase, and RabbitMQ.
Clojure is a modern Lisp for the JVM, and it has the strengths you’d expect: first-class functions, macros, support for functional programming, and a Lisp-like, clean programming style.
Clojure in Action is a hands-on guide to applying Clojure to practical programming challenges. You’ll start with a language tutorial written for readers who already know OOP. Then, you’ll dive into the use cases where Clojure really shines: state management, safe concurrency and multicore programming, first-class code generation, and Java interop. In each chapter, you’ll first explore the unique characteristics of a problem area and then discover how to tackle them using Clojure. Along the way, you’ll explore practical matters like architecture, unit testing, and setup as you build a scalable web application that includes custom DSLs, Hadoop, HBase, and RabbitMQ.
What’s Inside
- A fast-paced Clojure tutorial
- Creating web services with Clojure
Nov 15, 2011 |
4,226 views |

Book Description
It’s tough to argue with R as a high-quality, cross-platform, open source statistical software product—unless you’re in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You’ll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don’t.
With these packages, you can overcome R’s single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R’s memory barrier.
- Snow: works well in a traditional cluster environment
- Multicore: popular for multiprocessor and multicore computers
- Parallel: part of the upcoming R 2.14.0 release
- R+Hadoop: provides low-level access to a popular form of cluster computing
- RHIPE: uses Hadoop’s power with R’s language and interactive shell
- Segue: lets you use Elastic MapReduce as a backend for lapply-style operations
Table of Contents
Chapter 1 Getting Started
Chapter 2 snow
Chapter 3 multicore
Chapter 4 parallel
Chapter 5 A Primer on MapReduce and Hadoop
Oct 08, 2011 |
5,669 views |

Book Description
This guide is an ideal learning tool and reference for Apache Pig, the programming language that helps you describe and run large data projects on Hadoop. With Pig, you can analyze data without having to create a full-fledged application—making it easy for you to experiment with new data sets.
Programming Pig shows newcomers how to get started, and teaches intermediate users the benefits of using Pig Latin, the data flow language for building and maintaining pipelines for processing data. Advanced users learn how to build complex data processing pipelines with Pig’s macros and modularity features, and discover how to build systems for complex data processing needs by embedding Pig Latin into scripting languages.
- Learn the advantages and disadvantages of using Pig instead of MapReduce
- Understand how Pig fits in with other Hadoop components, such as HDFS, Hive, MapReduce, and HBase
- Follow examples that explain built-in Pig Latin functions, and data operators such as join and group
- Use grunt, the shell that Pig provides for exploring and working with HDFS
- Get performance tuning tips for running Pig Latin scripts on Hadoop clusters in less time
- Extend Pig with powerful user defined functions written in Java or Python
About the Author
Alan is an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project.
Sep 09, 2011 |
8,356 views |

Book Description
If your organization is looking for a storage solution to accommodate a virtually endless amount of data, this book will show you how Apache HBase can fulfill your needs. As the open source implementation of Google’s BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. HBase: The Definitive Guide provides the details you require, whether you simply want to evaluate this high-performance, non-relational database, or put it into practice right away.
HBase’s adoption rate is beginning to climb, and several IT executives are asking pointed questions about this high-capacity database. This is the only book available to give you meaningful answers.
- Learn how to distribute large datasets across an inexpensive cluster of commodity servers
- Develop HBase clients in many programming languages, including Java, Python, and Ruby
- Get details on HBase’s primary storage system, HDFS—Hadoop’s distributed and replicated filesystem
- Learn how HBase’s native interface to Hadoop’s MapReduce framework enables easy development and execution of batch jobs that can scan entire tables
- Discover the integration between HBase and other facets of the Apache Hadoop project
About the Author
Lars George has been involved with HBase since 2007, and became a full HBase committer in 2009. He has spoken at various Hadoop User Group meetings, as well as large conferences such as FOSDEM in Brussels. He also started the Munich OpenHUG meetings.
Jan 12, 2011 |
11,164 views |

Book Description
Hadoop in Action teaches readers how to use Hadoop and write MapReduce programs. The intended readers are programmers, architects, and project managers who have to process large amounts of data offline. Hadoop in Action will lead the reader from obtaining a copy of Hadoop to setting it up in a cluster and writing data analytic programs.
The book begins by making the basic idea of Hadoop and MapReduce easier to grasp by applying the default Hadoop installation to a few easy-to-follow tasks, such as analyzing changes in word frequency across a body of documents. The book continues through the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action.
Hadoop in Action explains how to use Hadoop and presents design patterns and practices for MapReduce programming. MapReduce is a complex idea both conceptually and in its implementation, and Hadoop users are challenged to learn all the knobs and levers for running Hadoop. This book takes you beyond the mechanics of running Hadoop, teaching you to write meaningful programs in a MapReduce framework.
This book assumes the reader will have a basic familiarity with Java, as most code examples will be written in Java. Familiarity with basic statistical concepts (e.g. histogram, correlation) will help the reader appreciate the more advanced data processing examples.