Tag Archive For ‘Hadoop’

Hadoop: The Definitive Guide, 3rd Edition (Early Release)

Book Description

With this digital Early Release edition of Hadoop: The Definitive Guide, you get the entire book bundle in its earliest form (the author’s raw and unedited content), so you can take advantage of this content long before the book’s official release. You’ll also receive updates when significant changes are made, as well as the final ebook version.

Ready to unleash the power of your massive dataset? With the latest edition of this comprehensive resource, you’ll learn how to use Apache Hadoop to build and maintain reliable, scalable, distributed systems. It’s ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.

This third edition covers recent changes to Hadoop, including new material on the new MapReduce API, as well as version 2 of the MapReduce runtime (YARN) and its more flexible execution model. You’ll also find illuminating case studies that demonstrate how Hadoop is used to solve specific problems.

  • Store large datasets with the Hadoop Distributed File System (HDFS), then run distributed computations with MapReduce
  • Use Hadoop’s data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence
  • Discover common pitfalls and advanced features for writing real-world MapReduce programs
  • Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
  • Use Pig, a high-level query language for large-scale data processing
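
The data-integrity item in the list above refers to checksumming. The sketch below illustrates the idea in plain Python rather than with Hadoop’s own classes: a checksum is computed for every fixed-size chunk when data is written and recomputed when it is read back, so a mismatch pinpoints the damaged chunk. The 512-byte chunk size mirrors the HDFS default but is an assumption here, zlib’s CRC-32 stands in for whichever checksum variant a given Hadoop version uses, and `chunk_checksums` and `verify` are hypothetical names.

```python
import zlib

# Assumed chunk size, mirroring HDFS's default of 512 bytes per checksum
CHUNK_SIZE = 512


def chunk_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Hypothetical helper: one CRC-32 value per fixed-size chunk of `data`."""
    return [zlib.crc32(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def verify(data: bytes, expected: list[int], chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Hypothetical helper: indices of chunks whose checksum no longer matches.

    Assumes the data length is unchanged; only byte corruption is checked.
    """
    actual = chunk_checksums(data, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


if __name__ == "__main__":
    original = b"x" * 2000                 # 4 chunks: 512 + 512 + 512 + 464 bytes
    stored = chunk_checksums(original)     # computed once, when the data is written
    damaged = bytearray(original)
    damaged[700] ^= 0xFF                   # flip the bits of one byte in chunk 1
    print(verify(bytes(damaged), stored))  # [1]
```

Checking per chunk rather than per file tells the reader exactly where the damage is; in HDFS, a client that detects a mismatch reads the block from another replica instead.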

Clojure in Action

Book Description

Clojure in Action is a hands-on tutorial for the working programmer who has written code in a language like Java or Ruby, but has no prior experience with Lisp. It teaches Clojure from the basics to advanced topics using practical, real-world application examples. Blow through the theory and dive into practical matters like unit testing and environment setup, all the way through building a scalable web application using domain-specific languages, Hadoop, HBase, and RabbitMQ.

Clojure is a modern Lisp for the JVM, and it has the strengths you’d expect: first-class functions, macros, support for functional programming, and a Lisp-like, clean programming style.

Clojure in Action is a practical guide focused on applying Clojure to real-world programming challenges. You’ll start with a language tutorial written for readers who already know OOP. Then, you’ll dive into the use cases where Clojure really shines: state management, safe concurrency and multicore programming, first-class code generation, and Java interop. In each chapter, you’ll first explore the unique characteristics of a problem area and then discover how to tackle them using Clojure. Along the way, you’ll explore practical matters like architecture, unit testing, and setup as you build a scalable web application that includes custom DSLs, Hadoop, HBase, and RabbitMQ.

What’s Inside

  • A fast-paced Clojure tutorial
  • Creating web services with Clojure

Parallel R

Book Description

It’s tough to argue with R as a high-quality, cross-platform, open source statistical software product, unless you’re in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You’ll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don’t.

With these packages, you can overcome R’s single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R’s memory barrier.

  • Snow: works well in a traditional cluster environment
  • Multicore: popular for multiprocessor and multicore computers
  • Parallel: part of the upcoming R 2.14.0 release
  • R+Hadoop: provides low-level access to a popular form of cluster computing
  • RHIPE: uses Hadoop’s power with R’s language and interactive shell
  • Segue: lets you use Elastic MapReduce as a backend for lapply-style operations
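
All of the packages listed above parallelize the same basic pattern as R’s `lapply`: apply one function independently to each element of a list, then collect the results. The sketch below shows that pattern with Python’s standard `multiprocessing` module as a swapped-in analogue; it is not code from any of the R packages. `simulate` is a hypothetical CPU-bound task, and the POSIX-only "fork" start method is an assumption chosen to keep the example short.

```python
import multiprocessing
import random


def simulate(seed: int) -> float:
    """Hypothetical CPU-bound task: mean of 10,000 pseudo-random draws."""
    rng = random.Random(seed)  # per-task generator, so each result depends only on its seed
    return sum(rng.random() for _ in range(10_000)) / 10_000


def serial_map(func, items):
    """Single-process baseline, similar in spirit to R's lapply."""
    return [func(x) for x in items]


def parallel_map(func, items, workers: int = 4):
    """Apply func to each item in separate worker processes.

    Uses the POSIX-only "fork" start method so workers inherit `func`
    without re-importing this module; results keep the input order.
    """
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=workers) as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    seeds = list(range(8))
    # Same inputs, same outputs: parallelism changes where the work runs, not the answer
    assert parallel_map(simulate, seeds) == serial_map(simulate, seeds)
    print("parallel and serial results match")
```

Because each task seeds its own generator, the parallel run reproduces the serial results exactly and in input order; only the placement of the work changes.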

Table of Contents
Chapter 1 Getting Started
Chapter 2 snow
Chapter 3 multicore
Chapter 4 parallel
Chapter 5 A Primer on MapReduce and Hadoop

Programming Pig

Book Description

This guide is an ideal learning tool and reference for Apache Pig, the language that helps you describe and run large data projects on Hadoop. With Pig, you can analyze data without having to create a full-fledged application, which makes it easy to experiment with new datasets.

Programming Pig shows newcomers how to get started, and teaches intermediate users the benefits of using Pig Latin, the data flow language for building and maintaining pipelines for processing data. Advanced users learn how to build complex data processing pipelines with Pig’s macros and modularity features, and discover how to build systems for complex data processing needs by embedding Pig Latin into scripting languages.

  • Learn the advantages and disadvantages of using Pig instead of MapReduce
  • Understand how Pig fits in with other Hadoop components, such as HDFS, Hive, and others
  • Follow examples that explain built-in Pig Latin functions, and data operators such as join and group
  • Use Grunt, the shell that Pig provides for exploring and working with HDFS
  • Get performance tuning tips for running Pig Latin scripts on Hadoop clusters in less time
  • Extend Pig with powerful user-defined functions written in Java or Python
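
To make the join and group operators concrete: in Pig Latin, `GROUP visits BY user;` collects the records that share a key into a bag, and `JOIN visits BY user, users BY user;` is by default an inner join that concatenates the fields of matching records. The Python sketch below imitates only those result shapes on small in-memory lists; it is not how Pig executes them on Hadoop, and the relations, field positions, and helper names are hypothetical.

```python
from collections import defaultdict


def pig_group(relation, key):
    """Mimic Pig's GROUP: one (group, bag) pair per distinct key."""
    bags = defaultdict(list)
    for record in relation:
        bags[key(record)].append(record)
    return [(k, bag) for k, bag in bags.items()]


def pig_join(left, right, left_key, right_key):
    """Mimic Pig's default inner JOIN: concatenate fields of matching records."""
    index = defaultdict(list)
    for rrec in right:
        index[right_key(rrec)].append(rrec)
    return [lrec + rrec for lrec in left for rrec in index.get(left_key(lrec), [])]


# Hypothetical sample relations: (user, url) visits and (user, country) profiles
visits = [("alice", "a.com"), ("bob", "b.com"), ("alice", "c.com")]
users = [("alice", "US"), ("bob", "DE"), ("carol", "FR")]

grouped = pig_group(visits, key=lambda v: v[0])
# [('alice', [('alice', 'a.com'), ('alice', 'c.com')]), ('bob', [('bob', 'b.com')])]

joined = pig_join(visits, users, left_key=lambda v: v[0], right_key=lambda u: u[0])
# [('alice', 'a.com', 'alice', 'US'), ('bob', 'b.com', 'bob', 'DE'), ('alice', 'c.com', 'alice', 'US')]
```

Because the join is inner, `carol`, who has no visits, does not appear in the joined output.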

About the Author
Alan Gates is an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project.

HBase: The Definitive Guide

Book Description

If your organization is looking for a storage solution to accommodate a virtually endless amount of data, this book will show you how HBase can fulfill your needs. As the open source implementation of Google’s BigTable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. HBase: The Definitive Guide provides the details you require, whether you simply want to evaluate this high-performance, non-relational database, or put it into practice right away.

HBase’s adoption rate is beginning to climb, and several IT executives are asking pointed questions about this high-capacity database. This is the only book available to give you meaningful answers.

  • Learn how to distribute large datasets across an inexpensive cluster of commodity servers
  • Develop HBase clients in many programming languages, including Java, Python, and Ruby
  • Get details on HBase’s primary storage system, HDFS, Hadoop’s distributed and replicated filesystem
  • Learn how HBase’s native interface to Hadoop’s MapReduce framework enables easy development and execution of batch jobs that can scan entire tables
  • Discover the integration between HBase and other facets of the Apache Hadoop project

About the Author
Lars George has been involved with HBase since 2007, and became a full HBase committer in 2009. He has spoken at various Hadoop User Group meetings, as well as large conferences such as FOSDEM in Brussels. He also started the Munich OpenHUG meetings.

Hadoop in Action

Book Description

Hadoop in Action teaches readers how to use Hadoop and write MapReduce programs. The intended readers are programmers, architects, and project managers who have to process large amounts of data offline. The book will lead the reader from obtaining a copy of Hadoop to setting it up in a cluster and writing data analytic programs.

The book begins by making the basic idea of Hadoop and MapReduce easier to grasp by applying the default Hadoop installation to a few easy-to-follow tasks, such as analyzing changes in word frequency across a body of documents. The book continues through the basic concepts of applications developed using Hadoop, including a close look at framework components, use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action.
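
The word-frequency task mentioned above is the classic first MapReduce program. As a single-process sketch of the model (not Hadoop’s Java API), the mapper emits a (word, 1) pair per word, a shuffle step groups the pairs by key, and the reducer sums each group. The names `mapper`, `shuffle`, `reducer`, and `word_count` are hypothetical, and the lowercase tokenization rule is an assumption.

```python
import re
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(document: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for each word; lowercase alphanumeric tokens are an assumed rule."""
    for word in re.findall(r"[a-z0-9']+", document.lower()):
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word: str, counts: list[int]) -> tuple[str, int]:
    """Sum the partial counts for one word."""
    return word, sum(counts)


def word_count(documents: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for doc in documents for pair in mapper(doc))
    # Keys are sorted before reduction, as Hadoop sorts each reducer's input by key
    return dict(reducer(w, c) for w, c in sorted(shuffle(intermediate).items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "The lazy dog and the fox"]
    print(word_count(docs))
    # {'and': 1, 'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

In a real cluster the mapper and reducer run as many parallel tasks on different machines; the sketch keeps only the data flow between the three phases.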

Hadoop in Action will explain how to use Hadoop and present design patterns and practices of MapReduce. MapReduce is a complex idea both conceptually and in its implementation, and Hadoop users are challenged to learn all the knobs and levers for running Hadoop. This book takes you beyond the mechanics of running Hadoop, teaching you to write meaningful programs in a MapReduce framework.

This book assumes the reader will have a basic familiarity with Java, as most code examples will be written in Java. Familiarity with basic statistical concepts (e.g., histogram, correlation) will help the reader appreciate the more advanced data processing examples.

Copyright © 2012 Wow! eBook · All rights reserved · Powered by WordPress