Foreword
Preface

1. Introduction
    Security Overview
    Confidentiality
    Integrity
    Availability
    Authentication, Authorization, and Accounting
    Hadoop Security: A Brief History
    Hadoop Components and Ecosystem
    Apache HDFS
    Apache YARN
    Apache MapReduce
    Apache Hive
    Cloudera Impala
    Apache Sentry (Incubating)
    Apache HBase
    Apache Accumulo
    Apache Solr
    Apache Oozie
    Apache ZooKeeper
    Apache Flume
    Apache Sqoop
    Cloudera Hue
    Summary

Part I. Security Architecture

2. Securing Distributed Systems
    Threat Categories
    Unauthorized Access/Masquerade
    Insider Threat
    Denial of Service
    Threats to Data
    Threat and Risk Assessment
    User Assessment
    Environment Assessment
    Vulnerabilities
    Defense in Depth
    Summary

3. System Architecture
    Operating Environment
    Network Security
    Network Segmentation
    Network Firewalls
    Intrusion Detection and Prevention
    Hadoop Roles and Separation Strategies
    Master Nodes
    Worker Nodes
    Management Nodes
    Edge Nodes
    Operating System Security
    Remote Access Controls
    Host Firewalls
    SELinux
    Summary

4. Kerberos
    Why Kerberos
    Kerberos Overview
    Kerberos Workflow: A Simple Example
    Kerberos Trusts
    MIT Kerberos
    Server Configuration
    Client Configuration
    Summary

Part II. Authentication, Authorization, and Accounting

5. Identity and Authentication
    Identity
    Mapping Kerberos Principals to Usernames
    Hadoop User to Group Mapping
    Provisioning of Hadoop Users
    Authentication
    Kerberos
    Username and Password Authentication
    Tokens
    Impersonation
    Configuration
    Summary

6. Authorization
    HDFS Authorization
    HDFS Extended ACLs
    Service-Level Authorization
    MapReduce and YARN Authorization
    MapReduce (MR1)
    YARN (MR2)
    ZooKeeper ACLs
    Oozie Authorization
    HBase and Accumulo Authorization
    System, Namespace, and Table-Level Authorization
    Column- and Cell-Level Authorization
    Summary

7. Apache Sentry (Incubating)
    Sentry Concepts
    The Sentry Service
    Sentry Service Configuration
    Hive Authorization
    Hive Sentry Configuration
    Impala Authorization
    Impala Sentry Configuration
    Solr Authorization
    Solr Sentry Configuration
    Sentry Privilege Models
    SQL Privilege Model
    Solr Privilege Model
    Sentry Policy Administration
    SQL Commands
    SQL Policy File
    Solr Policy File
    Policy File Verification and Validation
    Migrating From Policy Files
    Summary

8. Accounting
    HDFS Audit Logs
    MapReduce Audit Logs
    YARN Audit Logs
    Hive Audit Logs
    Cloudera Impala Audit Logs
    HBase Audit Logs
    Accumulo Audit Logs
    Sentry Audit Logs
    Log Aggregation
    Summary

Part III. Data Security

9. Data Protection
    Encryption Algorithms
    Encrypting Data at Rest
    Encryption and Key Management
    HDFS Data-at-Rest Encryption
    MapReduce2 Intermediate Data Encryption
    Impala Disk Spill Encryption
    Full Disk Encryption
    Filesystem Encryption
    Important Data Security Consideration for Hadoop
    Encrypting Data in Transit
    Transport Layer Security
    Hadoop Data-in-Transit Encryption
    Data Destruction and Deletion
    Summary

10. Securing Data Ingest
    Integrity of Ingested Data
    Data Ingest Confidentiality
    Flume Encryption
    Sqoop Encryption
    Ingest Workflows
    Enterprise Architecture
    Summary

11. Data Extraction and Client Access Security
    Hadoop Command-Line Interface
    Securing Applications
    HBase
    HBase Shell
    HBase REST Gateway
    HBase Thrift Gateway
    Accumulo
    Accumulo Shell
    Accumulo Proxy Server
    Oozie
    Sqoop
    SQL Access
    Impala
    Hive
    WebHDFS/HttpFS
    Summary

12. Cloudera Hue
    Hue HTTPS
    Hue Authentication
    SPNEGO Backend
    SAML Backend
    LDAP Backend
    Hue Authorization
    Hue SSL Client Configurations
    Summary

Part IV. Putting It All Together

13. Case Studies
    Case Study: Hadoop Data Warehouse
    Environment Setup
    User Experience
    Summary
    Case Study: Interactive HBase Web Application
    Design and Architecture
    Security Requirements
    Cluster Configuration
    Implementation Notes
    Summary

Afterword
Index