Practical data wrangling : expert techniques for transforming your raw data into a valuable source for analytics
Turn your noisy data into relevant, insight-ready information by leveraging the data wrangling techniques in Python and R. About This Book: This easy-to-follow guide takes you through every step of the data wrangling process in the best possible way. Work with different types of datasets, and reshape t...
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham ; Mumbai : Packt, 2017. |
Edition: | 1st edition |
Subjects: | |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630435106719 |
Table of Contents:
- Cover
- Title Page
- Copyright
- Credits
- About the Author
- About the Reviewer
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: Programming with Data
- Understanding data wrangling
- Getting and reading data
- Cleaning data
- Shaping and structuring data
- Storing data
- The tools for data wrangling
- Python
- R
- Summary
- Chapter 2: Introduction to Programming in Python
- External resources
- Logistical overview
- Installation requirements
- Using other learning resources
- Python 2 versus Python 3
- Running programs in Python
- Using text editors to write and manage programs
- Writing the hello world program
- Using the terminal to run programs
- Running the Hello World program
- What if it didn't work?
- Data types, variables, and the Python shell
- Numbers - integers and floats
- Why integers?
- Strings
- Booleans
- The print function
- Variables
- Adding to a variable
- Subtracting from a variable
- Multiplication
- Division
- Naming variables
- Arrays (lists, if you ask Python)
- Dictionaries
- Compound statements
- Compound statement syntax and indentation level
- For statements and iterables
- If statements
- Else and elif clauses
- Functions
- Passing arguments to a function
- Returning values from a function
- Making annotations within programs
- A programmer's resources
- Documentation
- Online forums and mailing lists
- Summary
- Chapter 3: Reading, Exploring, and Modifying Data - Part I
- External resources
- Logistical overview
- Installation requirements
- Data
- File system setup
- Introducing a basic data wrangling workflow
- Introducing the JSON file format
- Opening and closing a file in Python using file I/O
- The open function and file objects
- File structure - best practices to store your data
- Opening a file
- Reading the contents of a file
- Modules in Python
- Parsing a JSON file using the json module
- Exploring the contents of a data file
- Extracting the core content of the data
- Listing out all of the variables in the data
- Modifying a dataset
- Extracting data variables from the original dataset
- Using a for loop to iterate over the data
- Using a nested for loop to iterate over the data variables
- Outputting the modified data to a new file
- Specifying input and output file names in the Terminal
- Specifying the filenames from the Terminal
- Summary
- Chapter 4: Reading, Exploring, and Modifying Data - Part II
- Logistical overview
- File system setup
- Data
- Installing pandas
- Understanding the CSV format
- Introducing the CSV module
- Using the CSV module to read CSV data
- Using the CSV module to write CSV data
- Using the pandas module to read and process data
- Counting the total road length in 2011 revisited
- Handling non-standard CSV encoding and dialect
- Understanding XML
- XML versus JSON
- Using the XML module to parse XML data
- XPath
- Summary
- Chapter 5: Manipulating Text Data - An Introduction to Regular Expressions
- Logistical overview
- Data
- File structure setup
- Understanding the need for pattern recognition
- Introducing regular expressions
- Writing and using a regular expression
- Special characters
- Matching whitespace
- Matching the start of a string
- Matching the end of a string
- Matching a range of characters
- Matching any one of several patterns
- Matching a sequence instead of just one character
- Putting patterns together
- Extracting a pattern from a string
- The regex split() function
- Python regex documentation
- Looking for patterns
- Quantifying the existence of patterns
- Creating a regular expression to match the street address
- Counting the number of matches
- Verifying the correctness of the matches
- Extracting patterns
- Outputting the data to a new file
- Summary
- Chapter 6: Cleaning Numerical Data - An Introduction to R and RStudio
- Logistical overview
- Data
- Directory structure
- Installing R and RStudio
- Introducing R and RStudio
- Familiarizing yourself with RStudio
- Running R commands
- Setting the working directory
- Reading data
- The R dataframe
- R vectors
- Indexing R dataframes
- Finding the 2011 total in R
- Conducting basic outlier detection and removal
- Handling NA values
- Deleting missing values
- Replacing missing values with a constant
- Imputation of missing values
- Variable names and contents
- Summary
- Chapter 7: Simplifying Data Manipulation with dplyr
- Logistical overview
- Data
- File system setup
- Installing the dplyr and tibble packages
- Introducing dplyr
- Getting started with dplyr
- Chaining operations together
- Filtering the rows of a dataframe
- Summarizing data by category
- Rewriting code using dplyr
- Summary
- Chapter 8: Getting Data from the Web
- Logistical overview
- File system setup
- Installing the requests module
- Internet connection
- Introducing APIs
- Using Python to retrieve data from APIs
- Using URL parameters to filter the results
- Summary
- Chapter 9: Working with Large Datasets
- Logistical overview
- System requirements
- Data
- File system setup
- Installing MongoDB
- Planning out your time
- Cleaning up
- Understanding computer memory
- Understanding databases
- Introducing MongoDB
- Interfacing with MongoDB from Python
- Summary
- Index