Python Data Analysis Cookbook: over 140 practical recipes to help you make sense of your data with ease and build production-ready data apps

About This Book: Analyze Big Data sets, create attractive visualizations, and manipulate and process various data types. Packed with rich recipes to help you learn and explore amazing algorith...

Bibliographic Details
Other Authors: Idris, Ivan (author)
Format: E-book
Language: English
Published: Birmingham: Packt Publishing, 2016.
Edition: 1st edition
Series: Quick answers to common problems.
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630360306719
Table of Contents:
  • Cover
  • Copyright
  • Credits
  • About the Author
  • About the Reviewers
  • www.PacktPub.com
  • Table of Contents
  • Preface
  • Chapter 1: Laying the Foundation for Reproducible Data Analysis
  • Introduction
  • Setting up Anaconda
  • Installing the Data Science Toolbox
  • Creating a virtual environment with virtualenv and virtualenvwrapper
  • Sandboxing Python applications with Docker images
  • Keeping track of package versions and history in IPython Notebook
  • Configuring IPython
  • Learning to log for robust error checking
  • Unit testing your code
  • Configuring pandas
  • Configuring matplotlib
  • Seeding random number generators and NumPy print options
  • Standardizing reports, code style, and data access
  • Chapter 2: Creating Attractive Data Visualizations
  • Introduction
  • Graphing Anscombe's quartet
  • Choosing seaborn color palettes
  • Choosing matplotlib color maps
  • Interacting with IPython Notebook widgets
  • Viewing a matrix of scatterplots
  • Visualizing with d3.js via mpld3
  • Creating heatmaps
  • Combining box plots and kernel density plots with violin plots
  • Visualizing network graphs with hive plots
  • Displaying geographical maps
  • Using ggplot2-like plots
  • Highlighting data points with influence plots
  • Chapter 3: Statistical Data Analysis and Probability
  • Introduction
  • Fitting data to the exponential distribution
  • Fitting aggregated data to the gamma distribution
  • Fitting aggregated counts to the Poisson distribution
  • Determining bias
  • Estimating kernel density
  • Determining confidence intervals for mean, variance, and standard deviation
  • Sampling with probability weights
  • Exploring extreme values
  • Correlating variables with Pearson's correlation
  • Correlating variables with the Spearman rank correlation
  • Correlating a binary and a continuous variable with the point biserial correlation
  • Evaluating relations between variables with ANOVA
  • Chapter 4: Dealing with Data and Numerical Issues
  • Introduction
  • Clipping and filtering outliers
  • Winsorizing data
  • Measuring central tendency of noisy data
  • Normalizing with the Box-Cox transformation
  • Transforming data with the power ladder
  • Transforming data with logarithms
  • Rebinning data
  • Applying logit() to transform proportions
  • Fitting a robust linear model
  • Taking variance into account with weighted least squares
  • Using arbitrary precision for optimization
  • Using arbitrary precision for linear algebra
  • Chapter 5: Web Mining, Databases, and Big Data
  • Introduction
  • Simulating web browsing
  • Scraping the Web
  • Dealing with non-ASCII text and HTML entities
  • Implementing association tables
  • Setting up database migration scripts
  • Adding a table column to an existing table
  • Adding indices after table creation
  • Setting up a test web server
  • Implementing a star schema with fact and dimension tables
  • Using HDFS
  • Setting up Spark
  • Clustering data with Spark
  • Chapter 6: Signal Processing and Timeseries
  • Introduction
  • Spectral analysis with periodograms
  • Estimating power spectral density with the Welch method
  • Analyzing peaks
  • Measuring phase synchronization
  • Exponential smoothing
  • Evaluating smoothing
  • Using the Lomb-Scargle periodogram
  • Analyzing the frequency spectrum of audio
  • Analyzing signals with the discrete cosine transform
  • Block bootstrapping time series data
  • Moving block bootstrapping time series data
  • Applying the discrete wavelet transform
  • Chapter 7: Selecting Stocks with Financial Data Analysis
  • Introduction
  • Computing simple and log returns
  • Ranking stocks with the Sharpe ratio and liquidity
  • Ranking stocks with the Calmar and Sortino ratios
  • Analyzing returns statistics
  • Correlating individual stocks with the broader market
  • Exploring risk and return
  • Examining the market with the non-parametric runs test
  • Testing for random walks
  • Determining market efficiency with autoregressive models
  • Creating tables for a stock prices database
  • Populating the stock prices database
  • Optimizing an equal weights two-asset portfolio
  • Chapter 8: Text Mining and Social Network Analysis
  • Introduction
  • Creating a categorized corpus
  • Tokenizing news articles in sentences and words
  • Stemming, lemmatizing, filtering, and TF-IDF scores
  • Recognizing named entities
  • Extracting topics with non-negative matrix factorization
  • Implementing a basic terms database
  • Computing social network density
  • Calculating social network closeness centrality
  • Determining the betweenness centrality
  • Estimating the average clustering coefficient
  • Calculating the assortativity coefficient of a graph
  • Getting the clique number of a graph
  • Creating a document graph with cosine similarity
  • Chapter 9: Ensemble Learning and Dimensionality Reduction
  • Introduction
  • Recursively eliminating features
  • Applying principal component analysis for dimension reduction
  • Applying linear discriminant analysis for dimension reduction
  • Stacking and majority voting for multiple models
  • Learning with random forests
  • Fitting noisy data with the RANSAC algorithm
  • Bagging to improve results
  • Boosting for better learning
  • Nesting cross-validation
  • Reusing models with joblib
  • Hierarchically clustering data
  • Taking a Theano tour
  • Chapter 10: Evaluating Classifiers, Regressors, and Clusters
  • Introduction
  • Getting classification straight with the confusion matrix
  • Computing precision, recall, and F1-score
  • Examining a receiver operating characteristic and the area under a curve
  • Visualizing the goodness of fit
  • Computing MSE and median absolute error
  • Evaluating clusters with the mean silhouette coefficient
  • Comparing results with a dummy classifier
  • Determining MAPE and MPE
  • Comparing with a dummy regressor
  • Calculating the mean absolute error and the residual sum of squares
  • Examining the kappa of classification
  • Taking a look at the Matthews correlation coefficient
  • Chapter 11: Analyzing Images
  • Introduction
  • Setting up OpenCV
  • Applying Scale-Invariant Feature Transform (SIFT)
  • Detecting features with SURF
  • Quantizing colors
  • Denoising images
  • Extracting patches from an image
  • Detecting faces with Haar cascades
  • Searching for bright stars
  • Extracting metadata from images
  • Extracting texture features from images
  • Applying hierarchical clustering on images
  • Segmenting images with spectral clustering
  • Chapter 12: Parallelism and Performance
  • Introduction
  • Just-in-time compiling with Numba
  • Speeding up numerical expressions with Numexpr
  • Running multiple threads with the threading module
  • Launching multiple tasks with the concurrent.futures module
  • Accessing resources asynchronously with the asyncio module
  • Distributed processing with execnet
  • Profiling memory usage
  • Calculating the mean, variance, skewness, and kurtosis on the fly
  • Caching with a least recently used cache
  • Caching HTTP requests
  • Streaming counting with the Count-min sketch
  • Harnessing the power of the GPU with OpenCL
  • Appendix A: Glossary
  • Appendix B: Function Reference
  • IPython
  • Matplotlib
  • NumPy
  • pandas
  • Scikit-learn
  • SciPy
  • Seaborn
  • Statsmodels
  • Appendix C: Online Resources
  • IPython notebooks and open data
  • Mathematics and statistics
  • Appendix D: Tips and Tricks for Command-Line and Miscellaneous Tools
  • IPython notebooks
  • Command-line tools
  • The alias command
  • Command-line history
  • Reproducible sessions
  • Docker tips
  • Index