Scikit-learn cookbook: over 80 recipes for machine learning in Python with scikit-learn

Learn to use scikit-learn operations and functions for machine learning and deep learning applications. About This Book: Handle a variety of machine learning tasks effortlessly by leveraging the power of scikit-learn. Perform supervised and unsupervised learning with ease, and evaluate the performance...

Bibliographic Details
Other Authors: Avila, Julian, author; Hauck, Trent, author
Format: E-book
Language: English
Published: Birmingham, England ; Mumbai, [India] : Packt, 2017.
Edition: 2nd edition
Subjects:
View in Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630383306719
Table of Contents:
  • Cover
  • Copyright
  • Credits
  • About the Authors
  • About the Reviewer
  • www.PacktPub.com
  • Customer Feedback
  • Table of Contents
  • Preface
  • Chapter 1: High-Performance Machine Learning - NumPy
  • Introduction
  • NumPy basics
  • How to do it...
  • The shape and dimension of NumPy arrays
  • NumPy broadcasting
  • Initializing NumPy arrays and dtypes
  • Indexing
  • Boolean arrays
  • Arithmetic operations
  • NaN values
  • How it works...
  • Loading the iris dataset
  • Getting ready
  • How to do it...
  • How it works...
  • Viewing the iris dataset
  • How to do it...
  • How it works...
  • There's more...
  • Viewing the iris dataset with Pandas
  • How to do it...
  • How it works...
  • Plotting with NumPy and matplotlib
  • Getting ready
  • How to do it...
  • A minimal machine learning recipe - SVM classification
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Introducing cross-validation
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Putting it all together
  • How to do it...
  • There's more...
  • Machine learning overview - classification versus regression
  • The purpose of scikit-learn
  • Supervised versus unsupervised
  • Getting ready
  • How to do it...
  • Quick SVC - a classifier and regressor
  • Making a scorer
  • How it works...
  • There's more...
  • Linear versus nonlinear
  • Black box versus not
  • Interpretability
  • A pipeline
  • Chapter 2: Pre-Model Workflow and Pre-Processing
  • Introduction
  • Creating sample data for toy analysis
  • Getting ready
  • How to do it...
  • Creating a regression dataset
  • Creating an unbalanced classification dataset
  • Creating a dataset for clustering
  • How it works...
  • Scaling data to the standard normal distribution
  • Getting ready
  • How to do it...
  • How it works...
  • Creating binary features through thresholding
  • Getting ready
  • How to do it...
  • There's more...
  • Sparse matrices
  • The fit method
  • Working with categorical variables
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • DictVectorizer class
  • Imputing missing values through various strategies
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • A linear model in the presence of outliers
  • Getting ready
  • How to do it...
  • How it works...
  • Putting it all together with pipelines
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Using Gaussian processes for regression
  • Getting ready
  • How to do it...
  • Cross-validation with the noise parameter
  • There's more...
  • Using SGD for regression
  • Getting ready
  • How to do it...
  • How it works...
  • Chapter 3: Dimensionality Reduction
  • Introduction
  • Reducing dimensionality with PCA
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Using factor analysis for decomposition
  • Getting ready
  • How to do it...
  • How it works...
  • Using kernel PCA for nonlinear dimensionality reduction
  • Getting ready
  • How to do it...
  • How it works...
  • Using truncated SVD to reduce dimensionality
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Sign flipping
  • Sparse matrices
  • Using decomposition to classify with DictionaryLearning
  • Getting ready
  • How to do it...
  • How it works...
  • Doing dimensionality reduction with manifolds - t-SNE
  • Getting ready
  • How to do it...
  • How it works...
  • Testing methods to reduce dimensionality with pipelines
  • Getting ready
  • How to do it...
  • How it works...
  • Chapter 4: Linear Models with scikit-learn
  • Introduction
  • Fitting a line through data
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Fitting a line through data with machine learning
  • Getting ready
  • How to do it...
  • Evaluating the linear regression model
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Using ridge regression to overcome linear regression's shortfalls
  • Getting ready
  • How to do it...
  • Optimizing the ridge regression parameter
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Bayesian ridge regression
  • Using sparsity to regularize models
  • Getting ready
  • How to do it...
  • How it works...
  • LASSO cross-validation - LASSOCV
  • LASSO for feature selection
  • Taking a more fundamental approach to regularization with LARS
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • References
  • Chapter 5: Linear Models - Logistic Regression
  • Introduction
  • Using linear methods for classification - logistic regression
  • Loading data from the UCI repository
  • How to do it...
  • Viewing the Pima Indians diabetes dataset with pandas
  • How to do it...
  • Looking at the UCI Pima Indians dataset web page
  • How to do it...
  • View the citation policy
  • Read about missing values and context
  • Machine learning with logistic regression
  • Getting ready
  • Define X, y - the feature and target arrays
  • How to do it...
  • Provide training and testing sets
  • Train the logistic regression
  • Score the logistic regression
  • Examining logistic regression errors with a confusion matrix
  • Getting ready
  • How to do it...
  • Reading the confusion matrix
  • General confusion matrix in context
  • Varying the classification threshold in logistic regression
  • Getting ready
  • How to do it...
  • Receiver operating characteristic - ROC analysis
  • Getting ready
  • Sensitivity
  • A visual perspective
  • How to do it...
  • Calculating TPR in scikit-learn
  • Plotting sensitivity
  • There's more...
  • The confusion matrix in a non-medical context
  • Plotting an ROC curve without context
  • How to do it...
  • Perfect classifier
  • Imperfect classifier
  • AUC - the area under the ROC curve
  • Putting it all together - UCI breast cancer dataset
  • How to do it...
  • Outline for future projects
  • Chapter 6: Building Models with Distance Metrics
  • Introduction
  • Using k-means to cluster data
  • Getting ready
  • How to do it...
  • How it works...
  • Optimizing the number of centroids
  • Getting ready
  • How to do it...
  • How it works...
  • Assessing cluster correctness
  • Getting ready
  • How to do it...
  • There's more...
  • Using MiniBatch k-means to handle more data
  • Getting ready
  • How to do it...
  • How it works...
  • Quantizing an image with k-means clustering
  • Getting ready
  • How to do it...
  • How it works...
  • Finding the closest object in the feature space
  • Getting ready
  • How to do it...
  • How it works...
  • There's more...
  • Probabilistic clustering with Gaussian mixture models
  • Getting ready
  • How to do it...
  • How it works...
  • Using k-means for outlier detection
  • Getting ready
  • How to do it...
  • How it works...
  • Using KNN for regression
  • Getting ready
  • How to do it...
  • How it works...
  • Chapter 7: Cross-Validation and Post-Model Workflow
  • Introduction
  • Selecting a model with cross-validation
  • Getting ready
  • How to do it...
  • How it works...
  • K-fold cross-validation
  • Getting ready
  • How to do it...
  • There's more...
  • Balanced cross-validation
  • Getting ready
  • How to do it...
  • There's more...
  • Cross-validation with ShuffleSplit
  • Getting ready
  • How to do it...
  • Time series cross-validation
  • Getting ready
  • How to do it...
  • There's more...
  • Grid search with scikit-learn
  • Getting ready
  • How to do it...
  • How it works...
  • Randomized search with scikit-learn
  • Getting ready
  • How to do it...
  • Classification metrics
  • Getting ready
  • How to do it...
  • There's more...
  • Regression metrics
  • Getting ready
  • How to do it...
  • Clustering metrics
  • Getting ready
  • How to do it...
  • Using dummy estimators to compare results
  • Getting ready
  • How to do it...
  • How it works...
  • Feature selection
  • Getting ready
  • How to do it...
  • How it works...
  • Feature selection on L1 norms
  • Getting ready
  • How to do it...
  • There's more...
  • Persisting models with joblib or pickle
  • Getting ready
  • How to do it...
  • Opening the saved model
  • There's more...
  • Chapter 8: Support Vector Machines
  • Introduction
  • Classifying data with a linear SVM
  • Getting ready
  • Load the data
  • Visualize the two classes
  • How to do it...
  • How it works...
  • There's more...
  • Optimizing an SVM
  • Getting ready
  • How to do it...
  • Construct a pipeline
  • Construct a parameter grid for a pipeline
  • Provide a cross-validation scheme
  • Perform a grid search
  • There's more...
  • Randomized grid search alternative
  • Visualize the nonlinear RBF decision boundary
  • More meaning behind C and gamma
  • Multiclass classification with SVM
  • Getting ready
  • How to do it...
  • OneVsRestClassifier
  • Visualize it
  • How it works...
  • Support vector regression
  • Getting ready
  • How to do it...
  • Chapter 9: Tree Algorithms and Ensembles
  • Introduction
  • Doing basic classifications with decision trees
  • Getting ready
  • How to do it...
  • Visualizing a decision tree with pydot
  • How to do it...
  • How it works...
  • There's more...
  • Tuning a decision tree
  • Getting ready
  • How to do it...
  • There's more...
  • Using decision trees for regression
  • Getting ready
  • How to do it...
  • There's more...
  • Reducing overfitting with cross-validation
  • How to do it...
  • There's more...
  • Implementing random forest regression
  • Getting ready
  • How to do it...
  • Bagging regression with nearest neighbors