scikit-learn Cookbook: over 80 recipes for machine learning in Python with scikit-learn
Learn to use scikit-learn operations and functions for machine learning and deep learning applications. About This Book: Handle a variety of machine learning tasks effortlessly by leveraging the power of scikit-learn. Perform supervised and unsupervised learning with ease, and evaluate the performance...
Other Authors: | , |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham, England ; Mumbai, [India] : Packt, 2017. |
Edition: | 2nd edition |
Subjects: | |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009630383306719 |
Table of Contents:
- Cover
- Copyright
- Credits
- About the Authors
- About the Reviewer
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: High-Performance Machine Learning - NumPy
- Introduction
- NumPy basics
- How to do it...
- The shape and dimension of NumPy arrays
- NumPy broadcasting
- Initializing NumPy arrays and dtypes
- Indexing
- Boolean arrays
- Arithmetic operations
- NaN values
- How it works...
- Loading the iris dataset
- Getting ready
- How to do it...
- How it works...
- Viewing the iris dataset
- How to do it...
- How it works...
- There's more...
- Viewing the iris dataset with Pandas
- How to do it...
- How it works...
- Plotting with NumPy and matplotlib
- Getting ready
- How to do it...
- A minimal machine learning recipe - SVM classification
- Getting ready
- How to do it...
- How it works...
- There's more...
- Introducing cross-validation
- Getting ready
- How to do it...
- How it works...
- There's more...
- Putting it all together
- How to do it...
- There's more...
- Machine learning overview - classification versus regression
- The purpose of scikit-learn
- Supervised versus unsupervised
- Getting ready
- How to do it...
- Quick SVC - a classifier and regressor
- Making a scorer
- How it works...
- There's more...
- Linear versus nonlinear
- Black box versus not
- Interpretability
- A pipeline
- Chapter 2: Pre-Model Workflow and Pre-Processing
- Introduction
- Creating sample data for toy analysis
- Getting ready
- How to do it...
- Creating a regression dataset
- Creating an unbalanced classification dataset
- Creating a dataset for clustering
- How it works...
- Scaling data to the standard normal distribution
- Getting ready
- How to do it...
- How it works...
- Creating binary features through thresholding
- Getting ready
- How to do it...
- There's more...
- Sparse matrices
- The fit method
- Working with categorical variables
- Getting ready
- How to do it...
- How it works...
- There's more...
- DictVectorizer class
- Imputing missing values through various strategies
- Getting ready
- How to do it...
- How it works...
- There's more...
- A linear model in the presence of outliers
- Getting ready
- How to do it...
- How it works...
- Putting it all together with pipelines
- Getting ready
- How to do it...
- How it works...
- There's more...
- Using Gaussian processes for regression
- Getting ready
- How to do it...
- Cross-validation with the noise parameter
- There's more...
- Using SGD for regression
- Getting ready
- How to do it...
- How it works...
- Chapter 3: Dimensionality Reduction
- Introduction
- Reducing dimensionality with PCA
- Getting ready
- How to do it...
- How it works...
- There's more...
- Using factor analysis for decomposition
- Getting ready
- How to do it...
- How it works...
- Using kernel PCA for nonlinear dimensionality reduction
- Getting ready
- How to do it...
- How it works...
- Using truncated SVD to reduce dimensionality
- Getting ready
- How to do it...
- How it works...
- There's more...
- Sign flipping
- Sparse matrices
- Using decomposition to classify with DictionaryLearning
- Getting ready
- How to do it...
- How it works...
- Doing dimensionality reduction with manifolds - t-SNE
- Getting ready
- How to do it...
- How it works...
- Testing methods to reduce dimensionality with pipelines
- Getting ready
- How to do it...
- How it works...
- Chapter 4: Linear Models with scikit-learn
- Introduction
- Fitting a line through data
- Getting ready
- How to do it...
- How it works...
- There's more...
- Fitting a line through data with machine learning
- Getting ready
- How to do it...
- Evaluating the linear regression model
- Getting ready
- How to do it...
- How it works...
- There's more...
- Using ridge regression to overcome linear regression's shortfalls
- Getting ready
- How to do it...
- Optimizing the ridge regression parameter
- Getting ready
- How to do it...
- How it works...
- There's more...
- Bayesian ridge regression
- Using sparsity to regularize models
- Getting ready
- How to do it...
- How it works...
- LASSO cross-validation - LASSOCV
- LASSO for feature selection
- Taking a more fundamental approach to regularization with LARS
- Getting ready
- How to do it...
- How it works...
- There's more...
- References
- Chapter 5: Linear Models - Logistic Regression
- Introduction
- Using linear methods for classification - logistic regression
- Loading data from the UCI repository
- How to do it...
- Viewing the Pima Indians diabetes dataset with pandas
- How to do it...
- Looking at the UCI Pima Indians dataset web page
- How to do it...
- View the citation policy
- Read about missing values and context
- Machine learning with logistic regression
- Getting ready
- Define X, y - the feature and target arrays
- How to do it...
- Provide training and testing sets
- Train the logistic regression
- Score the logistic regression
- Examining logistic regression errors with a confusion matrix
- Getting ready
- How to do it...
- Reading the confusion matrix
- General confusion matrix in context
- Varying the classification threshold in logistic regression
- Getting ready
- How to do it...
- Receiver operating characteristic - ROC analysis
- Getting ready
- Sensitivity
- A visual perspective
- How to do it...
- Calculating TPR in scikit-learn
- Plotting sensitivity
- There's more...
- The confusion matrix in a non-medical context
- Plotting an ROC curve without context
- How to do it...
- Perfect classifier
- Imperfect classifier
- AUC - the area under the ROC curve
- Putting it all together - UCI breast cancer dataset
- How to do it...
- Outline for future projects
- Chapter 6: Building Models with Distance Metrics
- Introduction
- Using k-means to cluster data
- Getting ready
- How to do it...
- How it works...
- Optimizing the number of centroids
- Getting ready
- How to do it...
- How it works...
- Assessing cluster correctness
- Getting ready
- How to do it...
- There's more...
- Using MiniBatch k-means to handle more data
- Getting ready
- How to do it...
- How it works...
- Quantizing an image with k-means clustering
- Getting ready
- How to do it...
- How it works...
- Finding the closest object in the feature space
- Getting ready
- How to do it...
- How it works...
- There's more...
- Probabilistic clustering with Gaussian mixture models
- Getting ready
- How to do it...
- How it works...
- Using k-means for outlier detection
- Getting ready
- How to do it...
- How it works...
- Using KNN for regression
- Getting ready
- How to do it...
- How it works...
- Chapter 7: Cross-Validation and Post-Model Workflow
- Introduction
- Selecting a model with cross-validation
- Getting ready
- How to do it...
- How it works...
- K-fold cross-validation
- Getting ready
- How to do it...
- There's more...
- Balanced cross-validation
- Getting ready
- How to do it...
- There's more...
- Cross-validation with ShuffleSplit
- Getting ready
- How to do it...
- Time series cross-validation
- Getting ready
- How to do it...
- There's more...
- Grid search with scikit-learn
- Getting ready
- How to do it...
- How it works...
- Randomized search with scikit-learn
- Getting ready
- How to do it...
- Classification metrics
- Getting ready
- How to do it...
- There's more...
- Regression metrics
- Getting ready
- How to do it...
- Clustering metrics
- Getting ready
- How to do it...
- Using dummy estimators to compare results
- Getting ready
- How to do it...
- How it works...
- Feature selection
- Getting ready
- How to do it...
- How it works...
- Feature selection on L1 norms
- Getting ready
- How to do it...
- There's more...
- Persisting models with joblib or pickle
- Getting ready
- How to do it...
- Opening the saved model
- There's more...
- Chapter 8: Support Vector Machines
- Introduction
- Classifying data with a linear SVM
- Getting ready
- Load the data
- Visualize the two classes
- How to do it...
- How it works...
- There's more...
- Optimizing an SVM
- Getting ready
- How to do it...
- Construct a pipeline
- Construct a parameter grid for a pipeline
- Provide a cross-validation scheme
- Perform a grid search
- There's more...
- Randomized grid search alternative
- Visualize the nonlinear RBF decision boundary
- More meaning behind C and gamma
- Multiclass classification with SVM
- Getting ready
- How to do it...
- OneVsRestClassifier
- Visualize it
- How it works...
- Support vector regression
- Getting ready
- How to do it...
- Chapter 9: Tree Algorithms and Ensembles
- Introduction
- Doing basic classifications with decision trees
- Getting ready
- How to do it...
- Visualizing a decision tree with pydot
- How to do it...
- How it works...
- There's more...
- Tuning a decision tree
- Getting ready
- How to do it...
- There's more...
- Using decision trees for regression
- Getting ready
- How to do it...
- There's more...
- Reducing overfitting with cross-validation
- How to do it...
- There's more...
- Implementing random forest regression
- Getting ready
- How to do it...
- Bagging regression with nearest neighbors