Python Real-World Projects: Craft Your Python Portfolio with Deployable Applications

Bibliographic Details
Other Authors: Lott, Steven F., author
Format: E-book
Language: English
Published: Birmingham, England : Packt Publishing Ltd [2023]
Edition: First edition
View at Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009767135306719
Table of Contents:
  • Intro
  • Title Page
  • Copyright and Credits
  • Contributors
  • Table of Contents
  • Preface
  • A note on skills required
  • Chapter 1: Project Zero: A Template for Other Projects
  • On quality
  • More Reading on Quality
  • Suggested project sprints
  • Inception
  • Elaboration, part 1: define done
  • Elaboration, part 2: define components and tests
  • Construction
  • Transition
  • List of deliverables
  • Development tool installation
  • Project 0 - Hello World with test cases
  • Description
  • Approach
  • Deliverables
  • The pyproject.toml project file
  • The docs directory
  • The tests/features/hello_world.feature file
  • The tests/steps/hw_cli.py module
  • The tests/environment.py file
  • The tests/test_hw.py unit tests
  • The src/tox.ini file
  • The src/hello_world.py file
  • Definition of done
  • Summary
  • Extras
  • Static analysis - mypy, flake8
  • CLI features
  • Logging
  • Cookiecutter
  • Chapter 2: Overview of the Projects
  • General data acquisition
  • Acquisition via Extract
  • Inspection
  • Clean, validate, standardize, and persist
  • Summarize and analyze
  • Statistical modeling
  • Data contracts
  • Summary
  • Chapter 3: Project 1.1: Data Acquisition Base Application
  • Description
  • User experience
  • About the source data
  • About the output data
  • Architectural approach
  • Class design
  • Design principles
  • Functional design
  • Deliverables
  • Acceptance tests
  • Additional acceptance scenarios
  • Unit tests
  • Unit testing the model
  • Unit testing the PairBuilder class hierarchy
  • Unit testing the remaining components
  • Summary
  • Extras
  • Logging enhancements
  • Configuration extensions
  • Data subsets
  • Another example data source
  • Chapter 4: Data Acquisition Features: Web APIs and Scraping
  • Project 1.2: Acquire data from a web service
  • Description
  • The Kaggle API
  • About the source data
  • Approach
  • Making API requests
  • Downloading a ZIP archive
  • Getting the data set list
  • Rate limiting
  • The main() function
  • Deliverables
  • Unit tests for the RestAccess class
  • Acceptance tests
  • The feature file
  • Injecting a mock for the requests package
  • Creating a mock service
  • Behave fixture
  • Kaggle access module and refactored main application
  • Project 1.3: Scrape data from a web page
  • Description
  • About the source data
  • Approach
  • Making an HTML request with urllib.request
  • HTML scraping and Beautiful Soup
  • Deliverables
  • Unit test for the html_extract module
  • Acceptance tests
  • HTML extract module and refactored main application
  • Summary
  • Extras
  • Locate more JSON-format data
  • Other data sets to extract
  • Handling schema variations
  • CLI enhancements
  • Logging
  • Chapter 5: Data Acquisition Features: SQL Database
  • Project 1.4: A local SQL database
  • Description
  • Database design
  • Data loading
  • Approach
  • SQL Data Definitions
  • SQL Data Manipulations
  • SQL Execution
  • Loading the SERIES table
  • Loading the SERIES_VALUE table
  • Deliverables
  • Project 1.5: Acquire data from a SQL extract
  • Description
  • The Object-Relational Mapping (ORM) problem
  • About the source data
  • Approach
  • Extract from a SQL DB
  • SQL-related processing distinct from CSV processing
  • Deliverables
  • Mock database connection and cursor objects for testing
  • Unit test for a new acquisition module
  • Acceptance tests using a SQLite database
  • The feature file
  • The sqlite fixture
  • The step definitions
  • The Database extract module, and refactoring
  • Summary
  • Extras
  • Consider using another database
  • Consider using a NoSQL database
  • Consider using SQLAlchemy to define an ORM layer
  • Chapter 6: Project 2.1: Data Inspection Notebook
  • Description
  • About the source data
  • Approach
  • Notebook test cases for the functions
  • Common code in a separate module
  • Deliverables
  • Notebook .ipynb file
  • Cells and functions to analyze data
  • Cells with Markdown to explain things
  • Cells with test cases
  • Executing a notebook's test suite
  • Summary
  • Extras
  • Use pandas to examine data
  • Chapter 7: Data Inspection Features
  • Project 2.2: Validating cardinal domains - measures, counts, and durations
  • Description
  • Approach
  • Dealing with currency and related values
  • Dealing with intervals or durations
  • Extract notebook functions
  • Deliverables
  • Inspection module
  • Unit test cases for the module
  • Project 2.3: Validating text and codes - nominal data and ordinal numbers
  • Description
  • Dates and times
  • Time values, local time, and UTC time
  • Approach
  • Nominal data
  • Extend the data inspection module
  • Deliverables
  • Revised inspection module
  • Unit test cases
  • Project 2.4: Finding reference domains
  • Description
  • Approach
  • Collect and compare keys
  • Summarize keys counts
  • Deliverables
  • Revised inspection module
  • Unit test cases
  • Revised notebook to use the refactored inspection model
  • Summary
  • Extras
  • Markdown cells with dates and data source information
  • Presentation materials
  • JupyterBook or Quarto for even more sophisticated output
  • Chapter 8: Project 2.5: Schema and Metadata
  • Description
  • Approach
  • Define Pydantic classes and emit the JSON Schema
  • Define expected data domains in JSON Schema notation
  • Use JSON Schema to validate intermediate files
  • Deliverables
  • Schema acceptance tests
  • Extended acceptance testing
  • Summary
  • Extras
  • Revise all previous chapter models to use Pydantic
  • Use the ORM layer
  • Chapter 9: Project 3.1: Data Cleaning Base Application
  • Description
  • User experience
  • Source data
  • Result data
  • Conversions and processing
  • Error reports
  • Approach
  • Model module refactoring
  • Pydantic V2 validation
  • Validation function design
  • Incremental design
  • CLI application
  • Redirecting stdout
  • Deliverables
  • Acceptance tests
  • Unit tests for the model features
  • Application to clean data and create an NDJSON interim file
  • Summary
  • Extras
  • Create an output file with rejected samples
  • Chapter 10: Data Cleaning Features
  • Project 3.2: Validate and convert source fields
  • Description
  • Approach
  • Deliverables
  • Unit tests for validation functions
  • Project 3.3: Validate text fields (and numeric coded fields)
  • Description
  • Approach
  • Deliverables
  • Unit tests for validation functions
  • Project 3.4: Validate references among separate data sources
  • Description
  • Approach
  • Deliverables
  • Unit tests for data gathering and validation
  • Project 3.5: Standardize data to common codes and ranges
  • Description
  • Approach
  • Deliverables
  • Unit tests for standardizing functions
  • Acceptance test
  • Project 3.6: Integration to create an acquisition pipeline
  • Description
  • Multiple extractions
  • Approach
  • Consider packages to help create a pipeline
  • Deliverables
  • Acceptance test
  • Summary
  • Extras
  • Hypothesis testing
  • Rejecting bad data via filtering (instead of logging)
  • Disjoint subentities
  • Create a fan-out cleaning pipeline
  • Chapter 11: Project 3.7: Interim Data Persistence
  • Description
  • Overall approach
  • Designing idempotent operations
  • Deliverables
  • Unit test
  • Acceptance test
  • Cleaned up re-runnable application design
  • Summary
  • Extras
  • Using a SQL database
  • Persistence with NoSQL databases
  • Chapter 12: Project 3.8: Integrated Data Acquisition Web Service
  • Description
  • The data series resources
  • Creating data for download
  • Overall approach
  • OpenAPI 3 specification
  • RESTful API to be queried from a notebook
  • A POST request starts processing
  • The GET request for processing status
  • The GET request for the results
  • Security considerations
  • Deliverables
  • Acceptance test cases
  • RESTful API app
  • Unit test cases
  • Summary
  • Extras
  • Add filtering criteria to the POST request
  • Split the OpenAPI specification into two parts to use REF for the output schema
  • Use Celery instead of concurrent.futures
  • Call external processing directly instead of running a subprocess
  • Chapter 13: Project 4.1: Visual Analysis Techniques
  • Description
  • Overall approach
  • General notebook organization
  • Python modules for summarizing
  • PyPlot graphics
  • Data frequency histograms
  • X-Y scatter plot
  • Iteration and evolution
  • Deliverables
  • Unit test
  • Acceptance test
  • Summary
  • Extras
  • Use Seaborn for plotting
  • Adjust color palettes to emphasize key points about the data
  • Chapter 14: Project 4.2: Creating Reports
  • Description
  • Slide decks and presentations
  • Reports
  • Overall approach
  • Preparing slides
  • Preparing a report
  • Creating technical diagrams
  • Deliverables
  • Summary
  • Extras
  • Written reports with UML diagrams
  • Chapter 15: Project 5.1: Modeling Base Application
  • Description
  • Approach
  • Designing a summary app
  • Describing the distribution
  • Use cleaned data model
  • Rethink the data inspection functions
  • Create new results model
  • Deliverables
  • Acceptance testing
  • Unit testing
  • Application secondary feature
  • Summary
  • Extras
  • Measures of shape
  • Creating PDF reports
  • Serving the HTML report from the data API
  • Chapter 16: Project 5.2: Simple Multivariate Statistics
  • Description
  • Correlation coefficient
  • Linear regression
  • Diagrams
  • Approach
  • Statistical computations
  • Analysis diagrams
  • Including diagrams in the final document
  • Deliverables