Python Real-World Projects: Craft Your Python Portfolio with Deployable Applications
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham, England : Packt Publishing Ltd, [2023] |
Edition: | First edition |
Subjects: | |
View in Universitat Ramon Llull Library: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009767135306719 |
Table of Contents:
- Intro
- Title Page
- Copyright and Credits
- Contributors
- Table of Contents
- Preface
- A note on skills required
- Chapter 1: Project Zero: A Template for Other Projects
- On quality
- More reading on quality
- Suggested project sprints
- Inception
- Elaboration, part 1: define done
- Elaboration, part 2: define components and tests
- Construction
- Transition
- List of deliverables
- Development tool installation
- Project 0 - Hello World with test cases
- Description
- Approach
- Deliverables
- The pyproject.toml project file
- The docs directory
- The tests/features/hello_world.feature file
- The tests/steps/hw_cli.py module
- The tests/environment.py file
- The tests/test_hw.py unit tests
- The src/tox.ini file
- The src/hello_world.py file
- Definition of done
- Summary
- Extras
- Static analysis - mypy, flake8
- CLI features
- Logging
- Cookiecutter
- Chapter 2: Overview of the Projects
- General data acquisition
- Acquisition via Extract
- Inspection
- Clean, validate, standardize, and persist
- Summarize and analyze
- Statistical modeling
- Data contracts
- Summary
- Chapter 3: Project 1.1: Data Acquisition Base Application
- Description
- User experience
- About the source data
- About the output data
- Architectural approach
- Class design
- Design principles
- Functional design
- Deliverables
- Acceptance tests
- Additional acceptance scenarios
- Unit tests
- Unit testing the model
- Unit testing the PairBuilder class hierarchy
- Unit testing the remaining components
- Summary
- Extras
- Logging enhancements
- Configuration extensions
- Data subsets
- Another example data source
- Chapter 4: Data Acquisition Features: Web APIs and Scraping
- Project 1.2: Acquire data from a web service
- Description
- The Kaggle API
- About the source data
- Approach
- Making API requests
- Downloading a ZIP archive
- Getting the data set list
- Rate limiting
- The main() function
- Deliverables
- Unit tests for the RestAccess class
- Acceptance tests
- The feature file
- Injecting a mock for the requests package
- Creating a mock service
- Behave fixture
- Kaggle access module and refactored main application
- Project 1.3: Scrape data from a web page
- Description
- About the source data
- Approach
- Making an HTML request with urllib.request
- HTML scraping and Beautiful Soup
- Deliverables
- Unit test for the html_extract module
- Acceptance tests
- HTML extract module and refactored main application
- Summary
- Extras
- Locate more JSON-format data
- Other data sets to extract
- Handling schema variations
- CLI enhancements
- Logging
- Chapter 5: Data Acquisition Features: SQL Database
- Project 1.4: A local SQL database
- Description
- Database design
- Data loading
- Approach
- SQL Data Definitions
- SQL Data Manipulations
- SQL Execution
- Loading the SERIES table
- Loading the SERIES_VALUE table
- Deliverables
- Project 1.5: Acquire data from a SQL extract
- Description
- The Object-Relational Mapping (ORM) problem
- About the source data
- Approach
- Extract from a SQL DB
- SQL-related processing distinct from CSV processing
- Deliverables
- Mock database connection and cursor objects for testing
- Unit test for a new acquisition module
- Acceptance tests using a SQLite database
- The feature file
- The sqlite fixture
- The step definitions
- The Database extract module, and refactoring
- Summary
- Extras
- Consider using another database
- Consider using a NoSQL database
- Consider using SQLAlchemy to define an ORM layer
- Chapter 6: Project 2.1: Data Inspection Notebook
- Description
- About the source data
- Approach
- Notebook test cases for the functions
- Common code in a separate module
- Deliverables
- Notebook .ipynb file
- Cells and functions to analyze data
- Cells with Markdown to explain things
- Cells with test cases
- Executing a notebook's test suite
- Summary
- Extras
- Use pandas to examine data
- Chapter 7: Data Inspection Features
- Project 2.2: Validating cardinal domains - measures, counts, and durations
- Description
- Approach
- Dealing with currency and related values
- Dealing with intervals or durations
- Extract notebook functions
- Deliverables
- Inspection module
- Unit test cases for the module
- Project 2.3: Validating text and codes - nominal data and ordinal numbers
- Description
- Dates and times
- Time values, local time, and UTC time
- Approach
- Nominal data
- Extend the data inspection module
- Deliverables
- Revised inspection module
- Unit test cases
- Project 2.4: Finding reference domains
- Description
- Approach
- Collect and compare keys
- Summarize key counts
- Deliverables
- Revised inspection module
- Unit test cases
- Revised notebook to use the refactored inspection model
- Summary
- Extras
- Markdown cells with dates and data source information
- Presentation materials
- JupyterBook or Quarto for even more sophisticated output
- Chapter 8: Project 2.5: Schema and Metadata
- Description
- Approach
- Define Pydantic classes and emit the JSON Schema
- Define expected data domains in JSON Schema notation
- Use JSON Schema to validate intermediate files
- Deliverables
- Schema acceptance tests
- Extended acceptance testing
- Summary
- Extras
- Revise all previous chapter models to use Pydantic
- Use the ORM layer
- Chapter 9: Project 3.1: Data Cleaning Base Application
- Description
- User experience
- Source data
- Result data
- Conversions and processing
- Error reports
- Approach
- Model module refactoring
- Pydantic V2 validation
- Validation function design
- Incremental design
- CLI application
- Redirecting stdout
- Deliverables
- Acceptance tests
- Unit tests for the model features
- Application to clean data and create an NDJSON interim file
- Summary
- Extras
- Create an output file with rejected samples
- Chapter 10: Data Cleaning Features
- Project 3.2: Validate and convert source fields
- Description
- Approach
- Deliverables
- Unit tests for validation functions
- Project 3.3: Validate text fields (and numeric coded fields)
- Description
- Approach
- Deliverables
- Unit tests for validation functions
- Project 3.4: Validate references among separate data sources
- Description
- Approach
- Deliverables
- Unit tests for data gathering and validation
- Project 3.5: Standardize data to common codes and ranges
- Description
- Approach
- Deliverables
- Unit tests for standardizing functions
- Acceptance test
- Project 3.6: Integration to create an acquisition pipeline
- Description
- Multiple extractions
- Approach
- Consider packages to help create a pipeline
- Deliverables
- Acceptance test
- Summary
- Extras
- Hypothesis testing
- Rejecting bad data via filtering (instead of logging)
- Disjoint subentities
- Create a fan-out cleaning pipeline
- Chapter 11: Project 3.7: Interim Data Persistence
- Description
- Overall approach
- Designing idempotent operations
- Deliverables
- Unit test
- Acceptance test
- Cleaned up re-runnable application design
- Summary
- Extras
- Using a SQL database
- Persistence with NoSQL databases
- Chapter 12: Project 3.8: Integrated Data Acquisition Web Service
- Description
- The data series resources
- Creating data for download
- Overall approach
- OpenAPI 3 specification
- RESTful API to be queried from a notebook
- A POST request starts processing
- The GET request for processing status
- The GET request for the results
- Security considerations
- Deliverables
- Acceptance test cases
- RESTful API app
- Unit test cases
- Summary
- Extras
- Add filtering criteria to the POST request
- Split the OpenAPI specification into two parts to use REF for the output schema
- Use Celery instead of concurrent.futures
- Call external processing directly instead of running a subprocess
- Chapter 13: Project 4.1: Visual Analysis Techniques
- Description
- Overall approach
- General notebook organization
- Python modules for summarizing
- PyPlot graphics
- Data frequency histograms
- X-Y scatter plot
- Iteration and evolution
- Deliverables
- Unit test
- Acceptance test
- Summary
- Extras
- Use Seaborn for plotting
- Adjust color palettes to emphasize key points about the data
- Chapter 14: Project 4.2: Creating Reports
- Description
- Slide decks and presentations
- Reports
- Overall approach
- Preparing slides
- Preparing a report
- Creating technical diagrams
- Deliverables
- Summary
- Extras
- Written reports with UML diagrams
- Chapter 15: Project 5.1: Modeling Base Application
- Description
- Approach
- Designing a summary app
- Describing the distribution
- Use cleaned data model
- Rethink the data inspection functions
- Create new results model
- Deliverables
- Acceptance testing
- Unit testing
- Application secondary feature
- Summary
- Extras
- Measures of shape
- Creating PDF reports
- Serving the HTML report from the data API
- Chapter 16: Project 5.2: Simple Multivariate Statistics
- Description
- Correlation coefficient
- Linear regression
- Diagrams
- Approach
- Statistical computations
- Analysis diagrams
- Including diagrams in the final document
- Deliverables