Hands-On Web Scraping with Python: Extract Quality Data from the Web Using Effective Python Techniques
Web scraping is a powerful tool for extracting data from the web, but it can be daunting for those without a technical background. Designed for novices, this book will help you grasp the fundamentals of web scraping and Python programming, even if you have no prior experience. Adopting a practical,...
Other Authors: | |
---|---|
Format: | E-book |
Language: | English |
Published: | Birmingham, England : Packt Publishing [2023] |
Edition: | Second edition |
Subjects: | |
View in Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009769021406719 |
Table of Contents:
- Cover
- Title page
- Copyright and Credits
- Contributors
- Table of Contents
- Preface
- Part 1: Python and Web Scraping
- Chapter 1: Web Scraping Fundamentals
- Technical requirements
- What is web scraping?
- Understanding the latest web technologies
- HTTP
- HTML
- XML
- JavaScript
- CSS
- Data-finding techniques used in web pages
- HTML source page
- Developer tools
- Summary
- Further reading
- Chapter 2: Python Programming for Data and Web
- Technical requirements
- Why Python (for web scraping)?
- Accessing the WWW with Python
- Setting things up
- Creating a virtual environment
- Installing libraries
- Loading URLs
- URL handling and operations
- requests - Python library
- Implementing HTTP methods
- GET
- POST
- Summary
- Further reading
- Part 2: Beginning Web Scraping
- Chapter 3: Searching and Processing Web Documents
- Technical requirements
- Introducing XPath and CSS selectors to process markup documents
- The Document Object Model (DOM)
- XPath
- CSS selectors
- Using web browser DevTools to access web content
- HTML elements and DOM navigation
- XPath and CSS selectors using DevTools
- Scraping using lxml - a Python library
- lxml by example
- Web scraping using lxml
- Parsing robots.txt and sitemap.xml
- The robots.txt file
- Sitemaps
- Summary
- Further reading
- Chapter 4: Scraping Using PyQuery, a jQuery-Like Library for Python
- Technical requirements
- PyQuery overview
- Introducing jQuery
- Exploring PyQuery
- Installing PyQuery
- Loading a web URL
- Element traversing, attributes, and pseudo-classes
- Iterating using PyQuery
- Web scraping using PyQuery
- Example 1 - scraping book details
- Example 2 - sitemap to CSV
- Example 3 - scraping quotes with author details
- Summary
- Further reading
- Chapter 5: Scraping the Web with Scrapy and Beautiful Soup
- Technical requirements
- Web parsing using Python
- Introducing Beautiful Soup
- Installing Beautiful Soup
- Exploring Beautiful Soup
- Web scraping using Beautiful Soup
- Web scraping using Scrapy
- Setting up a project
- Creating an item
- Implementing the spider
- Exporting data
- Deploying a web crawler
- Summary
- Further reading
- Part 3: Advanced Scraping Concepts
- Chapter 6: Working with the Secure Web
- Technical requirements
- Exploring secure web content
- Form processing
- Cookies and sessions
- User authentication
- HTML <form> processing using Python
- User authentication and cookies
- Using proxies
- Summary
- Further reading
- Chapter 7: Data Extraction Using Web APIs
- Technical requirements
- Introduction to web APIs
- Types of API
- Benefits of web APIs
- Data formats and patterns in APIs
- Example 1 - sunrise and sunset
- Example 2 - GitHub emojis
- Example 3 - Open Library
- Web scraping using APIs
- Example 1 - holidays from the US calendar
- Example 2 - Open Library book details
- Example 3 - US cities and time zones
- Summary
- Further reading
- Chapter 8: Using Selenium to Scrape the Web
- Technical requirements
- Introduction to Selenium
- Advantages and disadvantages of Selenium
- Use cases of Selenium
- Components of Selenium
- Using Selenium WebDriver
- Setting things up
- Exploring Selenium
- Scraping using Selenium
- Example 1 - book information
- Example 2 - forms and searching
- Summary
- Further reading
- Chapter 9: Using Regular Expressions and PDFs
- Technical requirements
- Overview of regex
- Regex with Python
- re (search, match, and findall)
- re.split
- re.sub
- re.compile
- Regex flags
- Using regex to extract data
- Example 1 - Yamaha dealer information
- Example 2 - data from sitemap
- Example 3 - Godfrey's dealer
- Data extraction from a PDF
- The PyPDF2 library
- Extraction using PyPDF2
- Summary
- Further reading
- Part 4: Advanced Data-Related Concepts
- Chapter 10: Data Mining, Analysis, and Visualization
- Technical requirements
- Introduction to data mining
- Predictive data mining
- Descriptive data mining
- Handling collected data
- Basic file handling
- JSON
- CSV
- SQLite
- Data analysis and visualization
- Exploratory Data Analysis using ydata_profiling
- pandas and plotly
- Summary
- Further reading
- Chapter 11: Machine Learning and Web Scraping
- Technical requirements
- Introduction to ML
- ML and Python programming
- Types of ML
- ML using scikit-learn
- Simple linear regression
- Multiple linear regression
- Sentiment analysis
- Summary
- Further reading
- Part 5: Conclusion
- Chapter 12: After Scraping - Next Steps and Data Analysis
- Technical requirements
- What happens after scraping?
- Web requests
- pycurl
- Proxies
- Data processing
- PySpark
- polars
- Jobs and careers
- Summary
- Further reading
- Index
- Other Books You May Enjoy