Hands-On Web Scraping with Python: Extract Quality Data from the Web Using Effective Python Techniques

Web scraping is a powerful tool for extracting data from the web, but it can be daunting for those without a technical background. Designed for novices, this book will help you grasp the fundamentals of web scraping and Python programming, even if you have no prior experience. Adopting a practical,...

Bibliographic Details
Other Authors: Chapagain, Anish (author)
Format: E-book
Language: English
Published: Birmingham, England : Packt Publishing [2023]
Edition: Second edition
View at Universitat Ramon Llull Library: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009769021406719
Table of Contents:
  • Cover
  • Title page
  • Copyright and Credits
  • Contributors
  • Table of Contents
  • Preface
  • Part 1: Python and Web Scraping
  • Chapter 1: Web Scraping Fundamentals
  • Technical requirements
  • What is web scraping?
  • Understanding the latest web technologies
  • HTTP
  • HTML
  • XML
  • JavaScript
  • CSS
  • Data-finding techniques used in web pages
  • HTML source page
  • Developer tools
  • Summary
  • Further reading
  • Chapter 2: Python Programming for Data and Web
  • Technical requirements
  • Why Python (for web scraping)?
  • Accessing the WWW with Python
  • Setting things up
  • Creating a virtual environment
  • Installing libraries
  • Loading URLs
  • URL handling and operations
  • requests - Python library
  • Implementing HTTP methods
  • GET
  • POST
  • Summary
  • Further reading
  • Part 2: Beginning Web Scraping
  • Chapter 3: Searching and Processing Web Documents
  • Technical requirements
  • Introducing XPath and CSS selectors to process markup documents
  • The Document Object Model (DOM)
  • XPath
  • CSS selectors
  • Using web browser DevTools to access web content
  • HTML elements and DOM navigation
  • XPath and CSS selectors using DevTools
  • Scraping using lxml - a Python library
  • lxml by example
  • Web scraping using lxml
  • Parsing robots.txt and sitemap.xml
  • The robots.txt file
  • Sitemaps
  • Summary
  • Further reading
  • Chapter 4: Scraping Using PyQuery, a jQuery-Like Library for Python
  • Technical requirements
  • PyQuery overview
  • Introducing jQuery
  • Exploring PyQuery
  • Installing PyQuery
  • Loading a web URL
  • Element traversing, attributes, and pseudo-classes
  • Iterating using PyQuery
  • Web scraping using PyQuery
  • Example 1 - scraping book details
  • Example 2 - sitemap to CSV
  • Example 3 - scraping quotes with author details
  • Summary
  • Further reading
  • Chapter 5: Scraping the Web with Scrapy and Beautiful Soup
  • Technical requirements
  • Web parsing using Python
  • Introducing Beautiful Soup
  • Installing Beautiful Soup
  • Exploring Beautiful Soup
  • Web scraping using Beautiful Soup
  • Web scraping using Scrapy
  • Setting up a project
  • Creating an item
  • Implementing the spider
  • Exporting data
  • Deploying a web crawler
  • Summary
  • Further reading
  • Part 3: Advanced Scraping Concepts
  • Chapter 6: Working with the Secure Web
  • Technical requirements
  • Exploring secure web content
  • Form processing
  • Cookies and sessions
  • User authentication
  • HTML <form> processing using Python
  • User authentication and cookies
  • Using proxies
  • Summary
  • Further reading
  • Chapter 7: Data Extraction Using Web APIs
  • Technical requirements
  • Introduction to web APIs
  • Types of API
  • Benefits of web APIs
  • Data formats and patterns in APIs
  • Example 1 - sunrise and sunset
  • Example 2 - GitHub emojis
  • Example 3 - Open Library
  • Web scraping using APIs
  • Example 1 - holidays from the US calendar
  • Example 2 - Open Library book details
  • Example 3 - US cities and time zones
  • Summary
  • Further reading
  • Chapter 8: Using Selenium to Scrape the Web
  • Technical requirements
  • Introduction to Selenium
  • Advantages and disadvantages of Selenium
  • Use cases of Selenium
  • Components of Selenium
  • Using Selenium WebDriver
  • Setting things up
  • Exploring Selenium
  • Scraping using Selenium
  • Example 1 - book information
  • Example 2 - forms and searching
  • Summary
  • Further reading
  • Chapter 9: Using Regular Expressions and PDFs
  • Technical requirements
  • Overview of regex
  • Regex with Python
  • re (search, match, and findall)
  • re.split
  • re.sub
  • re.compile
  • Regex flags
  • Using regex to extract data
  • Example 1 - Yamaha dealer information
  • Example 2 - data from sitemap
  • Example 3 - Godfrey's dealer
  • Data extraction from a PDF
  • The PyPDF2 library
  • Extraction using PyPDF2
  • Summary
  • Further reading
  • Part 4: Advanced Data-Related Concepts
  • Chapter 10: Data Mining, Analysis, and Visualization
  • Technical requirements
  • Introduction to data mining
  • Predictive data mining
  • Descriptive data mining
  • Handling collected data
  • Basic file handling
  • JSON
  • CSV
  • SQLite
  • Data analysis and visualization
  • Exploratory Data Analysis using ydata_profiling
  • pandas and plotly
  • Summary
  • Further reading
  • Chapter 11: Machine Learning and Web Scraping
  • Technical requirements
  • Introduction to ML
  • ML and Python programming
  • Types of ML
  • ML using scikit-learn
  • Simple linear regression
  • Multiple linear regression
  • Sentiment analysis
  • Summary
  • Further reading
  • Part 5: Conclusion
  • Chapter 12: After Scraping - Next Steps and Data Analysis
  • Technical requirements
  • What happens after scraping?
  • Web requests
  • pycurl
  • Proxies
  • Data processing
  • PySpark
  • polars
  • Jobs and careers
  • Summary
  • Further reading
  • Index
  • Other Books You May Enjoy