Web scraping with Python: data extraction from the modern web

If programming is magic, then web scraping is surely a form of wizardry. By writing a simple automated program, you can query web servers, request data, and parse it to extract the information you need. This thoroughly updated third edition not only introduces you to web scraping but also serves as...

Bibliographic Details
Other Authors:	Mitchell, Ryan (author)
Format:	E-book
Language:	English
Published:	Sebastopol : O'Reilly Media, Inc., 2024.
Edition:	3rd edition
Subjects:	Data mining. Python (Computer program language)
View at Universitat Ramon Llull Library:	https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009801534306719

Table of Contents:

Cover
Copyright
Table of Contents
Preface
What Is Web Scraping?
Why Web Scraping?
About This Book
Conventions Used in This Book
Using Code Examples
O'Reilly Online Learning
How to Contact Us
Acknowledgments
Part I. Building Scrapers
Chapter 1. How the Internet Works
Networking
Physical Layer
Data Link Layer
Network Layer
Transport Layer
Session Layer
Presentation Layer
Application Layer
HTML
CSS
JavaScript
Watching Websites with Developer Tools
Chapter 2. The Legalities and Ethics of Web Scraping
Trademarks, Copyrights, Patents, Oh My!
Copyright Law
Trespass to Chattels
The Computer Fraud and Abuse Act
robots.txt and Terms of Service
Three Web Scrapers
eBay v. Bidder's Edge and Trespass to Chattels
United States v. Auernheimer and the Computer Fraud and Abuse Act
Field v. Google: Copyright and robots.txt
Chapter 3. Applications of Web Scraping
Classifying Projects
E-commerce
Marketing
Academic Research
Product Building
Travel
Sales
SERP Scraping
Chapter 4. Writing Your First Web Scraper
Installing and Using Jupyter
Connecting
An Introduction to BeautifulSoup
Installing BeautifulSoup
Running BeautifulSoup
Connecting Reliably and Handling Exceptions
Chapter 5. Advanced HTML Parsing
Another Serving of BeautifulSoup
find() and find_all() with BeautifulSoup
Other BeautifulSoup Objects
Navigating Trees
Regular Expressions
Regular Expressions and BeautifulSoup
Accessing Attributes
Lambda Expressions
You Don't Always Need a Hammer
Chapter 6. Writing Web Crawlers
Traversing a Single Domain
Crawling an Entire Site
Collecting Data Across an Entire Site
Crawling Across the Internet
Chapter 7. Web Crawling Models
Planning and Defining Objects
Dealing with Different Website Layouts
Structuring Crawlers
Crawling Sites Through Search
Crawling Sites Through Links
Crawling Multiple Page Types
Thinking About Web Crawler Models
Chapter 8. Scrapy
Installing Scrapy
Initializing a New Spider
Writing a Simple Scraper
Spidering with Rules
Creating Items
Outputting Items
The Item Pipeline
Logging with Scrapy
More Resources
Chapter 9. Storing Data
Media Files
Storing Data to CSV
MySQL
Installing MySQL
Some Basic Commands
Integrating with Python
Database Techniques and Good Practice
"Six Degrees" in MySQL
Email
Part II. Advanced Scraping
Chapter 10. Reading Documents
Document Encoding
Text
Text Encoding and the Global Internet
CSV
Reading CSV Files
PDF
Microsoft Word and .docx
Chapter 11. Working with Dirty Data
Cleaning Text
Working with Normalized Text
Cleaning Data with Pandas
Cleaning
Indexing, Sorting, and Filtering
More About Pandas
Chapter 12. Reading and Writing Natural Languages
Summarizing Data
