Web scraping with Python: data extraction from the modern web

If programming is magic, then web scraping is surely a form of wizardry. By writing a simple automated program, you can query web servers, request data, and parse it to extract the information you need. This thoroughly updated third edition not only introduces you to web scraping but also serves as...

Bibliographic Details
Other Authors: Mitchell, Ryan (author)
Format: eBook
Language: English
Published: Sebastopol: O'Reilly Media, Inc., 2024.
Edition: 3rd edition
Subjects:
See on Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009801534306719
Table of Contents:
  • Cover
  • Copyright
  • Table of Contents
  • Preface
  • What Is Web Scraping?
  • Why Web Scraping?
  • About This Book
  • Conventions Used in This Book
  • Using Code Examples
  • O'Reilly Online Learning
  • How to Contact Us
  • Acknowledgments
  • Part I. Building Scrapers
  • Chapter 1. How the Internet Works
  • Networking
  • Physical Layer
  • Data Link Layer
  • Network Layer
  • Transport Layer
  • Session Layer
  • Presentation Layer
  • Application Layer
  • HTML
  • CSS
  • JavaScript
  • Watching Websites with Developer Tools
  • Chapter 2. The Legalities and Ethics of Web Scraping
  • Trademarks, Copyrights, Patents, Oh My!
  • Copyright Law
  • Trespass to Chattels
  • The Computer Fraud and Abuse Act
  • robots.txt and Terms of Service
  • Three Web Scrapers
  • eBay v. Bidder's Edge and Trespass to Chattels
  • United States v. Auernheimer and the Computer Fraud and Abuse Act
  • Field v. Google: Copyright and robots.txt
  • Chapter 3. Applications of Web Scraping
  • Classifying Projects
  • E-commerce
  • Marketing
  • Academic Research
  • Product Building
  • Travel
  • Sales
  • SERP Scraping
  • Chapter 4. Writing Your First Web Scraper
  • Installing and Using Jupyter
  • Connecting
  • An Introduction to BeautifulSoup
  • Installing BeautifulSoup
  • Running BeautifulSoup
  • Connecting Reliably and Handling Exceptions
  • Chapter 5. Advanced HTML Parsing
  • Another Serving of BeautifulSoup
  • find() and find_all() with BeautifulSoup
  • Other BeautifulSoup Objects
  • Navigating Trees
  • Regular Expressions
  • Regular Expressions and BeautifulSoup
  • Accessing Attributes
  • Lambda Expressions
  • You Don't Always Need a Hammer
  • Chapter 6. Writing Web Crawlers
  • Traversing a Single Domain
  • Crawling an Entire Site
  • Collecting Data Across an Entire Site
  • Crawling Across the Internet
  • Chapter 7. Web Crawling Models
  • Planning and Defining Objects
  • Dealing with Different Website Layouts
  • Structuring Crawlers
  • Crawling Sites Through Search
  • Crawling Sites Through Links
  • Crawling Multiple Page Types
  • Thinking About Web Crawler Models
  • Chapter 8. Scrapy
  • Installing Scrapy
  • Initializing a New Spider
  • Writing a Simple Scraper
  • Spidering with Rules
  • Creating Items
  • Outputting Items
  • The Item Pipeline
  • Logging with Scrapy
  • More Resources
  • Chapter 9. Storing Data
  • Media Files
  • Storing Data to CSV
  • MySQL
  • Installing MySQL
  • Some Basic Commands
  • Integrating with Python
  • Database Techniques and Good Practice
  • "Six Degrees" in MySQL
  • Email
  • Part II. Advanced Scraping
  • Chapter 10. Reading Documents
  • Document Encoding
  • Text
  • Text Encoding and the Global Internet
  • CSV
  • Reading CSV Files
  • PDF
  • Microsoft Word and .docx
  • Chapter 11. Working with Dirty Data
  • Cleaning Text
  • Working with Normalized Text
  • Cleaning Data with Pandas
  • Cleaning
  • Indexing, Sorting, and Filtering
  • More About Pandas
  • Chapter 12. Reading and Writing Natural Languages
  • Summarizing Data