Network Algorithmics: An Interdisciplinary Approach to Designing Fast Networked Devices

"Network Algorithmics: An Interdisciplinary Approach to Designing Fast Networked Devices, Second Edition takes an interdisciplinary approach to applying principles for efficient implementation of network devices, offering solutions to the problem of network implementation bottlenecks. In design...

Full description

Bibliographic Details
Other Authors: Varghese, George, 1960- (author); Xu, Jun (author)
Format: eBook
Language: English
Published: Cambridge, Massachusetts : Morgan Kaufmann, [2022]
Edition: Second edition
Series: The Morgan Kaufmann Series in Networking
Subjects:
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009703305606719
Table of Contents:
  • Front Cover
  • Network Algorithmics
  • Copyright
  • Contents
  • Preface to the second edition
  • Preface
  • Audience
  • What this book is about
  • Organization of the book
  • Features
  • Usage
  • Why this book was written
  • Acknowledgments
  • 15 principles used to overcome network bottlenecks
  • Part 1 The rules of the game
  • 1 Introducing network algorithmics
  • 1.1 The problem: network bottlenecks
  • 1.1.1 Endnode bottlenecks
  • 1.1.2 Router bottlenecks
  • 1.2 The techniques: network algorithmics
  • 1.2.1 Warm-up example: scenting an evil packet
  • 1.2.2 Strawman solution
  • 1.2.3 Thinking algorithmically
  • 1.2.4 Refining the algorithm: exploiting hardware
  • 1.2.5 Cleaning up
  • 1.2.6 Characteristics of network algorithmics
  • 1.3 Exercise
  • 2 Network implementation models
  • 2.1 Protocols
  • 2.1.1 Transport and routing protocols
  • 2.1.2 Abstract protocol model
  • 2.1.3 Performance environment and measures
  • 2.2 Hardware
  • 2.2.1 Combinatorial logic
  • 2.2.2 Timing and power
  • 2.2.3 Raising the abstraction level of hardware design
  • 2.2.4 Memories
  • Registers
  • SRAM
  • Dynamic RAM
  • Page-mode DRAMs
  • Interleaved DRAMs
  • 2.2.5 Memory subsystem design techniques
  • 2.2.6 Component-level design
  • 2.2.7 Final hardware lessons
  • 2.3 Network device architectures
  • 2.3.1 Endnode architecture
  • 2.3.2 Router architecture
  • Lookup
  • Switching
  • Queuing
  • Header validation and checksums
  • Route processing
  • Protocol processing
  • Fragmentation, redirects, and ARPs
  • 2.4 Operating systems
  • 2.4.1 Uninterrupted computation via processes
  • 2.4.2 Infinite memory via virtual memory
  • 2.4.3 Simple I/O via system calls
  • 2.5 Summary
  • 2.6 Exercises
  • 3 Fifteen implementation principles
  • 3.1 Motivating the use of principles-updating ternary content-addressable memories
  • 3.2 Algorithms versus algorithmics
  • 3.3 Fifteen implementation principles-categorization and description
  • 3.3.1 Systems principles
  • P1: Avoid obvious waste in common situations
  • P2: Shift computation in time
  • P3: Relax system requirements
  • P4: Leverage off system components
  • P5: Add hardware to improve performance
  • 3.3.2 Principles for modularity with efficiency
  • P6: Create efficient specialized routines by replacing inefficient general-purpose routines
  • P7: Avoid unnecessary generality
  • P8: Don't be tied to reference implementations
  • P9: Pass hints in module interfaces
  • P10: Pass hints in protocol headers
  • 3.3.3 Principles for speeding up routines
  • P11: Optimize the expected case
  • P11a: Use caches
  • P12: Add or exploit state to gain speed
  • P12a: Compute incrementally
  • P13: Optimize degrees of freedom
  • P14: Use special techniques for finite universes such as integers
  • P15: Use algorithmic techniques to create efficient data structures
  • 3.4 Design versus implementation principles
  • 3.5 Caveats
  • 3.5.1 Eight cautionary questions
  • Q1: Is it worth improving performance?
  • Q2: Is this really a bottleneck?
  • Q3: What impact does the change have on the rest of the system?
  • Q4: Does the initial analysis indicate significant improvement?
  • Q5: Is it worth adding custom hardware?
  • Q6: Can protocol changes be avoided?
  • Q7: Do prototypes confirm the initial promise?
  • Q8: Will performance gains be lost if the environment changes?
  • 3.6 Summary
  • 3.7 Exercises
  • 4 Principles in action
  • 4.1 Buffer validation of application device channels
  • 4.2 Scheduler for asynchronous transfer mode flow control
  • 4.3 Route computation using Dijkstra's algorithm
  • 4.4 Ethernet monitor using bridge hardware
  • 4.5 Demultiplexing in the x-kernel
  • 4.6 Tries with node compression
  • 4.7 Packet filtering in routers
  • 4.8 Avoiding fragmentation of LSPs
  • 4.9 Policing traffic patterns
  • 4.10 Identifying a resource hog
  • 4.11 Getting rid of the TCP open connection list
  • 4.12 Acknowledgment withholding
  • 4.13 Incrementally reading a large database
  • 4.14 Binary search of long identifiers
  • 4.15 Video conferencing via asynchronous transfer mode
  • Part 2 Playing with endnodes
  • 5 Copying data
  • 5.1 Why data copies
  • 5.2 Reducing copying via local restructuring
  • 5.2.1 Exploiting adaptor memory
  • 5.2.2 Using copy-on-write
  • Implementing copy-on-write
  • 5.2.3 Fbufs: optimizing page remapping
  • Fbufs
  • 5.2.4 Transparently emulating copy semantics
  • 5.2.5 Are zero copies used today?
  • 5.3 Avoiding copying using remote DMA
  • 5.3.1 Avoiding copying in a cluster
  • 5.3.2 Modern-day incarnations of RDMA
  • Fiber channel
  • Infiniband
  • iSCSI via iWarp and RoCE
  • 5.4 Broadening to file systems
  • 5.4.1 Shared memory
  • 5.4.2 IO-lite: A unified view of buffering
  • 5.4.3 Avoiding file system copies via I/O splicing
  • 5.5 Broadening beyond copies
  • 5.6 Broadening beyond data manipulations
  • 5.6.1 Using caches effectively
  • Data Cache Optimization
  • Code arrangement
  • Locality-driven layer processing
  • Software engineering considerations
  • 5.6.2 Direct memory access versus programmed I/O
  • 5.7 Conclusions
  • 5.8 Exercises
  • 6 Transferring control
  • 6.1 Why control overhead?
  • 6.2 Avoiding scheduling overhead in networking code
  • 6.2.1 Making user-level protocol implementations real
  • 6.3 Avoiding context-switching overhead in applications
  • 6.3.1 Process per client
  • 6.3.2 Thread per client
  • 6.3.3 Event-driven scheduler
  • 6.3.4 Event-driven server with helper processes
  • 6.3.5 Task-based structuring
  • 6.4 Scalable I/O Notification
  • 6.4.1 A server mystery
  • 6.4.2 Problems with implementations of select()
  • Parameters
  • Usage in a Web server
  • Implementation
  • 6.4.3 Analysis of select()
  • Obvious waste in select() implementation
  • Strategies and principles to fix select()
  • 6.4.4 Speeding up select() without changing the API
  • 6.4.5 Speeding up select() by changing the API
  • 6.5 Avoiding system calls or Kernel Bypass
  • 6.5.1 The virtual interface architecture proposal
  • 6.5.2 Data Plane Development Kit (DPDK)
  • 6.5.3 Single Root I/O Virtualization (SR-IOV)
  • 6.6 Radical Restructuring of Operating Systems
  • 6.7 Reducing interrupts
  • 6.7.1 Avoiding receiver livelock
  • 6.8 Conclusions
  • 6.9 Exercises
  • 7 Maintaining timers
  • 7.1 Why timers?
  • 7.2 Model and performance measures
  • 7.3 Simplest timer schemes
  • 7.4 Timing wheels
  • 7.5 Hashed wheels
  • 7.6 Hierarchical wheels
  • 7.7 BSD implementation
  • 7.8 Google Carousel implementation
  • 7.9 Obtaining finer granularity timers
  • 7.10 Conclusions
  • 7.11 Exercises
  • 8 Demultiplexing
  • 8.1 Opportunities and challenges of early demultiplexing
  • 8.2 Goals
  • 8.3 CMU/Stanford packet filter: pioneering packet filters
  • 8.4 Berkeley packet filter: enabling high-performance monitoring
  • 8.5 Pathfinder: factoring out common checks
  • 8.6 Dynamic packet filter: compilers to the rescue
  • 8.7 Conclusions
  • 8.8 Exercises
  • 9 Protocol processing
  • 9.1 Buffer management
  • 9.1.1 Buffer allocation
  • Segregated pool allocator
  • Linux allocator
  • Batch allocator
  • 9.1.2 Sharing buffers
  • Buffer stealing
  • Dynamic thresholds
  • 9.2 Cyclic redundancy checks and checksums
  • 9.2.1 Cyclic redundancy checks
  • Naive implementation
  • Implementation using linear feedback shift registers
  • Faster implementations
  • 9.2.2 Internet checksums
  • Header checksum
  • 9.2.3 Finessing checksums
  • 9.3 Generic protocol processing
  • 9.3.1 UDP processing
  • 9.4 Reassembly
  • 9.4.1 Efficient reassembly
  • 9.5 Conclusions
  • 9.6 Exercises
  • Part 3 Playing with routers
  • 10 Exact-match lookups
  • 10.1 Challenge 1: Ethernet under fire
  • 10.2 Challenge 2: wire speed forwarding
  • 10.3 Challenge 3: scaling lookups to higher speeds
  • 10.3.1 Scaling via hashing
  • 10.3.2 Using hardware parallelism
  • 10.3.3 The d-left approach
  • 10.4 Summary
  • 10.5 Exercise
  • 11 Prefix-match lookups
  • 11.1 Introduction to prefix lookups
  • 11.1.1 Prefix notation
  • 11.1.2 Why variable-length prefixes?
  • 11.1.3 Lookup model
  • 11.2 Finessing lookups
  • 11.2.1 Threaded indices and tag switching
  • 11.2.2 Flow switching
  • 11.2.3 Status of tag switching, flow switching, and multiprotocol label switching
  • 11.3 Non-algorithmic techniques for prefix matching
  • 11.3.1 Caching
  • 11.3.2 Ternary content-addressable memories
  • 11.4 Unibit tries
  • 11.5 Multibit tries
  • 11.5.1 Fixed-stride tries
  • 11.5.2 Variable-stride tries
  • 11.5.3 Incremental update
  • 11.6 Level-compressed (LC) tries
  • 11.7 Lulea-compressed tries
  • 11.8 Tree bitmap
  • 11.8.1 Tree bitmap ideas
  • 11.8.2 Tree bitmap search algorithm
  • 11.8.3 PopTrie: an alternate bitmap algorithm
  • 11.9 Binary search on ranges
  • 11.10 Binary search on ranges with Initial Lookup Table
  • 11.11 Binary search on prefix lengths
  • 11.12 Linear search on prefix lengths with hardware assist
  • 11.12.1 Using Bloom Filters to compress prefix bitmaps
  • 11.12.2 SAIL: Uncompressed Bitmaps up to a pivot level
  • 11.13 Memory allocation in compressed schemes
  • 11.13.1 Frame-based compaction
  • 11.14 Fixed Function Lookup-chip models
  • 11.15 Programmable Lookup Chips and P4
  • 11.15.1 The P4 language
  • 11.15.2 IP Lookups in the P4 Model
  • 11.16 Conclusions
  • 11.17 Exercises
  • 12 Packet classification
  • 12.1 Why packet classification?
  • 12.2 Packet-classification problem
  • 12.3 Requirements and metrics
  • 12.4 Simple solutions
  • 12.4.1 Linear search
  • 12.4.2 Tuple space search