Problem-solving in high performance computing: a situational awareness approach with Linux

Problem-Solving in High Performance Computing: A Situational Awareness Approach with Linux focuses on understanding giant computing grids as cohesive systems. Unlike other titles on general problem-solving or system administration, this book offers a cohesive approach to complex, layered environmen...


Bibliographic Details
Other Authors:	Ljubuncic, Igor (author)
Format:	E-book
Language:	English
Published:	Waltham, MA : Morgan Kaufmann [2015]
Edition:	1st edition
Collection:	Gale eBooks
Subjects:	Linux. High performance computing.
View at Universitat Ramon Llull Library:	https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009629690206719

Table of Contents:

Identification of a problem; If a tree falls in a forest, and no one hears it fall; Step-by-step identification; Always use simple tools first; Too much knowledge leads to mistakes; Problem definition; Problem that happens now or that may be; Outage size and severity versus business imperative; Known versus unknown; Problem reproduction; Can you isolate the problem?; Sporadic problems need special treatment; Plan how to control the chaos; Letting go is the hardest thing; Cause and effect; Do not get hung up on symptoms; Chicken and egg: what came first?
Do not make environment changes until you understand the nature of the problem; If you make a change, make sure you know what the expected outcome is; Conclusions; References; Chapter 2 - The investigation begins; Isolating the problem; Move from production to test; Rerun the minimal set needed to get results; Ignore biased information; avoid assumptions; Comparison to a healthy system and known references; It is not a bug, it is a feature; Compare expected results to a healthy system; Performance and behavior references are a must; Linear versus nonlinear response to changes
One variable at a time; Problems with linear complexity; Nonlinear problems; Response may be delayed or masked; Y to X rather than X to Y; Component search; Conclusions; Chapter 3 - Basic investigation; Profile the system status; Environment monitors; Machine accessibility, responsiveness, and uptime; Local and remote login and management console; The monitor that cried wolf; Read the system messages and logs; Using ps and top; System logs; Process accounting; Examine pattern of command execution; Correlate to problem manifestation; Avoid quick conclusions; Statistics to your aid; Vmstat
Iostat; System activity report (SAR); Conclusions; References; Chapter 4 - A deeper look into the system; Working with /proc; Hierarchy; Per-process variables; Kernel data; Process space; Examine kernel tunables; Sys subsystem; Memory management; Filesystem management; Network management; SunRPC; Kernel; Sysctl; Conclusions; References; Chapter 5 - Getting geeky - tracing and debugging applications; Working with strace and ltrace; Strace; Options; What you need to know before using strace; Strace from the standpoint of a system administrator; Strace has friends; Basic usage; Test case 1
Test case 2

