153 – High-Performance Computing
This episode is a conversation with Iain Bethune from the Edinburgh Parallel Computing Centre (EPCC) about high-performance computing. The topic has played an implicit role in many previous omega tau episodes; this episode treats it explicitly. We discuss different architectures (supercomputers, commodity clusters, grid computing), programming languages and software design, as well as application areas.
Notes and Corrections by Iain
At 37:10 I dropped a factor of 1000 when I talked about the Linpack performance of Tianhe-2: it should have been petaflops instead of teraflops.
Coincidentally, just after we recorded I had a Twitter discussion about the use of different programming languages for HPC applications, and collected some actual evidence to back up the estimates I made in the podcast: http://www.prace-ri.eu/IMG/pdf/D7-4-1.pdf (figure 15). We also collected some data from the first 6 months of the ARCHER service, which suggests we have an even more Fortran-dominated world than I had suggested: in terms of node hours, Fortran is 66%, C++ is 6%, C is 5% and 23% are unknown.
Books suggested by Iain
- Book: Introduction to High Performance Computing for Scientists and Engineers
- Online Book: High Performance Computing
Links
- Mr Iain Bethune
- EPCC at The University of Edinburgh
- Supercomputer (WP)
- Cloud Computing (WP)
- Central processing unit (WP)
- Moore's law (WP)
- Cray Inc.: the Supercomputer Company
- Cray X-MP (WP)
- Silicon Graphics
- Vector processor (WP)
- Red Storm (computing) (WP)
- Cray XT3 (WP)
- Massively parallel processor array (WP)
- Intel Xeon Processor
- AMD Opteron
- SIMD (WP)
- Commodity computing (WP)
- Gigabit Ethernet (WP)
- InfiniBand (WP)
- Performance
- Scalability (WP)
- High-throughput computing (WP)
- AWS | Amazon Elastic Compute Cloud (EC2)
- Grid computing (WP)
- Welcome to the Worldwide LHC Computing Grid
- SETI@home
- Embarrassingly parallel (WP)
- Volunteer computing (WP)
- BOINC
- FLOPS (WP)
- About / The Linpack Benchmark
- Gaussian elimination (WP)
- TOP500 Supercomputer Sites
- HPCC Benchmark
- HPCG Benchmark
- Conjugate gradient method (WP)
- The Green500 List
- Tianhe-2 (WP)
- Ivy Bridge (microarchitecture) (WP)
- IBM Watson
- POWER7 (WP)
- ARCHER national supercomputing service
- Cray XC30 Supercomputer – Cray Inc.
- Graphics processing unit (WP)
- Synchronization Overhead
- C (programming language) (WP)
- Fortran (WP)
- C++ (programming language) (WP)
- Python Programming Language
- Ruby Programming Language
- Erlang Programming Language
- Functional programming (WP)
- Computational science (WP)
- Sparse matrix (WP)
- Structured Grid
- Distributed memory (WP)
- Message passing (WP)
- Message Passing Interface (WP)
- Application programming interface (WP)
- CORBA (WP)
- Abstract Syntax Notation One (WP)
- Deadlock (WP)
- Futures
- Callback (computer programming) (WP)
- Remote procedure call (WP)
- Patterns for Parallel Programming (Book)
- OpenMP
- POSIX Threads Programming
- Partitioned global address space (WP)
- MapReduce (WP)
- Task Farming | ICHEC
- Software design pattern (WP)
- Exascale computing (WP)
- Database Software | Oracle
- Unit testing (WP)
- Integration testing (WP)
- Computational fluid dynamics (WP)
- Amdahl's law (WP)
- Gustafson's law (WP)
- CUDA | NVIDIA
- OpenACC Home | openacc.org
- Domain-specific language (WP)
- Markus Püschel – ETH Zürich
- Truncation error (WP)
- Discretization error (WP)
- Sensitivity analysis (WP)
- Computational chemistry (WP)
- Materials science (WP)
- Nuclear Test Ban Treaty
- FPGA Field Programmable Gate Array
I enjoyed the HPC episode…
The discussion on availability reminded me of the situation during the 1950s, when a computer might have 20,000 valves, each with a 20,000-hour MTBF (Mean Time Between Failures): with that many valves, you would expect a failure about every hour.
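The arithmetic behind that estimate can be sketched in a few lines. This is only a rough model: it assumes the valves fail independently at a constant rate and that any single valve failure counts as a system failure. The function name `system_mtbf_hours` is made up for illustration.

```python
def system_mtbf_hours(component_mtbf_hours: float, n_components: int) -> float:
    """Approximate MTBF of a system in which any one component failure is a system failure.

    Assumption (not from the episode): failures are independent with a constant
    rate, so the failure rates of the individual components simply add up.
    """
    if n_components <= 0:
        raise ValueError("n_components must be positive")
    component_failure_rate = 1.0 / component_mtbf_hours      # failures per hour, per valve
    system_failure_rate = n_components * component_failure_rate
    return 1.0 / system_failure_rate


# 20,000 valves, each with a 20,000-hour MTBF
print(system_mtbf_hours(20_000.0, 20_000))  # 1.0 (hours between failures)
```

The same reasoning applies to a modern machine with many thousands of nodes: even very reliable parts add up to frequent failures at the system level.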
One non-stop system used for ICBM early warning had two identical computers; one was active, and the other ran diagnostics while its hardware was stressed and repaired. They swapped every 15 minutes…
Now, the replaceable unit is a CPU card with multiple cores. In principle, the HPC system could continue working while one CPU card is replaced, but in practice a soft fault could “pollute” the answers, so checkpointing, rollback and restart are “safer”, without having to duplicate the entire machine.
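As a rough illustration of the checkpoint, rollback and restart idea, here is a minimal single-process sketch. It is a toy, not how any particular HPC code does it: the function names, the `fail_at` fault injection and the use of `pickle` are all assumptions made for the example, and it takes for granted that the fault is detected, which is the hard part with real soft faults.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    # Write to a temporary file, then rename, so a crash during the write
    # cannot damage the last good checkpoint.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def run(total_steps, checkpoint_every, path, fail_at=None):
    """Sum 0..total_steps-1 with periodic checkpoints.

    fail_at (hypothetical) injects one simulated soft fault at that step.
    Returns (result, number_of_rollbacks).
    """
    state = {"step": 0, "total": 0}
    save_checkpoint(path, state)
    rollbacks = 0
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            fail_at = None                  # the fault happens only once
            state["total"] = -999           # simulated fault "pollutes" the answer
            state = load_checkpoint(path)   # rollback: discard work since last checkpoint
            rollbacks += 1
            continue                        # restart from the restored step
        state["total"] += state["step"]     # one unit of "real" work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["total"], rollbacks


with tempfile.TemporaryDirectory() as d:
    result, rollbacks = run(10, 4, os.path.join(d, "ckpt.pkl"), fail_at=6)
    print(result, rollbacks)  # 45 1
```

The fault at step 6 throws away the two steps done since the checkpoint at step 4, and the run still ends with the correct sum. The trade-off is the usual one: checkpointing more often costs more I/O but loses less work per rollback.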
Thanks for the addition, Evan!
I clicked 2 stars but meant to click 5 stars. Outstanding interview!
Thanks Mike :-)
This is one of my favourite Omega Tau episodes of them all. I listened to it when it came out, but of all the episodes it really sticks out in my mind. I am now studying a high performance computing module as part of my master's degree and this episode gave me lots of useful background. I can second Iain’s book recommendations: they are the two recommended textbooks for the module I’m taking and they are both excellent (especially the free online one). If anyone is feeling super-keen, the module website is at the following link, where you can find lecture notes, programming exercises, useful links, etc. http://www-users.york.ac.uk/~mijp1/teaching/4th_year_HPC/
Great! It’s nice to hear that an episode is actually useful, and not just generally interesting :-)