Diary/2019-4-15
ASPLOS, day 3
First day of the main conference.
Keynote: Developing our Quantum Future
cf. https://www.microsoft.com/en-us/quantum/development-kit
- applications
- chemistry - efficient fertilizer production, mitigation of global warming
- materials - lossless power lines, better batteries, smart materials
- machine learning - faster training, improved models
- optimization - healthcare diagnostics, traffic reduction
- applications (cont.)
- Quantum-safe privacy - QKD, communication, networking, post-quantum crypto
- Quantum sensing - biology, medicine, GPS, accelerometry, etc.
- Quantum games - learn superposition, entanglement, interference
- Quantum speedups - semi-definite programming, linear systems of equations
- Topological Quantum Computation
- https://arxiv.org/abs/quant-ph/0101025
- topology = properties that are insensitive to deformations (i.e., errors) in local geometry
- no local measurement can tell whether the rope is knotted.
- information encoded in knots is immune to local measurement
- empowering the quantum revolution
- a quantum software stack maps quantum-accelerated programs onto a hybrid quantum system
- Microsoft Quantum Development Kit
- Developing Quantum Applications
- Find quantum algorithm with quantum speedup <- starting point
- confirm quantum speedup after implementing all I/O and gate operations
- optimize code until runtime is short enough
- embed into specific hardware
- Example: quantum chemistry of FeMoco (the iron-molybdenum cofactor of nitrogenase)
- Quantum algorithm (2012) - estimated runtime 30,000 years
- Quantum algorithm (2015) - estimated runtime 1.5 days
- Q#
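// assumes `open Microsoft.Quantum.Intrinsic;` for H, M, Reset
operation nextRandomBits() : Result {
    mutable result = Zero;
    using (qubits = Qubit[1]) {       // allocate one qubit, initially |0>
        H(qubits[0]);                 // equal superposition of |0> and |1>
        set result = M(qubits[0]);    // measure: Zero or One, each with probability 1/2
        Reset(qubits[0]);             // qubits must be released in |0>
    }
    return result;
}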
- cf. https://docs.microsoft.com/quantum/concepts/the-qubit?view=qsharp-preview
- cf. https://github.com/Microsoft/Quantum
- Functors (Adjoint, Controlled)
- Type-parameterized functions and operations
- partial application
- cf. https://cloudblogs.microsoft.com/quantum/2018/07/23/learn-at-your-own-pace-with-microsoft-quantum-katas/
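To see why the `nextRandomBits` operation above behaves as a fair coin, here is a minimal state-vector sketch in Python (not Q#, and not part of the QDK). It assumes real amplitudes, which is enough for H applied to |0>; `hadamard`, `measure`, and `next_random_bit` are hypothetical helper names.

```python
import math
import random

# Single-qubit state as (amplitude of |0>, amplitude of |1>); real amplitudes suffice here.
ZERO = (1.0, 0.0)

def hadamard(state):
    """Apply H: |0> -> (|0>+|1>)/sqrt(2), |1> -> (|0>-|1>)/sqrt(2)."""
    a, b = state
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))

def measure(state, rng):
    """Measure in the computational basis; return (outcome, collapsed state)."""
    a, _ = state
    if rng.random() < a * a:          # P(0) = |alpha|^2
        return 0, (1.0, 0.0)
    return 1, (0.0, 1.0)

def next_random_bit(rng):
    """Mirror of nextRandomBits: start in |0>, apply H, measure, reset."""
    outcome, _ = measure(hadamard(ZERO), rng)
    # Reset: the qubit goes back to |0>, so every call starts from ZERO again.
    return outcome

if __name__ == "__main__":
    rng = random.Random(42)
    print([next_random_bit(rng) for _ in range(16)])
```

H maps |0> to amplitudes (1/√2, 1/√2), so each outcome has probability 1/2; applying H twice returns to |0>, which is why the measurement (not H alone) is what produces randomness.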
Data Movement I
- A Framework for Memory Oversubscription Management in Graphics Processing Units
- applies eviction, throttling, and compression selectively, depending on the application
- memory access patterns differ across applications, so the right countermeasure to oversubscription differs too
- 3dconv -- streaming access, small working set -> hide eviction latency
- lud -- data reuse by kernels, small working set -> hide eviction latency
- atax -- random access, large working set -> reduce working set size
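A minimal sketch of the per-application choice described above, in Python. The profile fields, the numbers in the usage example, and the rule "working set fits in device memory → hide eviction latency, otherwise shrink the working set" are my simplification of the talk, not the paper's actual runtime classifier; the technique labels are as I recall them from the talk.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class AppProfile:
    name: str
    access: str             # observed pattern, e.g. "streaming", "reuse", "random"
    footprint_gb: float     # total allocation (above device memory => oversubscribed)
    working_set_gb: float   # memory touched within a short window (hypothetical metric)

def choose_techniques(app: AppProfile, device_mem_gb: float) -> list[str]:
    """Pick oversubscription countermeasures for one application (simplified rule)."""
    if app.footprint_gb <= device_mem_gb:
        return []                                   # not oversubscribed: nothing to do
    if app.working_set_gb <= device_mem_gb:
        return ["proactive eviction"]               # small working set: hide eviction latency
    # Large working set: reduce it by running fewer thread blocks and compressing data.
    return ["memory-aware throttling", "capacity compression"]

if __name__ == "__main__":
    gpu_mem = 12.0  # hypothetical device memory in GB
    for app in (AppProfile("3dconv", "streaming", 18.0, 2.0),
                AppProfile("lud", "reuse", 18.0, 4.0),
                AppProfile("atax", "random", 18.0, 16.0)):
        print(app.name, app.access, choose_techniques(app, gpu_mem))
```

The access pattern is kept only for reporting here; in this sketch the decision depends solely on whether the working set fits.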
- Swizzle Inventor: Data Movement Synthesis for GPU Kernels
Swizzle Inventor synthesizes GPU programs that use swizzled data movement.
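As an illustration of what a "swizzle" is (hand-written, not output of Swizzle Inventor): an XOR swizzle of column indices in a 32x32 shared-memory tile, assuming 32 banks of 4-byte words as on current NVIDIA GPUs. Reading one column without the swizzle sends all 32 threads to a single bank; with it, each thread hits a different bank. The names `bank` and `worst_conflict` are hypothetical.

```python
NUM_BANKS = 32   # shared-memory banks of 4-byte words (assumption, typical NVIDIA GPU)
TILE = 32        # 32x32 tile of 4-byte elements, stored row-major

def bank(row, col, swizzle):
    """Bank holding logical element (row, col) of the tile."""
    # XOR swizzle: within each row, col -> col ^ row is a permutation, so no data is lost.
    stored_col = (col ^ row) if swizzle else col
    return (row * TILE + stored_col) % NUM_BANKS

def worst_conflict(addresses, swizzle):
    """Max number of threads in one warp hitting the same bank (1 = conflict-free)."""
    counts = {}
    for row, col in addresses:
        b = bank(row, col, swizzle)
        counts[b] = counts.get(b, 0) + 1
    return max(counts.values())

column_read = [(t, 7) for t in range(32)]  # thread t reads row t of column 7 (e.g., a transpose)
row_read = [(7, t) for t in range(32)]     # thread t reads column t of row 7

if __name__ == "__main__":
    print("column read:", worst_conflict(column_read, False), "->", worst_conflict(column_read, True))
    print("row read:   ", worst_conflict(row_read, False), "->", worst_conflict(row_read, True))
```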
- Scalable Processing of Contemporary Semi-Structured Data on Commodity Parallel Processors ― A Compilation-based Approach
- semi-structured data - flexible data model, nested; e.g., XML, JSON
- JSON-family data
- automata-based methods cannot be applied as-is to JSON stream processing
- need to match the query, record states, and recognize the syntactic structure
- streaming compilation
- query set, JSON grammar
- DFA + pushdown automaton -> streaming automaton
- parallelizing compilation
- modify the syntax so that path explosion can be handled
- JPStream
- https://github.com/AutomataLab/JPStream
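A toy version of the streaming idea in Python: one pass over the token stream, with a stack of path components standing in for the pushdown part and a path comparison standing in for the query automaton. This is not JPStream's compiled automaton and does not parallelize; `stream_match` and its query syntax (a list of keys/indices, `"*"` for any) are hypothetical.

```python
import json
import re

# One JSON token: punctuation, string, number, or literal (leading whitespace skipped).
TOKEN = re.compile(
    r'\s*(?:([{}\[\]:,])|("(?:[^"\\]|\\.)*")'
    r'|(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)|(true|false|null))'
)

def tokens(text):
    """Lazily yield tokens left to right (single pass, no tree is built)."""
    text, pos = text.rstrip(), 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = m.end()
        yield next(g for g in m.groups() if g is not None)

def stream_match(text, query):
    """Yield scalar values whose path equals `query` ("*" matches any key or index)."""
    stack = []  # pushdown state: one [kind, key_or_index, expecting_key] frame per open container

    def matches():
        path = [frame[1] for frame in stack]
        return len(path) == len(query) and all(q == "*" or q == p for q, p in zip(query, path))

    for tok in tokens(text):
        top = stack[-1] if stack else None
        if tok == "{":
            stack.append(["obj", None, True])
        elif tok == "[":
            stack.append(["arr", 0, False])
        elif tok in ("}", "]"):
            stack.pop()
        elif tok == ":":
            top[2] = False                  # key seen; a value follows
        elif tok == ",":
            if top[0] == "obj":
                top[2] = True               # next string is a key
            else:
                top[1] += 1                 # next array element
        elif top is not None and top[0] == "obj" and top[2]:
            top[1] = json.loads(tok)        # record the object key
        elif matches():
            yield json.loads(tok)           # scalar on the queried path

if __name__ == "__main__":
    doc = '{"store": {"book": [{"price": 8.95}, {"price": 12.99}]}}'
    print(list(stream_match(doc, ["store", "book", "*", "price"])))
```

Nested containers are what force the stack: a finite automaton alone cannot remember how deep it is, which matches the note's "DFA + pushdown automaton".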
Data Movement II
- Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration
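- when building an accelerator, designing each buffer individually is laborious
- proposes the Buffet, which organizes these buffers into a FIFO-like storage idiom
- Verilog implementation - https://github.com/cwfletcher/buffets
- classifies data orchestration schemes in a matrix: implicit/explicit x coupled/decoupled
- Buffets target EDDO (explicit decoupled data orchestration)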
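A software model of the buffet idiom, assuming the four operations as I understood them from the paper (Fill, Read, Update, Shrink on a FIFO whose live window is randomly addressable relative to its head). Hardware stalls are modeled as exceptions; the class and method names are mine, not the Verilog module's ports.

```python
class Buffet:
    """Circular FIFO whose live window can be read and updated by offset from the head."""

    def __init__(self, capacity):
        self.data = [None] * capacity
        self.capacity = capacity
        self.head = 0        # physical index of the oldest live element
        self.occupancy = 0   # number of live elements

    def _slot(self, offset):
        if not 0 <= offset < self.occupancy:
            raise LookupError("data not yet filled: access would stall")
        return (self.head + offset) % self.capacity

    def fill(self, value):
        """Producer side: append at the tail (stalls in hardware when full)."""
        if self.occupancy == self.capacity:
            raise BufferError("buffet full: fill would stall")
        self.data[(self.head + self.occupancy) % self.capacity] = value
        self.occupancy += 1

    def read(self, offset):
        """Consumer side: random-access read relative to the head."""
        return self.data[self._slot(offset)]

    def update(self, offset, value):
        """Overwrite a live element in place (e.g., accumulate partial sums)."""
        self.data[self._slot(offset)] = value

    def shrink(self, n):
        """Release n elements from the head, returning space (credits) to the producer."""
        if n > self.occupancy:
            raise ValueError("cannot shrink past live data")
        self.head = (self.head + n) % self.capacity
        self.occupancy -= n
```

The decoupling shows in the interface: the producer only fills and the consumer only reads, updates, and shrinks, with occupancy acting as the credit count between them.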
- HiWayLib: A Software Framework for Enabling High Performance Communications for Heterogeneous Pipeline Computations
- a communication library for heterogeneous pipeline computations
- Cumbersome inter-device data movements
- → lazy reference-based scheme
- region-based lazy data copy
- reference-based task queue
- detecting the end of pipeline processing
- → late-triggered inter-stage tracking
- contention on communication data structures
- → bi-layer contention relief
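A sketch of the lazy reference-based idea in Python, under my own assumptions: the task queue carries only region references, and a region is copied to the consumer device the first time a task touches it, at most once. `LazyRegionStore` and `run_consumer` are hypothetical; real HiWayLib works across CPU/GPU devices, which this single-process model does not.

```python
from collections import deque

class LazyRegionStore:
    """Hypothetical model: producer-side regions, copied to the consumer on first access."""

    def __init__(self):
        self.producer = {}   # region id -> data resident on the producer device
        self.consumer = {}   # region id -> copy resident on the consumer device
        self.copies = 0      # inter-device transfers actually performed

    def publish(self, region_id, data):
        """Producer finishes a region; only its id (a reference) is enqueued."""
        self.producer[region_id] = data
        return region_id

    def access(self, region_id):
        """Consumer touches a region; copy it lazily, at most once."""
        if region_id not in self.consumer:
            self.consumer[region_id] = list(self.producer[region_id])
            self.copies += 1
        return self.consumer[region_id]

def run_consumer(store, queue):
    """Drain a reference-based task queue of (op, region_id) entries."""
    results = []
    while queue:
        op, ref = queue.popleft()
        data = store.access(ref)
        results.append(sum(data) if op == "sum" else max(data))
    return results

if __name__ == "__main__":
    store = LazyRegionStore()
    r0 = store.publish("r0", [1, 2, 3])
    store.publish("r1", [9, 9])          # never referenced by a task, so never copied
    print(run_consumer(store, deque([("sum", r0), ("max", r0)])), store.copies)
```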
- StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory
https://thexsel.github.io/p/streambox/
- background - https://www.domo.com/learn/data-never-sleeps-6
- StreamBox-HBM: a stream analytics engine for hybrid memory (3D-stacked memory + DRAM).
- 110 million records per second and 238 GB/s memory bandwidth
- challenges
- hash grouping performs poorly on 3D memory
- → parallel sort for grouping; sort outperforms hash on 3D memory
- 3D memory is capacity-limited
- → use 3D memory only for the in-memory index
- how to map data/operators dynamically
- → balance the two limited resources
- Evaluation
- Yahoo Streaming Benchmark: Flink on KNL reaches 10 MRes/s, StreamBox-HBM reaches 50 MRes/s.
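Sort-based grouping, sketched in Python: sort a compact array of (key, record index) pairs, then scan runs of equal keys. Keeping only this small key/pointer array in fast memory while the records stay in DRAM is my reading of "use 3D memory only for the in-memory index"; the paper's actual data structures and parallel sort are not reproduced, and `group_by_sort` is a hypothetical name.

```python
def group_by_sort(records, key_of):
    """Group records by key via a sorted (key, index) array; returns [(key, [records])]."""
    # Only this compact key/pointer array is sorted; the records themselves never move.
    kp = sorted((key_of(r), i) for i, r in enumerate(records))
    groups, start = [], 0
    while start < len(kp):
        end = start
        while end < len(kp) and kp[end][0] == kp[start][0]:
            end += 1                      # extend the run of equal keys
        groups.append((kp[start][0], [records[i] for _, i in kp[start:end]]))
        start = end
    return groups

if __name__ == "__main__":
    events = [("ad-2", 1), ("ad-1", 5), ("ad-2", 2), ("ad-3", 7), ("ad-1", 3)]
    for key, rows in group_by_sort(events, lambda r: r[0]):
        print(key, sum(v for _, v in rows))   # per-key aggregate, e.g. a windowed count
```

The appeal on high-bandwidth memory is that sorting streams through memory sequentially, while hashing issues random accesses; the sketch shows only the algorithmic shape, not the bandwidth effect.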
Potpourri
- CASCADE Just-In-Time Compilation for Verilog ― A New Technique for Improving the FPGA Programming Experience
- Just-in-time
- run code in a simulator
- compile in the background
- translate when finished
https://github.com/vmware/cascade
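A deterministic sketch of the JIT flow above: execution starts in a simulator, the background compile is modeled as finishing after a fixed number of cycles, and then the live state is handed to the compiled engine. The 8-bit counter design and `compile_latency` are hypothetical stand-ins; Cascade itself works on real Verilog and FPGA bitstreams.

```python
class Simulator:
    """Slow path: stand-in for a software Verilog simulator."""
    def step(self, state):
        return (state + 1) % 256          # hypothetical design: an 8-bit counter

class Hardware:
    """Fast path: stand-in for the compiled bitstream (same semantics)."""
    def step(self, state):
        return (state + 1) & 0xFF

def run_jit(cycles, compile_latency):
    """Run in simulation until compilation 'finishes', then migrate state to hardware."""
    engine, state, log = Simulator(), 0, []
    for cycle in range(cycles):
        if cycle == compile_latency:
            engine = Hardware()           # translate: the running state carries over
            log.append(("switch", cycle, state))
        state = engine.step(state)
    return state, log

if __name__ == "__main__":
    print(run_jit(300, 100))
```

The point of the design is that the user sees the program running from cycle 0; compilation latency only determines when it gets faster, not when it starts.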
- DCNS: Automated Detection Of Conservative Non-Sleep Defects in the Linux Kernel
- waiting operations
- non-sleep operations
- sleep-able operations
- e.g., replace mdelay with msleep, and GFP_ATOMIC with GFP_KERNEL
- function pointer analysis
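A coarse, per-function sketch of the detection idea, as I understand it: a conservative non-sleep defect is a non-sleep operation (busy-waiting mdelay, a GFP_ATOMIC allocation) used in code that is allowed to sleep. Here every function reachable from an atomic entry point (e.g., an interrupt handler) counts as possibly atomic, and non-sleep operations elsewhere are reported. The call graph is assumed to already include indirect edges resolved by a function-pointer analysis; all names are hypothetical, and DCNS's real analysis is more detailed than this.

```python
from collections import deque

NON_SLEEP_TO_SLEEP = {"mdelay": "msleep", "GFP_ATOMIC": "GFP_KERNEL"}

def atomic_reachable(call_graph, atomic_roots):
    """Functions that may execute in atomic context (BFS from atomic entry points)."""
    seen, work = set(atomic_roots), deque(atomic_roots)
    while work:
        fn = work.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                work.append(callee)
    return seen

def find_conservative_non_sleep(call_graph, ops, atomic_roots):
    """Report (function, non-sleep op, suggested replacement) where sleeping is allowed."""
    atomic = atomic_reachable(call_graph, atomic_roots)
    reports = []
    for fn, used in sorted(ops.items()):
        if fn in atomic:
            continue                      # a non-sleep op is required here
        for op in used:
            if op in NON_SLEEP_TO_SLEEP:
                reports.append((fn, op, NON_SLEEP_TO_SLEEP[op]))
    return reports

if __name__ == "__main__":
    graph = {"irq_handler": ["hw_poll"], "probe": ["hw_init", "hw_poll"]}
    used_ops = {"hw_poll": ["mdelay"], "hw_init": ["mdelay", "GFP_ATOMIC"]}
    print(find_conservative_non_sleep(graph, used_ops, ["irq_handler"]))
```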
- A Case for Lease-Based, Utilitarian Resource Management on Mobile Devices
Potpourri (3): a runtime that reduces wasteful power consumption by Android apps
cf. https://orderlab.io/LeaseOS/
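A minimal sketch of a lease on a power-hungry resource (e.g., a wakelock): it is granted for a fixed term and, at the end of each term, renewed only if the app did enough useful work while holding it. The utility ratio, the 0.1 threshold, and the 60 s deferral are hypothetical; LeaseOS's actual utility metrics and policies are described in the paper.

```python
from dataclasses import dataclass

@dataclass
class Lease:
    app: str
    resource: str        # e.g. "wakelock"
    term_s: float        # length of one lease term in seconds
    renewals: int = 0
    active: bool = True

def review_lease(lease, held_s, useful_work_s, min_utility=0.1, defer_s=60.0):
    """At the end of a term, renew if the resource was used productively, else defer."""
    utility = useful_work_s / held_s if held_s > 0 else 0.0   # hypothetical utility metric
    if utility >= min_utility:
        lease.renewals += 1
        return ("renew", lease.term_s)
    lease.active = False                  # revoke for now; the app may ask again later
    return ("defer", defer_s)

if __name__ == "__main__":
    lease = Lease("example.app", "wakelock", 5.0)
    print(review_lease(lease, held_s=5.0, useful_work_s=2.0))   # productive term
    print(review_lease(lease, held_s=5.0, useful_work_s=0.1))   # mostly idle term
```

Compared with a one-shot grant, the lease turns "held forever by a buggy app" into "held for at most one term", which bounds the energy an app can waste.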