
Diary/2019-4-15

ASPLOS, Day 3

First day of the main conference.

Keynote: Developing our Quantum Future

cf. https://www.microsoft.com/en-us/quantum/development-kit

  • applications
    • chemistry - efficient fertilizer production, mitigation of global warming
    • materials - lossless power lines, better batteries, smart materials
    • machine learning - faster training, improved models
    • optimization - healthcare diagnostics, traffic reduction
  • cont. applications
    • Quantum-safe privacy - QKD, communication, networking, post-quantum crypto
    • Quantum sensing - biology, medicine, GPS, accelerometry, etc.
    • Quantum games - learn superposition, entanglement, interference
    • Quantum speedups - semi-definite programming, linear systems of equations
  • Topological Quantum Computation
    • https://arxiv.org/abs/quant-ph/0101025
    • topology = properties that are insensitive to deformations (i.e., errors) in local geometry
      • no local measurement can tell whether the rope is knotted.
      • information encoded in knots is immune to local measurement
  • empowering the quantum revolution
    • a quantum software stack maps quantum-accelerated programs to a hybrid quantum system
  • Microsoft Quantum Development Kit
  • Developing Quantum Applications
    • Find quantum algorithm with quantum speedup <- starting point
    • confirm quantum speedup after implementing all I/O and gate operations
    • optimize code until runtime is short enough
    • embed into specific hardware
  • Examples: quantum chemistry of FeMoco
    • Quantum algorithm (2012) - 30,000 years
    • Quantum algorithm (2015) - 1.5 days
  • Q#
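operation nextRandomBits() : Result {
  mutable result = Zero;
  // Allocate a one-qubit register so that qubits[0] is valid.
  using (qubits = Qubit[1]) {
    H(qubits[0]);                 // put the qubit into an equal superposition
    set result = M(qubits[0]);    // measure: Zero or One with equal probability
    Reset(qubits[0]);             // return the qubit to |0> before release
  }
  return result;
}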
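
To see what the Q# operation above computes, here is a minimal classical simulation in Python. It is only a sketch: the function name `next_random_bit` and the explicit two-amplitude state are assumptions made for illustration, and nothing here uses the Quantum Development Kit.

```python
import math
import random


def next_random_bit(rng: random.Random) -> int:
    """Classically simulate H followed by a measurement on one qubit."""
    # State vector (amplitude of |0>, amplitude of |1>), starting in |0>.
    amp0, amp1 = 1.0, 0.0
    # Hadamard gate: maps |0> to (|0> + |1>) / sqrt(2).
    s = 1.0 / math.sqrt(2.0)
    amp0, amp1 = s * (amp0 + amp1), s * (amp0 - amp1)
    # Measurement: outcome 1 occurs with probability |amp1|^2.
    p_one = amp1 * amp1
    return 1 if rng.random() < p_one else 0


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
bits = [next_random_bit(rng) for _ in range(1000)]
```

Each call yields 0 or 1 with probability of roughly one half, which is the behaviour expected from measuring a qubit after a single H gate.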

Data Movement I

A Framework for Memory Oversubscription Management in Graphics Processing Units
  • Applies eviction, throttling, and compression selectively for different applications
  • Memory access patterns differ between applications, so the right oversubscription countermeasure differs too
    • 3dconv -- streaming access, small working set -> hiding eviction latency
    • lud -- data reuse by kernels, small working set -> hiding eviction latency
    • atax -- random access, large working set -> reducing working set size

Swizzle Inventor: Data Movement Synthesis for GPU Kernels

Swizzle Inventor - synthesizes swizzle-based GPU programs for you.


Scalable Processing of Contemporary Semi-Structured Data on Commodity Parallel Processors ― A Compilation-based Approach
  • semi-structured data - flexible data model, "nested", XML, JSON, etc.
    • JSON-family data
  • Stream processing of JSON cannot take an automata-based approach
    • match query, record states, recognize syntax structure
  • streaming compilation
    • query set, JSON grammar
    • DFA + pushdown automaton -> streaming automaton
  • parallelizing compilation
    • modify the syntax so that path explosion can be handled
  • JPStream
  • https://github.com/AutomataLab/JPStream
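
The "DFA + pushdown automaton" idea from the notes above can be sketched roughly as follows. This is not JPStream's API: the event names, `events`, and `match_path` are hypothetical, the "query DFA" is reduced to a counter over a single key path, and an already-decoded value stands in for a real tokenizer. Arrays are treated as transparent, which is also an assumption.

```python
from typing import Any, Iterator, List, Tuple

Event = Tuple[str, Any]
DEAD = -1  # query state meaning "this branch can no longer match"


def events(value: Any) -> Iterator[Event]:
    """Emit SAX-like events for a decoded JSON value (stand-in for a tokenizer)."""
    if isinstance(value, dict):
        yield ("start_object", None)
        for key, child in value.items():
            yield ("key", key)
            yield from events(child)
        yield ("end_object", None)
    elif isinstance(value, list):
        yield ("start_array", None)
        for child in value:
            yield from events(child)
        yield ("end_array", None)
    else:
        yield ("primitive", value)


def match_path(stream: Iterator[Event], path: List[str]) -> List[Any]:
    """Collect primitives reached by following `path` through nested objects."""
    state = 0              # query DFA state: number of path steps matched so far
    stack: List[int] = []  # pushdown part: query state of each open container
    results: List[Any] = []
    for kind, payload in stream:
        if kind in ("start_object", "start_array"):
            stack.append(state)
        elif kind in ("end_object", "end_array"):
            state = stack.pop()
        elif kind == "key":
            base = stack[-1]  # state of the enclosing object
            if base != DEAD and base < len(path) and path[base] == payload:
                state = base + 1
            else:
                state = DEAD
        elif kind == "primitive" and state == len(path):
            results.append(payload)
    return results


doc = {"store": {"book": [{"title": "A", "price": 1}, {"title": "B"}], "title": "x"}}
titles = match_path(events(doc), ["store", "book", "title"])
```

The stack is what lets a single pass restore the query state when a nested object or array closes, so the matcher never needs to build the whole document in memory.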

Data Movement II

Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration
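  • Building buffers one by one for each accelerator is laborious
  • Proposes the Buffet, which systematizes buffers into an idiom along the lines of the "FIFO"
  • Verilog implementation - https://github.com/cwfletcher/buffets
  • Classifies data orchestration approaches in an Implicit/Explicit by Coupled/Decoupled matrix
  • Buffets assume E.D.D.O. (Explicit Decoupled Data Orchestration)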
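
A rough behavioural model of a FIFO-like buffer with random access may help picture the idiom. The four operations below (fill, read, update, shrink) follow my recollection of the paper rather than anything written in these notes, so treat the names and semantics as assumptions. The class is a Python sketch, unrelated to the Verilog implementation.

```python
from collections import deque
from typing import Deque, Optional


class Buffet:
    """FIFO-like storage whose live window can also be read and updated by index."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data: Deque[int] = deque()

    def fill(self, value: int) -> bool:
        """Append at the tail; False models a producer stall when full."""
        if len(self.data) >= self.capacity:
            return False
        self.data.append(value)
        return True

    def read(self, index: int) -> Optional[int]:
        """Read relative to the head; None models a read waiting for a fill."""
        if index >= len(self.data):
            return None
        return self.data[index]

    def update(self, index: int, value: int) -> bool:
        """Overwrite an already-filled entry in place."""
        if index >= len(self.data):
            return False
        self.data[index] = value
        return True

    def shrink(self, count: int) -> None:
        """Free `count` entries from the head, sliding the window forward."""
        for _ in range(min(count, len(self.data))):
            self.data.popleft()


buf = Buffet(capacity=4)
for v in (10, 20, 30):
    buf.fill(v)
second = buf.read(1)       # 20
not_ready = buf.read(3)    # None: not filled yet
buf.update(0, 11)
buf.shrink(2)
new_head = buf.read(0)     # 30
```

Unlike a plain FIFO, data can be reused many times before `shrink` releases it, and the producer and consumer synchronize only through fill level.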

HiWayLib: A Software Framework for Enabling High Performance Communications for Heterogeneous Pipeline Computations
  • a communication library for heterogeneous pipeline computations
  • Cumbersome inter-device data movements
    • → lazy reference-based scheme
      • region-based lazy data copy
      • reference based task queue
  • End detection of pipeline processing
    • → Late triggered inter-stage tracking
  • Contentions on communication data structure
    • → Bi-Layer contention relief

StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory

https://thexsel.github.io/p/streambox/

  • background - https://www.domo.com/learn/data-never-sleeps-6
  • StreamBox-HBM, a stream engine for hybrid 3D memory and DRAM.
    • 110 million records per second and 238 GB/s memory bandwidth
  • challenges
    • hash grouping performs poorly on 3D memory
      • → parallel sort for grouping, sort outperforms hash on 3D memory
    • 3D memory is capacity limited
      • → only use 3D memory for in-memory index
    • how to dynamically map data/operators
      • → balance two limited resources
  • Evaluation
    • On the Yahoo streaming benchmark, Flink on KNL reaches 10 M records/s → StreamBox-HBM reaches 50 M records/s.
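
The "parallel sort for grouping" point can be illustrated functionally: sort-based grouping gives the same answer as hash grouping, but it touches memory sequentially instead of at random. The sketch below is single-threaded Python with hypothetical function names and made-up records, so it shows only the equivalence and says nothing about performance on 3D memory.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_sum_hash(records):
    """Group by key with a hash table (random access per record)."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


def group_sum_sort(records):
    """Group by key by sorting first, then scanning runs of equal keys."""
    ordered = sorted(records, key=itemgetter(0))
    return {
        key: sum(value for _, value in run)
        for key, run in groupby(ordered, key=itemgetter(0))
    }


records = [("ad3", 1), ("ad1", 2), ("ad3", 4), ("ad2", 8), ("ad1", 16)]
by_hash = group_sum_hash(records)
by_sort = group_sum_sort(records)
```

Both calls produce the same per-key totals. The sorted variant only needs sequential scans after the sort, the kind of access pattern that benefits from high bandwidth.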

Potpourri

CASCADE Just-In-Time Compilation for Verilog ― A New Technique for Improving the FPGA Programming Experience
  • Just-in-time
    • run code in a simulator
    • compile in the background
    • translate when finished

https://github.com/vmware/cascade
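
The just-in-time flow in the notes (run in a simulator, compile in the background, switch over when the compile finishes) can be sketched with a background thread. Everything here is hypothetical: `JitEngine` is not Cascade's API, and the "compile" is just a short sleep standing in for a long FPGA build.

```python
import threading
import time


class JitEngine:
    """Serve requests from a slow path until a background build is ready."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._impl = self._simulate  # usable immediately
        self.backend = "simulator"
        self._worker = threading.Thread(target=self._compile, daemon=True)
        self._worker.start()

    @staticmethod
    def _simulate(x: int) -> int:
        return x + 1  # stand-in for an interpreted simulation step

    def _compile(self) -> None:
        time.sleep(0.05)  # stand-in for a long-running compile

        def compiled(x: int) -> int:
            return x + 1  # same behaviour, notionally running on hardware

        with self._lock:  # swap implementations atomically
            self._impl = compiled
            self.backend = "compiled"

    def step(self, x: int) -> int:
        with self._lock:
            impl = self._impl
        return impl(x)

    def wait_for_compile(self) -> None:
        self._worker.join()


engine = JitEngine()
first = engine.step(1)        # answered right away, whichever backend is live
engine.wait_for_compile()
second = engine.step(first)   # now served by the compiled path
```

The point of the design is that the caller sees the same results before and after the swap, only the latency of the first response changes.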

DCNS: Automated Detection Of Conservative Non-Sleep Defects in the Linux Kernel
  • waiting operation
    • non-sleep operations
    • sleep-able operations
  • Replace mdelay with msleep, and GFP_ATOMIC with GFP_KERNEL
  • function pointer analysis

A Case for Lease-Based, Utilitarian Resource Management on Mobile Devices

Potpourri(3): a talk about a runtime that reduces wasteful power consumption by Android apps
cf. https://orderlab.io/LeaseOS/
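
Going only by the title, "lease-based, utilitarian" suggests that a resource is held for a limited term and renewed only if it was put to good use. The sketch below is a guess at that shape, not LeaseOS's actual policy: the `Lease` and `LeaseManager` names, the tick-based clock, and the caller-supplied `utility` score are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    holder: str
    expires_at: int      # tick at which the current term ends
    active: bool = True


class LeaseManager:
    def __init__(self, term: int, min_utility: float) -> None:
        self.term = term                # length of one lease term, in ticks
        self.min_utility = min_utility  # renewal threshold (hypothetical)

    def acquire(self, holder: str, now: int) -> Lease:
        """Grant the resource for one term instead of indefinitely."""
        return Lease(holder=holder, expires_at=now + self.term)

    def on_expiry(self, lease: Lease, utility: float, now: int) -> bool:
        """Renew the lease only if the last term produced enough utility."""
        if utility >= self.min_utility:
            lease.expires_at = now + self.term
            lease.active = True
        else:
            lease.active = False  # resource is taken back from the holder
        return lease.active


manager = LeaseManager(term=5, min_utility=0.2)
useful = manager.acquire("navigation_app", now=0)
wasteful = manager.acquire("stuck_sync_app", now=0)
manager.on_expiry(useful, utility=0.9, now=5)
manager.on_expiry(wasteful, utility=0.0, now=5)
```

An app that keeps holding a resource without doing useful work loses it at the end of the term, while a well-behaved app is renewed without noticing.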