Diary/2009-8-21
A Task-centric...(PACT'09)
[Paper reading]
@inproceedings{kelm-pact09, author = {John H. Kelm and Daniel R. Johnson and Steven S. Lumetta and Matthew I. Frank and Sanjay J. Patel}, title = {A Task-centric Memory Model for Scalable Accelerator Architectures}, booktitle = {PACT '09: Proceedings of the 18th international conference on Parallel architectures and compilation techniques}, year = {2009}, pages = {???--???}, location = {Raleigh, North Carolina}, }
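About memory management for running MIMD programs (not SIMD-like, and traditionally considered a poor fit)
on 1024 cores.
Analyzes accesses to shared data in visual computing programs.
Defines a protocol for managing the caches in S/W rather than managing all of them in H/W.
Tasks are managed with queues, and a barrier is realized at the point where the tasks are complete -> [6]
The results were not a comparison against other task-management/memory-management techniques.
I wonder how things like instruction loads are handled?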
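The queue-plus-barrier idea above can be pictured with a small Python sketch. This is only an illustration under my own assumptions, not the mechanism from the paper: `run_interval`, `num_workers`, and the use of threads in place of cores are all hypothetical. It relies on the standard library's `queue.Queue`, whose `join()` blocks until every queued item has been marked done, so it plays the role of the barrier that closes an interval.

```python
import queue
import threading


def run_interval(tasks, num_workers=4):
    """Run one interval: workers drain a shared task queue, then a barrier.

    Hypothetical sketch; threads stand in for cores.
    """
    q = queue.Queue()
    for task in tasks:
        q.put(task)  # all tasks are enqueued before any worker starts

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return  # queue drained: nothing left for this worker
            try:
                value = task()
                with lock:
                    results.append(value)
            finally:
                q.task_done()  # mark this task complete

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    q.join()  # barrier: returns only once every task is complete
    for th in threads:
        th.join()
    return results


squares = run_interval([lambda i=i: i * i for i in range(4)])
print(sorted(squares))  # prints [0, 1, 4, 9]
```

The completion order of tasks inside the interval is not defined, which is why the example sorts the results before printing.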
The following is from the paper
- Abstract
- task-centric memory model
- uses a software protocol
- working in collaboration with hardware caches
- to maintain a coherent, single-address space view of memory w/o HW support
- for 1024-core MIMD accelerator; Rigel
- Introduction
- task-centric memory model
- hw/sw protocol for maintaining a coherent view of shared memory for accelerators
- targets visual computing
- developed using a form of bulk sync. processing
- between barriers (an interval), independent units of parallel work (tasks) execute in parallel
- according to the analysis, the sharing patterns are well-structured
- similar to DSM. The difference is
- it is a single-chip processor with private $, so the cost of accessing the shared-global $ is small
- targets Rigel, a 1024-core accelerator
- a single cacheable address space
- w/o hardware-enforced $ coherence across all cores on the chip
- Contributions
- observation of data sharing patterns for a class of emerging workloads
- a scalable task-centric memory model (for 1000-cores)
- optimization
- prefetching from DRAM is unimpeded and most beneficial to perf.
- overhead of the task-centric model can be minimal
- Motivation/Background
- want to consider irregular task-parallel computation too, not only the data-parallel execution model
- Application Characterization
- Parallelism Structure(Programming styles)
- bulk sync. processing
- - the tasks exchange little or no data within an interval
- - at the barrier, modified shared data is made globally visible
- - mostly-data-parallel, task-based shared-memory programming model, coherence management is required to enable sharing
- - do not depend on the HW support
- the programmer's attempt to create scalable code(minimum sharing)
- Sharing Patterns
- sync. characteristics
- benchmarks
- - MRI benchmark(VISBench)
- - CG, Sobel edge detection, k-means clustering, DMM(Rigel kernel benchmark suite)
- - GJK collision detection benchmark(a freely-available seq.)
- - Heat (Cilk)
- Fig.1 and Fig.2 show the freq. of non-private loads/stores
- - the majority of non-private loads are reads to data produced before the current interval began
- - both conflict reads and writes to data shared
- Accelerator Workload Characteristics
- characteristics
- - read shared data is present within an interval
- - sync. is coarse-grained
- - small amounts of write-shared data within an interval
- - Fine-grained sync. (ex. atomic updates to shared data) is present but rare
- - write sharing within an interval is rare
- little coherence management is required
- Cache Coherence Management
- weakly consistent memory models
- explicit local and global memory operations
- task-based programming model
- as a substitute for HW $
- Related Work
- bulk-sync. parallel(BSP) model $→$ CUDA, OpenCL
- OpenMP, Intel's TBB
- Workload(PARSEC, ALPBench)
- Memory Models
- Rigel Architecture and Task Model
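
The coherence idea in the notes above (private caches, explicit local and global memory operations, modified shared data made globally visible at the barrier) can be pictured with a small simulation. This is only a sketch under my own assumptions, not the protocol from the paper: `TaskContext`, `run_interval`, and the dictionary standing in for global memory are hypothetical names, and tasks run one after another rather than on separate cores.

```python
class TaskContext:
    """Hypothetical per-task view of memory with a software-managed private cache."""

    def __init__(self, global_mem):
        self.global_mem = global_mem
        self.cache = {}     # private cache: addr -> value
        self.dirty = set()  # addresses written locally during this interval

    def load(self, addr):
        # Local load: hit in the private cache, or fill it from global memory.
        if addr not in self.cache:
            self.cache[addr] = self.global_mem[addr]
        return self.cache[addr]

    def store(self, addr, value):
        # Local store: stays in the private cache until flushed.
        self.cache[addr] = value
        self.dirty.add(addr)

    def global_load(self, addr):
        # Explicit global load: bypasses the private cache.
        return self.global_mem[addr]

    def global_store(self, addr, value):
        # Explicit global store: immediately visible to every task.
        self.global_mem[addr] = value
        self.cache.pop(addr, None)
        self.dirty.discard(addr)

    def flush(self):
        # Write back dirty data, then drop the cache (write back and invalidate).
        for addr in self.dirty:
            self.global_mem[addr] = self.cache[addr]
        self.dirty.clear()
        self.cache.clear()


def run_interval(global_mem, tasks):
    """Run every task of one interval, then make writes visible at the barrier."""
    contexts = [TaskContext(global_mem) for _ in tasks]
    for task, ctx in zip(tasks, contexts):
        task(ctx)
    for ctx in contexts:  # barrier: software makes modified data globally visible
        ctx.flush()


def producer(ctx):
    ctx.store("a", ctx.load("a") + 41)


def consumer(ctx):
    ctx.store("b", ctx.load("a"))


mem = {"a": 1, "b": 0}
run_interval(mem, [producer, consumer])
after_first = dict(mem)  # {"a": 42, "b": 1}: consumer saw the pre-interval value
run_interval(mem, [consumer])
print(after_first, mem)
```

In the first interval the consumer still reads the value produced before the interval began, and the producer's write becomes visible only after the barrier. If two tasks wrote the same address in one interval, this sketch would simply let the last flush win; the notes say such write sharing is rare in the target workloads.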
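Today's tweets
- I've ended up becoming a MacOSX iCal user. Oh dear. (Fri Aug 21 15:55:44 2009)
- Trying out MindNode. It looks cool, but it's slightly inconvenient that it doesn't adjust positions automatically. (Fri Aug 21 14:29:22 2009)
- Even I can't bring myself to drink yesterday's leftover beer sitting in a rice bowl. (Fri Aug 21 14:04:57 2009)
- Reading A Task-centric Memory Model for Scalable Accelerator Architectures (Fri Aug 21 11:22:18 2009)
- Quite a few of the PACT'09 papers are actually available to read, it seems. (Fri Aug 21 11:21:39 2009)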