Diary/2009-8-21

A Task-centric...(PACT'09)

[Paper reading]

@inproceedings{kelm-pact09,
author = {John H. Kelm and Daniel R. Johnson and Steven S. Lumetta and Matthew I. Frank and Sanjay J. Patel},
title = {A Task-centric Memory Model for Scalable Accelerator Architectures},
booktitle = {PACT '09: Proceedings of the 18th international conference on Parallel architectures and compilation techniques},
year = {2009},
pages = {???--???},
location = {Raleigh, North Carolina},
}

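About memory management for running MIMD programs (not SIMD-like, the kind traditionally considered a poor fit) on 1024 cores.
Analyzes accesses to shared data in visual computing programs.
Defines a protocol for managing the caches as a whole in S/W rather than in H/W.
Tasks are managed with queues, and a barrier is realized at the point where the tasks complete -> [6]
The results were not compared against other task-management/memory-management schemes.
I wonder how things like instruction loads are handled?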
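The note about queue-based task management with a barrier at completion can be illustrated with a small sketch. This is not Rigel's task queue (that is described in [6]); it only assumes a single shared queue, worker threads standing in for cores, and the hypothetical function name `run_interval`. The barrier is simply "every enqueued task has finished", which Python's `queue.Queue.join()` provides.

```python
import queue
import threading


def run_interval(tasks, num_workers=4):
    """Hypothetical sketch: run one interval of tasks from a shared queue.

    The barrier is implicit: q.join() returns only once every enqueued
    task has been marked done by some worker.
    """
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:          # sentinel: no more work for this worker
                q.task_done()
                return
            out = item()              # execute one independent task
            with lock:
                results.append(out)
            q.task_done()             # report completion to the queue

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:
        q.put(task)
    q.join()                          # barrier: all tasks of this interval are complete
    for _ in threads:
        q.put(None)                   # release the workers
    for t in threads:
        t.join()
    return results


squares = run_interval([lambda i=i: i * i for i in range(8)])
print(sorted(squares))
```

Completion order depends on scheduling, so the results are sorted before printing.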

The following is taken from the paper.

Abstract
  • task-centric memory model
    • uses a software protocol
    • working in collaboration with hardware caches
    • to maintain a coherent, single-address-space view of memory w/o HW support
  • for 1024-core MIMD accelerator; Rigel
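To make "a software protocol working with hardware caches, w/o HW coherence" concrete, here is a toy model under loud assumptions: each core has a private write-back cache that hardware never keeps coherent, and software explicitly writes back dirty lines and then invalidates cached copies at a barrier. The class `Core` and the functions `writeback`, `invalidate`, and `barrier` are hypothetical names; the paper's actual protocol is finer-grained than this flush-everything version.

```python
class Core:
    """Hypothetical core with a private, non-coherent write-back cache."""

    def __init__(self, memory):
        self.memory = memory   # shared global memory: addr -> value
        self.cache = {}        # private cache: addr -> value
        self.dirty = set()     # addresses stored to since the last write-back

    def load(self, addr):
        if addr not in self.cache:                # miss: fill from global memory
            self.cache[addr] = self.memory.get(addr, 0)
        return self.cache[addr]                   # hit: may be stale, HW never fixes it

    def store(self, addr, value):
        self.cache[addr] = value                  # write stays private until write-back
        self.dirty.add(addr)

    def writeback(self):
        """Software action: make this core's writes globally visible."""
        for addr in self.dirty:
            self.memory[addr] = self.cache[addr]
        self.dirty.clear()

    def invalidate(self):
        """Software action: drop cached copies so later loads refetch."""
        self.cache.clear()


def barrier(cores):
    """Hypothetical software coherence step at an interval boundary."""
    for c in cores:
        c.writeback()     # publish all writes first...
    for c in cores:
        c.invalidate()    # ...then discard possibly stale copies


mem = {}
a, b = Core(mem), Core(mem)
b.load(0)          # b caches address 0 (value 0)
a.store(0, 42)
print(b.load(0))   # still the stale cached value
barrier([a, b])
print(b.load(0))   # fresh value after the software protocol runs
```

Writing back before invalidating matters: invalidating a core that still holds dirty lines would silently lose its writes.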

Introduction
  • task-centric memory model
    • hw/sw protocol for maintaining a coherent view of shared memory for accelerators
  • Targets visual computing
    • developed using a form of bulk sync. processing
      • between barriers (an interval), independent units of parallel work (tasks) execute in parallel
    • according to the analysis, the sharing patterns are well-structured
  • Similar to DSM. The difference:
    • it is a single-chip processor with private $, so accessing the shared global $ is cheap
  • Targets Rigel, a 1024-core accelerator
    • a single cacheable address space
    • w/o hardware-enforced $ coherence across all cores on the chip
  • Contributions
    • observation of data sharing patterns for a class of emerging workloads
    • a scalable task-centric memory model (for 1000-cores)
    • optimization
      • prefetching from DRAM is unimpeded and most beneficial to perf.
    • overhead of the task-centric model can be minimal
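The bulk-synchronous structure described above (independent tasks within an interval, results made visible at the barrier) can be sketched as a loop over intervals. Assumptions: tasks only read the snapshot published at the previous barrier and each writes one private result; `bsp_run` and the example task `average3` are hypothetical names. The tasks run sequentially here for clarity; because they never observe each other's writes within an interval, their order does not matter.

```python
def bsp_run(data, step, num_intervals):
    """Hypothetical bulk-synchronous driver.

    In each interval every task reads only the snapshot published at the
    previous barrier and produces a private result; the barrier then
    publishes all results at once.
    """
    current = list(data)
    for _ in range(num_intervals):
        snapshot = tuple(current)                 # read-only within the interval
        # independent tasks: one per element, no communication between them
        results = [step(snapshot, i) for i in range(len(snapshot))]
        current = results                         # barrier: make all writes visible
    return current


def average3(s, i):
    """Example task: mean of a cell and its neighbours, clamped at the edges."""
    lo, hi = max(i - 1, 0), min(i + 1, len(s) - 1)
    return (s[lo] + s[i] + s[hi]) / 3


print(bsp_run([0, 0, 9, 0, 0], average3, 1))
```

Since no task reads another task's output from the same interval, such a program needs coherence actions only at the barrier, which matches the paper's claim that little coherence management is required.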

Motivation/Background
  • want to handle not only the data-parallel execution model but also irregular task-parallel computation
  • Application Characterization
    • Parallelism Structure(Programming styles)
      • bulk sync. processing
      • - the tasks exchange little or no data within an interval
      • - at the barrier, modified shared data is made globally visible
      • - mostly-data-parallel, task-based shared-memory programming model, coherence management is required to enable sharing
      • - do not depend on the HW support
      • the programmer's attempt to create scalable code (minimum sharing)

    • Sharing Patterns
      • sync. characteristics
      • benchmarks
      • - MRI benchmark(VISBench)
      • - CG, Sobel edge detection, k-means clustering, DMM (Rigel kernel benchmark suite)
      • - GJK collision detection benchmark (a freely-available seq.)
      • - Heat (Cilk)
      • Fig. 1 and Fig. 2 show the freq. of non-private loads/stores
      • - the majority of non-private loads are reads to data produced before the current interval began
      • - both conflicting reads and writes to shared data
    • Accelerator Workload Characteristics
      • characteristics
      • - read-shared data is present within an interval
      • - sync. is coarse-grained
      • - small amounts of write-shared data within an interval
      • - Fine-grained sync. (e.g. atomic updates to shared data) is present but rare
      • - write sharing within an interval is rare
      • little coherence management is required
    • Cache Coherence Management
      • weakly consistent memory models
      • explicit local and global memory operations
      • task-based programming model
      • as a substitute for HW $
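The sharing-pattern measurements (Fig. 1 and Fig. 2, non-private loads/stores per interval) can be imitated on a toy trace. This sketch uses a simplified, hypothetical taxonomy rather than the paper's exact categories: an access is "private" if only one task touches the address in the interval, "read_shared" if several tasks touch it but none stores to it (so its value was produced before the interval began), and "conflict" if it is shared and written within the interval. The trace format `(task_id, op, addr)` and the name `classify_accesses` are assumptions.

```python
from collections import Counter, defaultdict


def classify_accesses(accesses):
    """Hypothetical classifier for one interval's trace.

    accesses: list of (task_id, op, addr), with op in {"ld", "st"}.
    Returns a Counter keyed by (op, category).
    """
    touched = defaultdict(set)   # addr -> tasks that load or store it
    stored = defaultdict(set)    # addr -> tasks that store to it
    for task, op, addr in accesses:
        touched[addr].add(task)
        if op == "st":
            stored[addr].add(task)

    counts = Counter()
    for task, op, addr in accesses:
        if touched[addr] == {task}:
            kind = "private"         # no other task touches it this interval
        elif not stored[addr]:
            kind = "read_shared"     # shared, but written only in earlier intervals
        else:
            kind = "conflict"        # shared and written within this interval
        counts[(op, kind)] += 1
    return counts


trace = [
    (0, "ld", 100), (1, "ld", 100),   # two tasks read the same input
    (0, "st", 200), (0, "ld", 200),   # task 0 works on its own data
    (1, "st", 300), (2, "ld", 300),   # task 2 reads what task 1 writes
]
print(classify_accesses(trace))
```

On real workloads, the observation above predicts that "read_shared" loads dominate the non-private accesses and "conflict" accesses are rare.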

  • Related Work
    • bulk-sync. parallel (BSP) model → CUDA, OpenCL
    • OpenMP, Intel's TBB
    • Workloads (PARSEC, ALPBench)
    • Memory Models

Rigel Architecture and Task Model

Today's tweets

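  • I've ended up becoming an iCal user on Mac OS X. Oh dear. (Fri Aug 21 15:55:44 2009)
  • Trying out MindNode. It looks cool, but the fact that it doesn't adjust node positions automatically is a bit inconvenient. (Fri Aug 21 14:29:22 2009)
  • Even I can't drink yesterday's leftover beer sitting in a bowl. (Fri Aug 21 14:04:57 2009)
  • Reading A Task-centric Memory Model for Scalable Accelerator Architectures (Fri Aug 21 11:22:18 2009)
  • Quite a few of the PACT'09 papers are available to read, it seems. (Fri Aug 21 11:21:39 2009)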