Diary/2009-1-4
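Back to Tokyo
By Shinkansen.
I reached the platform just after a train had left,
so I ended up waiting in line for a full train interval.
A TV camera on a tripod was filming right beside me,
so I am surely not in the shot at all.
I could not get a window seat with a power outlet,
but I managed to secure a seat and headed straight for Tokyo.
The aisles were packed with people too, so even getting to the toilet was a struggle.
Well, just being able to sit down is quite a blessing.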
Programming with Tiles
Adds two new classes, dynamic partitioning and overlapped tiling,
to the Hierarchically Tiled Array (HTA) proposed at PPoPP 2006.
Evaluated by comparing ease of writing and performance.
- HTA
Hierarchically Tiled Arrays (HTAs) are arrays that may be partitioned into tiles. These tiles can be conventional arrays or lower-level HTAs. Tiles can be distributed across processors in a distributed-memory machine or be stored in a single machine according to a user-specified layout.
The C++ implementation of the HTA class is a library with ~18000 lines of code. It only contains header files, as most classes in the library are C++ templates to facilitate inlining.
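To make the tiling idea concrete for myself, here is a minimal Python sketch, not the HTA library's C++ API. The helpers `tile` and `owner` are hypothetical names, and the cyclic tile-to-processor mapping is only one assumed layout among those a user could specify.

```python
# Hypothetical helpers for illustration only; this is not the HTA API.

def tile(matrix, tile_rows, tile_cols):
    """Partition `matrix` (a list of equal-length rows) into a 2D grid of tiles."""
    rows, cols = len(matrix), len(matrix[0])
    if rows % tile_rows or cols % tile_cols:
        raise ValueError("tile shape must divide the matrix shape evenly")
    return [
        [
            [row[c:c + tile_cols] for row in matrix[r:r + tile_rows]]
            for c in range(0, cols, tile_cols)
        ]
        for r in range(0, rows, tile_rows)
    ]


def owner(tile_row, tile_col, proc_rows, proc_cols):
    """Map a tile index to a processor rank using an assumed cyclic layout."""
    return (tile_row % proc_rows) * proc_cols + (tile_col % proc_cols)


matrix = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = tile(matrix, 2, 2)        # top level: a 2x2 grid of 2x2 tiles
top_right = tiles[0][1]           # rows 0-1, columns 2-3 of the matrix
nested = tile(top_right, 1, 1)    # a tile can itself be tiled again
rank = owner(1, 0, 2, 2)          # processor that would hold tile (1, 0)
```

Indexing `tiles[i][j]` selects a whole tile, and tiling a tile again gives the second level of the hierarchy.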
- Dynamic partitioning
Cache-oblivious algorithms and FLAME require dynamic changes to the tile layout.
Adds part/rmPart.
← TBB requires more lines of code, variables, and data types than the HTA to express the same problem.
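A rough sketch of what changing the tile layout at run time could look like. The class `TiledVector` and its methods `part` and `rm_part` are hypothetical and only borrow the names from the note above; the real part/rmPart signatures and semantics may differ.

```python
class TiledVector:
    """1D data plus a sorted list of partition lines; tiles are the spans between them."""

    def __init__(self, data):
        self.data = list(data)
        self.cuts = []  # interior partition lines, kept sorted

    def part(self, pos):
        """Add a partition line in front of index `pos` (assumed semantics)."""
        if not 0 < pos < len(self.data):
            raise ValueError("partition line must fall strictly inside the data")
        if pos not in self.cuts:
            self.cuts.append(pos)
            self.cuts.sort()

    def rm_part(self, pos):
        """Remove the partition line at `pos`, merging the two neighboring tiles."""
        self.cuts.remove(pos)

    def tiles(self):
        """Return the current tiles as a list of lists."""
        bounds = [0] + self.cuts + [len(self.data)]
        return [self.data[a:b] for a, b in zip(bounds, bounds[1:])]


v = TiledVector(range(8))
v.part(4)
v.part(2)
three_tiles = v.tiles()   # layout after adding two partition lines
v.rm_part(4)
two_tiles = v.tiles()     # layout after removing one of them
```

The data never moves here; only the list of partition lines changes, which is the property a recursive (cache-oblivious style) algorithm would rely on.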
- Overlapped tiling
Stencil codes benefit from tiling, because tiles increase locality and determine data distribution when running in parallel.
← programmers create a shadow or ghost region around each tile that contains a copy of the elements of the neighbor tiles
← automatically or manually update
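The shadow-region idea can be sketched with a 1D three-point Jacobi sweep. This is my own illustration in Python, not the HTA overlapped-tiling interface; `jacobi_step` and `tiled_jacobi_step` are hypothetical names, and the ghost width of one element is an assumption that matches a three-point stencil.

```python
def jacobi_step(u):
    """One 3-point Jacobi sweep over the whole array; both end values stay fixed."""
    interior = [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def tiled_jacobi_step(u, tile_size):
    """Same sweep, computed tile by tile using one ghost element per interior side."""
    n = len(u)
    out = []
    for start in range(0, n, tile_size):
        stop = min(start + tile_size, n)
        lo, hi = max(start - 1, 0), min(stop + 1, n)
        padded = u[lo:hi]  # the tile plus copies of its neighbors' edge elements
        for g in range(start, stop):
            local = g - lo  # position of global index g inside the padded tile
            if g == 0 or g == n - 1:
                out.append(padded[local])  # fixed boundary value
            else:
                out.append((padded[local - 1] + padded[local + 1]) / 2)
    return out


u = [0.0, 0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 4.0]
whole = jacobi_step(u)
tiled = tiled_jacobi_step(u, 3)
```

Because each padded tile carries a copy of the neighboring elements it needs, every tile can be updated without touching the other tiles during the sweep; the copies have to be refreshed before the next sweep.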
- Evaluation
- Performance evaluation
- sequential (matrix multiplication, LU decomposition, 3D Jacobi)
- parallel (Parallel Merge, MG/LU NAS)
- Readability/Productivity
- the programming effort[17]
- the cyclomatic number[22]
- lines of code
A Portable Runtime Interface For Multi-Level Memory Hierarchies
[Paper reading]
for moving data and computation through parallel machines with multi-level memory hierarchies
for multi-core/SMP, Cell B.E., and distributed-memory clusters
- The Runtime Interface
adaptation of the Sequoia compiler
- initialization/setup of the machine, including communication resources and resources at all levels where tasks can be executed
- data transfers between memory levels using asynchronous bulk transfers between arrays
- task execution at specified levels of the machine
- Strengthens bulk transfers
- Disk and memory are handled through the same interface
- Top API and Bottom API
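A toy sketch of the three responsibilities listed above (setup, asynchronous bulk transfers between levels, task execution at a chosen level). Every name here (`Runtime`, `MemoryLevel`, `transfer`, `run_task`) is hypothetical and is not the actual runtime interface from the paper; threads stand in for whatever transfer mechanism a real machine would use.

```python
from concurrent.futures import ThreadPoolExecutor


class MemoryLevel:
    """One level of the hierarchy (disk, main memory, ...) holding named arrays."""

    def __init__(self, name):
        self.name = name
        self.arrays = {}


class Runtime:
    def __init__(self, level_names):
        # Setup: create every level and the resource used for transfers.
        self.levels = {name: MemoryLevel(name) for name in level_names}
        self.pool = ThreadPoolExecutor(max_workers=2)

    def transfer(self, src, dst, array, start, count):
        """Start an asynchronous bulk copy of a slice; returns a future to wait on."""
        def copy():
            chunk = self.levels[src].arrays[array][start:start + count]
            self.levels[dst].arrays[array] = list(chunk)
            return len(chunk)
        return self.pool.submit(copy)

    def run_task(self, level, task, array):
        """Execute `task` on the copy of `array` resident at the given level."""
        return task(self.levels[level].arrays[array])

    def shutdown(self):
        self.pool.shutdown()


rt = Runtime(["disk", "memory"])
rt.levels["disk"].arrays["x"] = list(range(10))
handle = rt.transfer("disk", "memory", "x", start=2, count=4)
copied = handle.result()                 # block until the bulk transfer is done
total = rt.run_task("memory", sum, "x")  # task sees only the transferred slice
rt.shutdown()
```

Disk and memory go through the same `transfer` call here, which mirrors the point that both are treated uniformly as levels of the hierarchy.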
Multiscalar Processors
[Paper reading]
Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of ILP from ordinary high-level language programs.