
Diary/2010-3-17

Come to think of it, I realized I hadn't read these.

Near-Optimal Cache Block Placement with Reactive Nonuniform Cache Architectures

fully deterministic shared-memory multiprocessing: makes debugging, testing, and development easier
- executes multiple threads that communicate via shared memory
- produces the same program output if given the same program input
all communication between threads is precisely the same for every execution
the simplest way
- acquire/release a token on every memory operation
- performance degradation (every memory access waits for the token to be released, so the benefit of parallel execution is lost)
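
The token scheme above can be sketched in a few lines. This is only an illustration, assuming a fixed round-robin token order; the names `DeterministicToken` and `run` are hypothetical and not from the paper.

```python
import threading


class DeterministicToken:
    """Round-robin token: only the holder may perform a memory operation."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.holder = 0                      # thread id that currently owns the token
        self.active = [True] * num_threads   # finished threads are skipped
        self.cond = threading.Condition()

    def acquire(self, tid):
        with self.cond:
            while self.holder != tid:
                self.cond.wait()

    def release(self, tid, finished=False):
        with self.cond:
            if finished:
                self.active[tid] = False
            # Pass the token to the next active thread in a fixed order.
            for step in range(1, self.num_threads + 1):
                nxt = (tid + step) % self.num_threads
                if self.active[nxt]:
                    self.holder = nxt
                    break
            self.cond.notify_all()


def run(num_threads=3, ops_per_thread=4):
    token = DeterministicToken(num_threads)
    log = []  # shared memory; the append order is the inter-thread communication order

    def worker(tid):
        for i in range(ops_per_thread):
            token.acquire(tid)
            log.append((tid, i))   # the "memory operation"
            token.release(tid)
        token.acquire(tid)
        token.release(tid, finished=True)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Every call to `run()` yields the same interleaving, but at any moment only one thread makes progress, which is exactly the lost parallelism noted above.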
to recover parallelism
- use an invalidation-based cache coherence protocol to identify when inter-thread communication occurs
- speculative execution
DMP-ShTab(cache coherence protocol)
DMP-TM, DMP-TMFwd(support for transactional memory)

address-correlated prefetching used to require megabyte-scale metadata
two innovations for managing metadata
- hash-based lookup: maintains an index of previously recorded miss-address sequences
- probabilistic update: applies only a randomly selected subset of updates to the hashed index table
  - reducing metadata memory traffic by a factor of 3.4
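
A rough sketch of how the two ideas fit together, assuming a flat history buffer and a dictionary standing in for the hashed index table. The class name `MissHistoryIndex`, the update probability, and the prediction depth are all hypothetical choices for illustration, not values from the paper.

```python
import random


class MissHistoryIndex:
    def __init__(self, update_probability=0.25, seed=0):
        self.history = []     # recorded miss-address sequence
        self.index = {}       # hashed index: miss address -> position in history
        self.update_probability = update_probability
        self.rng = random.Random(seed)
        self.index_updates = 0

    def record_miss(self, addr):
        pos = len(self.history)
        self.history.append(addr)
        # Probabilistic update: only a random subset of misses touch the index.
        if self.rng.random() < self.update_probability:
            self.index[addr] = pos
            self.index_updates += 1

    def predict(self, addr, depth=2):
        """Return the addresses that followed the indexed occurrence of addr."""
        pos = self.index.get(addr)
        if pos is None:
            return []
        return self.history[pos + 1 : pos + 1 + depth]
```

With `update_probability=1.0` every miss updates the index; lowering it trades some lookup coverage for fewer index writes.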

Accelerating Critical Section Execution with Asymmetric Multicore Architectures

is based on the asymmetric chip multiprocessor
- consists of at least one large, high-performance core and many small, power-efficient cores
When a small core reaches a critical section, it sends a request to the large core and stalls. When the large core finishes executing the critical section, it notifies the small core, which then resumes processing.
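
The request/stall/notify handshake can be mimicked in software with threads. This is a minimal sketch, assuming one thread plays the large core and serves requests from a queue; `run_acs` and its parameters are hypothetical names, and real hardware would not use a software queue.

```python
import queue
import threading


def run_acs(num_small_cores=4, increments=100):
    requests = queue.Queue()   # critical-section requests sent to the large core
    counter = {"value": 0}     # shared data touched only inside critical sections

    def increment():
        counter["value"] += 1

    def large_core():
        while True:
            item = requests.get()
            if item is None:       # shutdown signal
                break
            critical_section, done = item
            critical_section()     # executed on the "large core"
            done.set()             # notify the waiting small core

    def small_core():
        for _ in range(increments):
            done = threading.Event()
            requests.put((increment, done))   # ship the critical section
            done.wait()                       # stall until it has been executed

    big = threading.Thread(target=large_core)
    smalls = [threading.Thread(target=small_core) for _ in range(num_small_cores)]
    big.start()
    for t in smalls:
        t.start()
    for t in smalls:
        t.join()
    requests.put(None)
    big.join()
    return counter["value"]
```

No lock is needed around `counter` here because only the large-core thread ever runs a critical section.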

tracking per-thread progress rates for each coexecuting thread (during SMT exec.)
- to improve QoS, SLA^[1], performance predictability, service differentiation, proportional-share performance
to determine the alone exec. time
- base cycle component
- miss event cycle component(cache, TLB, branch)
- waiting cycle component
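
A toy calculation of a progress rate from such components. It assumes, purely for illustration, that the three estimated components simply add up to the alone execution time and that progress is the ratio of that estimate to the cycles spent in SMT mode; how the paper actually derives each component is not covered in these notes, and all names and numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CycleComponents:
    base: int         # base cycle component
    miss_event: int   # cache, TLB, and branch miss cycles
    waiting: int      # waiting cycle component

    def alone_estimate(self):
        # Assumption: the estimate is the plain sum of the components.
        return self.base + self.miss_event + self.waiting


def progress_rate(components, smt_cycles):
    """Fraction of alone-execution speed achieved during SMT execution."""
    return components.alone_estimate() / smt_cycles
```

For example, a thread with an estimated 1000 alone cycles that needed 2000 SMT cycles has progressed at half its stand-alone speed.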

AnySP = fully programmable architecture
- targets multiple application domain
- addresses these challenges for next-generation mobile^[2] signal processing
analyzes FFT, space-time block coding (STBC), and low-density parity-check (LDPC)
- SIMD width, register value lifetimes, instruction-pair frequency, algorithm data-reordering patterns
AnySP processing element design
- Configurable multi-SIMD width support
- Temporary buffer and bypass network
- Flexible functional units
- SRAM-based swizzle network (the network can be changed post-fabrication)
- Multiple output adder tree support
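
As a rough picture of what a reconfigurable swizzle does, the sketch below treats the configuration as a table saying which input lane feeds each output lane. The function name `swizzle` and the table representation are assumptions for illustration, not AnySP's actual design.

```python
def swizzle(lanes, config):
    """Output lane i takes the value of input lane config[i]."""
    if len(config) != len(lanes):
        raise ValueError("config must name one source lane per output lane")
    return [lanes[src] for src in config]
```

Rewriting `config` changes the data-reordering pattern without touching the datapath: `[3, 2, 1, 0]` reverses four lanes, while `[0, 0, 0, 0]` broadcasts lane 0.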

Predicting Voltage Droops Using Recurring Program and Microarchitectural Event Activity

voltage fluctuations can lead to timing violations or transistor lifetime issues
- dynamically learns to predict dangerous voltage fluctuations based on program and microarchitectural events
to reduce the gap between nominal and worst-case operating voltages
- voltage emergency predictor to identify imminent emergencies
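
One way to picture such a predictor: remember the short event history that preceded each emergency and raise a warning when the same history recurs. This is a simplified sketch under that assumption; the class name `EmergencyPredictor`, the history length, and the event names in the usage note are hypothetical.

```python
from collections import deque


class EmergencyPredictor:
    def __init__(self, history_length=2):
        self.recent = deque(maxlen=history_length)   # most recent events
        self.signatures = set()                      # histories that preceded an emergency

    def observe(self, event, emergency=False):
        """Record an event; return True if an emergency is predicted next."""
        if emergency:
            # Learn the pattern of events that led up to this emergency.
            self.signatures.add(tuple(self.recent))
        self.recent.append(event)
        return tuple(self.recent) in self.signatures
```

After seeing `"l2_miss"`, `"flush"`, and then an emergency once, the predictor returns `True` the next time `"l2_miss"` is followed by `"flush"`.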

a nanoscale sensor processor(nSP)
- addresses the aforementioned challenges
- by integrating molecular probe sensors and molecular-scale digital logic
a sensor array for environment monitoring/a simple processor core/a small memory for state and programs/a communication device

Gordon is a system architecture for data-centric applications combining low-power processors, flash memory, and data-centric programming systems
for large-scale data processing, current technology must overcome three challenges
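- declining uniprocessor performance and the difficulty of programming CMPs
- hard drive capacity keeps growing, but latency and bandwidth do not keep up
- power constraints arising from cooling, cost, and environmental concerns
data-centric applications (programmed with things like Map-Reduce and Dryad)
- solid-state storage devices contribute higher bandwidth and lower latency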
Gordon: a flash-based system architecture for massively parallel, data-centric computing
- solid-state disks
- low-power processors
- data-centric programming paradigms
- offers a flash management software layer that allows highly parallel operation of large arrays of flash devices

architecting a DRAM alternative
- array architecture
- buffer organization(area neutrality, design space, delay and energy opt.)
- scaling and implications
mitigating wear and energy
- improving PCM lifetime(eliminating redundant bit writes, row shifting, segment swapping)
- analyzing energy implications
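
Eliminating redundant bit writes amounts to comparing the stored word with the new one and writing only the bits that differ. A minimal sketch, with the function name `bits_to_write` and the 8-bit default width chosen for illustration:

```python
def bits_to_write(old_word, new_word, width=8):
    """Return the bit positions whose stored value actually changes."""
    diff = old_word ^ new_word   # 1 wherever the old and new bits differ
    return [i for i in range(width) if (diff >> i) & 1]
```

Writing `0b10101100` over `0b10101010` touches only bits 1 and 2, so the other six cells see no wear from this write.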