Search results for: Andreas Sembrant

Items from 1 to 14 out of 14 results

chapter

A graphics tracing framework for exploring CPU+GPU memory systems

Andreas Sembrant, Trevor E. Carlson, Erik Hagersten, David Black-Schaffer

2017 IEEE International Symposium on Workload Characterization (IISWC) > 54 - 65

Modern SoCs contain CPU and GPU cores to execute both general-purpose and highly parallel graphics workloads. While the primary use of the GPU is for rendering graphics, the effects of graphics workloads on the overall system have received little attention. The primary reason for this is the lack of efficient tools and simulators for modern graphics applications. In this work, we present GLTraceSim,...

chapter

Analyzing graphics workloads on tile-based GPUs

German Ceballos, Andreas Sembrant, Trevor E. Carlson, David Black-Schaffer

2017 IEEE International Symposium on Workload Characterization (IISWC) > 108 - 109

Graphics rendering is a complex, multi-step process whose data demands typically dominate memory system design in SoCs. GPUs create images by merging many simpler scenes for each frame. For performance, scenes are tiled into parallel tasks, each of which produces different parts of the final output. This execution model results in complex memory behavior, whose bandwidth demands, reuse and sharing...

chapter

POSTER: Putting the G back into GPU/CPU Systems Research

Andreas Sembrant, Trevor E. Carlson, Erik Hagersten, David Black-Schaffer

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 130 - 131

Modern SoCs contain several CPU cores and many GPU cores to execute both general-purpose and highly parallel graphics workloads. In many SoCs, more area is dedicated to graphics than to general-purpose compute. Despite this, the microarchitecture research community primarily focuses on GPGPU and CPU-only research, and not on graphics (the primary workload for many SoCs). The main reason for this...

chapter

TLC: A tag-less cache for reducing dynamic first level cache energy

Andreas Sembrant, Erik Hagersten, David Black-Schaffer

2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 49 - 61

First-level caches are performance-critical and are therefore optimized for speed. To do so, modern processors reduce the miss ratio by using set-associative caches and optimize latency by reading all ways in parallel with the TLB and tag lookup. However, this wastes energy, since data from only one way is actually used. To reduce energy, phased caches and way-prediction techniques have been proposed...

chapter

A Split Cache Hierarchy for Enabling Data-Oriented Optimizations

Andreas Sembrant, Erik Hagersten, David Black-Schaffer

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) > 133 - 144

Today's caches tightly couple data with metadata (Address Tags) at the cache line granularity. The co-location of data and its identifying metadata means that they require multiple approaches to locate data (associative way searches and level-by-level searches), evict data (coherent writeback buffers and associative level-by-level searches) and keep data coherent (directory indirections and associative...

chapter

Data placement across the cache hierarchy: Minimizing data movement with reuse-aware placement

Andreas Sembrant, Erik Hagersten, David Black-Schaffer

2016 IEEE 34th International Conference on Computer Design (ICCD) > 117 - 124

Modern processors employ multiple levels of caching to address bandwidth, latency and performance requirements. The behavior of these hierarchies is determined by their approach to data placement and data eviction. Recent research has developed many intelligent data eviction policies, but cache hierarchies remain primarily either exclusive or inclusive with regard to data placement. This means that...

chapter

Cost-effective speculative scheduling in high performance processors

Arthur Perais, André Seznec, Pierre Michaud, Andreas Sembrant, et al.

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA) > 247 - 259

To maximize performance, out-of-order execution processors sometimes issue instructions without a guarantee that their operands will be available in time; e.g., loads are typically assumed to hit in the L1 cache, and dependent instructions are issued accordingly. This form of speculation, which we refer to as speculative scheduling, has been used for two decades in real processors, but has received...

chapter

Long term parking (LTP): Criticality-aware resource allocation in OOO processors

Andreas Sembrant, Trevor Carlson, Erik Hagersten, David Black-Schaffer, et al.

2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 334 - 346

Modern processors employ large structures (IQ, LSQ, register file, etc.) to expose instruction-level parallelism (ILP) and memory-level parallelism (MLP). These resources are typically allocated to instructions in program order. This wastes resources by allocating them to instructions that are not yet ready to be executed and by eagerly allocating them to instructions that are not part of...

chapter

Navigating the cache hierarchy with a single lookup

Andreas Sembrant, Erik Hagersten, David Black-Schaffer

2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA) > 133 - 144

Modern processors optimize for cache energy and performance by employing multiple levels of caching that provide high bandwidth, low latency and high capacity. A request typically traverses the cache hierarchy level by level until the data is found, wasting time and energy at each level. In this paper, we present the Direct-to-Data (D2D) cache that locates data across the entire cache hierarchy...

chapter

Modeling performance variation due to cache sharing

Andreas Sandberg, Andreas Sembrant, Erik Hagersten, David Black-Schaffer

2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA) > 155 - 166

Shared cache contention can cause significant variability in the performance of co-running applications from run to run. This variability arises from different overlaps of the applications' phases, which can result from offsets in application start times or other delays in the system. Understanding this variability is important for generating an accurate view of the expected impact of cache...

chapter

Phase behavior in serial and parallel applications

Andreas Sembrant, David Black-Schaffer, Erik Hagersten

2012 IEEE International Symposium on Workload Characterization (IISWC) > 47 - 58

It is well known that most serial programs exhibit time-varying behavior, for example, alternating between memory- and compute-bound phases. However, most research into program phase behavior has focused on the serial SPEC benchmark suite, with little investigation into large-scale phase behavior in parallel applications.

chapter

Low Overhead Instruction-Cache Modeling Using Instruction Reuse Profiles

Muneeb Khan, Andreas Sembrant, Erik Hagersten

2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) > 260 - 269

Performance loss caused by L1 instruction cache misses varies between different architectures and cache sizes. For processors employing power-efficient in-order execution with small caches, performance can be significantly affected by instruction cache misses. The growing use of low-power multi-threaded CPUs (with shared L1 caches) in general purpose computing platforms requires new efficient techniques...

chapter

Power-Sleuth: A Tool for Investigating Your Program's Power Behavior

Vasileios Spiliopoulos, Andreas Sembrant, Stefanos Kaxiras

2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) > 241 - 250

Modern processors support aggressive power saving techniques to reduce energy consumption. However, traditional profiling techniques have mainly focused on performance, which does not accurately reflect the power behavior of applications. For example, the longest running function is not always the most energy-hungry function. Thus software developers cannot always take full advantage of these power-saving...

chapter

Efficient software-based online phase classification

Andreas Sembrant, David Eklov, Erik Hagersten

2011 IEEE International Symposium on Workload Characterization (IISWC) > 104 - 115

Many programs exhibit execution phases with time-varying behavior. Phase detection has been used extensively to find short, representative simulation points that quickly yield representative simulation results for long-running applications. Several hardware-assisted phase detection schemes have also been proposed to guide various forms of optimizations and hardware configurations.

INFONA - science communication portal
