# Analyzing the Impact of CPU Pinning and Partial CPU Loads on Performance and Energy Efficiency

This site lists links to supplementary plots for our CCGrid 2015 submission.

# Supplementary plots

supplementary.pdf

We provide a file of supplementary figures that extend and complement the figures and tables available in our paper.

Besides variants of the plots taken with a different Showstopper configuration and with TurboBoost and frequency scaling switched off, the supplementary figures also include tables directly comparing application throughput under KVM and LXC, which could not be included in the paper due to space constraints.

The supplementary plots and tables are organized as follows:

- CPU pinning overview **(page 1)**
- example experiment time line for KVM and LXC **(page 2)**
- 5×5 foreground and background throughput plots for the 107ms dithering quantum **(page 3)**
- 10×10 foreground and background throughput plots for the 107ms dithering quantum **(pages 4–9)**
- 5×5 foreground and background throughput plots for the 53ms dithering quantum **(page 10)**
- 5×5 foreground and background throughput plots for the 107ms dithering quantum, with TurboBoost and frequency scaling disabled **(page 11)**
- performance interference metric tables for the 5×5 experiments with the 107ms dithering quantum **(page 12)**
- performance interference metric tables for the 10×10 experiments with the 107ms dithering quantum **(pages 13–14)**
- performance interference metric tables for the 5×5 experiments with the 53ms dithering quantum **(page 15)**
- performance interference metric tables for the 5×5 experiments with the 107ms dithering quantum, with TurboBoost and frequency scaling disabled **(page 15)**
- absolute system throughput time lines **(page 16)**
  - 107ms dithering quantum
  - 53ms dithering quantum
  - 107ms dithering quantum, disabled TurboBoost, disabled frequency scaling
- relative system throughput time lines (relative to the "per-chip" pinning) **(page 17)**
  - 107ms dithering quantum
  - 53ms dithering quantum
  - 107ms dithering quantum, disabled TurboBoost, disabled frequency scaling
- system power consumption time lines for symmetric colocations of 5 chosen benchmarks **(page 18)**
  - 107ms dithering quantum
  - 53ms dithering quantum
- absolute system power efficiency time lines **(page 19)**
  - 107ms dithering quantum
  - 53ms dithering quantum
- relative system power efficiency time lines (relative to the "per-chip" pinning) **(page 19)**
  - 107ms dithering quantum
  - 53ms dithering quantum
- 5×5 KVM and LXC throughput comparison tables for the 107ms dithering quantum **(page 20)**
- 10×10 KVM and LXC throughput comparison tables for the 107ms dithering quantum **(pages 21–22)**
- 5×5 KVM and LXC throughput comparison tables for the 53ms dithering quantum **(page 23)**
- 5×5 KVM and LXC throughput comparison tables for the 107ms dithering quantum, with TurboBoost and frequency scaling disabled **(page 24)**


# Time line plots — performance counters, throughput, power consumption

## Cycles per instruction and power consumption

107ms quantum | | 53ms quantum | |
---|---|---|---|
KVM | cpi-power.kvm | KVM | cpi-power.kvm |
LXC | cpi-power.lxc | LXC | cpi-power.lxc |
both | cpi-power.all | both | cpi-power.all |

- Pages
  - Pages correspond to workload colocations of `avrora`, `h2`, `luindex`, `scalac` and `specs`. There are 25 pages corresponding to the 5×5 colocations.
- Rows
  - There are 2 rows of plots per page. Rows represent observed quantities:
    - **system-wide average number of cycles per instruction**
    - **system power consumption**
- Columns
  - There are 4 (KVM), 3 (LXC) or 7 (both LXC and KVM) columns of plots per page. Columns represent pinning configurations, compared side-by-side.
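The 25 pages follow from pairing each of the five benchmarks with each of the five, including itself. A minimal sketch of that enumeration is shown below; the order of the pairs and the reading of each pair as (foreground, background) are assumptions for illustration and may not match the page order in the plot files.

```python
from itertools import product

# The five benchmarks named on this page.
BENCHMARKS = ("avrora", "h2", "luindex", "scalac", "specs")


def colocations(benchmarks=BENCHMARKS):
    """Return every ordered pair of benchmarks, symmetric pairs included.

    Reading a pair as (foreground, background) is an assumption made
    for this sketch only.
    """
    return list(product(benchmarks, repeat=2))


pairs = colocations()
print(len(pairs))  # number of 5x5 colocations
```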

## Cycles per instruction and throughput

107ms quantum | blue and green ≡ FG and BG VEs | blue and green ≡ NUMA nodes | 53ms quantum | blue and green ≡ FG and BG VEs | blue and green ≡ NUMA nodes |
---|---|---|---|---|---|
KVM | cpi-throughput.kvm.wl | cpi-throughput.kvm.numa | KVM | cpi-throughput.kvm.wl | cpi-throughput.kvm.numa |
LXC | cpi-throughput.lxc.wl | cpi-throughput.lxc.numa | LXC | cpi-throughput.lxc.wl | cpi-throughput.lxc.numa |
both | cpi-throughput.all.wl | cpi-throughput.all.numa | both | cpi-throughput.all.wl | cpi-throughput.all.numa |

- Pages
  - Pages correspond to workload colocations of `avrora`, `h2`, `luindex`, `scalac` and `specs`. There are 25 pages corresponding to the 5×5 colocations.
- Rows
  - There are 3 rows of plots per page. Rows represent observed quantities:
    - **number of cycles per instruction**, two averages per NUMA node or per virtualized environment (VE)
    - **throughput in the foreground VE** with an uncontrolled workload
    - **throughput in the background VE** with a Showstopper-controlled workload
- Columns
  - There are 4 (KVM), 3 (LXC) or 7 (both LXC and KVM) columns of plots per page. Columns represent pinning configurations, compared side-by-side.

## Cycles per instruction, power consumption and throughput (two sets of plots above combined)

107ms quantum | blue and green ≡ FG and BG VEs | blue and green ≡ NUMA nodes | 53ms quantum | blue and green ≡ FG and BG VEs | blue and green ≡ NUMA nodes |
---|---|---|---|---|---|
KVM | cpi-combined.kvm.wl | cpi-combined.kvm.numa | KVM | cpi-combined.kvm.wl | cpi-combined.kvm.numa |
LXC | cpi-combined.lxc.wl | cpi-combined.lxc.numa | LXC | cpi-combined.lxc.wl | cpi-combined.lxc.numa |
both | cpi-combined.all.wl | cpi-combined.all.numa | both | cpi-combined.all.wl | cpi-combined.all.numa |

- Pages
  - Pages correspond to workload colocations of `avrora`, `h2`, `luindex`, `scalac` and `specs`. There are 25 pages corresponding to the 5×5 colocations.
- Rows
  - There are 5 rows of plots per page. Rows represent observed quantities:
    - **number of cycles per instruction**, two averages per NUMA node or per virtualized environment (VE)
    - **throughput in the foreground VE** with an uncontrolled workload
    - **throughput in the background VE** with a Showstopper-controlled workload
    - **system-wide average number of cycles per instruction**
    - **system power consumption**
- Columns
  - There are 4 (KVM), 3 (LXC) or 7 (both LXC and KVM) columns of plots per page. Columns represent pinning configurations, compared side-by-side.

## All low-level performance counters (cycles per instruction, cache misses, pipeline stalls)

107ms quantum | | blue and green ≡ FG and BG VEs | blue and green ≡ NUMA nodes | 53ms quantum | | blue and green ≡ FG and BG VEs | blue and green ≡ NUMA nodes |
---|---|---|---|---|---|---|---|
KVM | vertical | vmetrics.kvm.wl | vmetrics.kvm.numa | KVM | vertical | vmetrics.kvm.wl | vmetrics.kvm.numa |
KVM | horizontal | hmetrics.kvm.wl | hmetrics.kvm.numa | KVM | horizontal | hmetrics.kvm.wl | hmetrics.kvm.numa |
LXC | vertical | vmetrics.lxc.wl | vmetrics.lxc.numa | LXC | vertical | vmetrics.lxc.wl | vmetrics.lxc.numa |
LXC | horizontal | hmetrics.lxc.wl | hmetrics.lxc.numa | LXC | horizontal | hmetrics.lxc.wl | hmetrics.lxc.numa |
both | vertical | vmetrics.all.wl | vmetrics.all.numa | both | vertical | vmetrics.all.wl | vmetrics.all.numa |
both | horizontal | hmetrics.all.wl | hmetrics.all.numa | both | horizontal | hmetrics.all.wl | hmetrics.all.numa |

- Pages
  - Pages correspond to workload colocations of `avrora`, `h2`, `luindex`, `scalac` and `specs`. There are 25 pages corresponding to the 5×5 colocations. Horizontal and vertical layouts are available. The list of rows and columns below corresponds to the vertical layout.
- Rows
  - There are 13 rows of plots per page. Rows represent observed quantities:
    - **number of cycles per instruction**, two averages per NUMA node or per VE
    - **L1 data cache miss rate**, two averages per NUMA node or per VE
    - **L2 cache miss rate**, two averages per NUMA node or per VE
    - **L3 cache miss rate**, two averages per NUMA node or per VE
    - **NUMA load/store/prefetch miss rate**, two averages per NUMA node or per VE
    - **number of L1 data cache misses**, two averages per NUMA node or per VE
    - **number of L2 cache misses**, two averages per NUMA node or per VE
    - **number of L3 cache misses**, two averages per NUMA node or per VE
    - **number of NUMA load/store/prefetch operations**, two averages per NUMA node or per VE
    - **number of NUMA load/store/prefetch misses**, two averages per NUMA node or per VE
    - **pipeline frontend stalls**, two averages per NUMA node or per VE
    - **pipeline backend stalls**, two averages per NUMA node or per VE
    - **system power consumption**

- Columns

## Comparing KVM and LXC in terms of low-level performance counters and confidence intervals

Blue and green denote KVM and LXC.

107ms quantum | t-test (assumes normality) | order statistics (assumes uniformity) | 53ms quantum | t-test (assumes normality) | order statistics (assumes uniformity) |
---|---|---|---|---|---|
vertical | vstatmetrics.ttest | vstatmetrics.order | vertical | vstatmetrics.ttest | vstatmetrics.order |
horizontal | hstatmetrics.ttest | hstatmetrics.order | horizontal | hstatmetrics.ttest | hstatmetrics.order |

- Pages
  - Pages correspond to workload colocations of `avrora`, `h2`, `luindex`, `scalac` and `specs`. There are 25 pages corresponding to the 5×5 colocations. Horizontal and vertical layouts are available. The list of rows and columns below corresponds to the vertical layout.
- Rows
  - There are 15 rows of plots per page. Rows represent observed quantities, each shown as means (t-test) or medians (order statistics) with confidence intervals (CIs) for KVM and LXC:
    - **number of cycles per instruction**
    - **L1 data cache miss rate**
    - **L2 cache miss rate**
    - **L3 cache miss rate**
    - **NUMA load/store/prefetch miss rate**
    - **number of L1 data cache misses**
    - **number of L2 cache misses**
    - **number of L3 cache misses**
    - **number of NUMA load/store/prefetch operations**
    - **number of NUMA load/store/prefetch misses**
    - **pipeline frontend stalls**
    - **pipeline backend stalls**
    - **system power consumption**
    - **foreground throughput (uncontrolled workload)**
    - **background throughput (controlled workload)**
- Columns
  - There are 3 columns of plots corresponding to the 3 CPU pinning configurations common to KVM and LXC ("per-chip", "per-core" and "per-thread").
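The order-statistics variant reports medians with confidence intervals. One textbook way to derive such an interval from ranks alone is the binomial construction sketched below. This page does not spell out the exact procedure behind the plots, so the function name `median_ci`, the 95% default and the construction itself are illustrative assumptions and not a description of how the plots were generated.

```python
from math import comb


def median_ci(samples, confidence=0.95):
    """Rank-based confidence interval for the median (illustrative sketch).

    Picks the largest rank k such that the interval between the k-th
    smallest and the k-th largest sample covers the median with at
    least the requested probability, using the Binomial(n, 1/2)
    distribution of the number of samples below the median.
    """
    xs = sorted(samples)
    n = len(xs)
    alpha = 1.0 - confidence
    k = 0
    tail = 0.0
    for j in range(n // 2):
        tail += comb(n, j) / 2 ** n  # tail == P(B <= j), B ~ Binomial(n, 1/2)
        if 2 * tail > alpha:
            break
        k = j + 1
    if k == 0:
        raise ValueError("sample too small for the requested confidence")
    return xs[k - 1], xs[n - k]
```

For ten samples and the 95% default, the sketch selects the second smallest and the second largest value as the interval bounds.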

# Compressed archives with all time line plots

107ms quantum | 53ms quantum |
---|---|
plots.tar.xz | plots.tar.xz |
plots.tar.bz2 | plots.tar.bz2 |

All plots of low-level metrics listed in the previous section can be downloaded in one big compressed archive.
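Before unpacking a large archive it can help to look at what it contains. The sketch below lists the regular files in a downloaded archive with Python's standard `tarfile` module; the helper name `list_archive` is hypothetical, and the local path passed to it is whatever location the archive was saved to.

```python
import tarfile


def list_archive(path):
    """Return the names of all regular files stored in a tar archive.

    Mode "r:*" lets tarfile detect the compression, so the same call
    works for both the .tar.xz and the .tar.bz2 variant.
    """
    with tarfile.open(path, mode="r:*") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]
```

Calling `list_archive("plots.tar.xz")` on a downloaded copy returns the member names without extracting anything.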

# Raw listing of all available files

107ms quantum | 53ms quantum |
---|---|
file list | file list |

For those already familiar with the file naming convention, we provide quick and easy access to a raw listing of all plot files linked above. Additionally, two simple text files with a summary of average numeric values, easy to search and sort, are provided for convenience, one for the 107ms dithering quantum and one for the 53ms dithering quantum. They compare the cycles per instruction statistics for NUMA nodes and workloads, the total amount of work (number of iterations) accomplished in the foreground and background virtualized environments, and the amount of energy consumed in an experiment.
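The link labels in the tables above suggest how the plot sets are named: a family prefix, then the platform (`kvm`, `lxc` or `all`), then the coloring (`wl` for foreground/background VEs, `numa` for NUMA nodes). The sketch below rebuilds the labels from those parts. It covers only the labels visible on this page; file extensions and the directory layout per quantum are not shown here and are deliberately left out.

```python
from itertools import product

PLATFORMS = ("kvm", "lxc", "all")  # "all" combines KVM and LXC
COLORINGS = ("wl", "numa")         # per-VE or per-NUMA-node coloring


def plot_set_names():
    """Rebuild the plot-set labels that appear in the tables on this page."""
    names = [f"cpi-power.{platform}" for platform in PLATFORMS]
    for family in ("cpi-throughput", "cpi-combined", "vmetrics", "hmetrics"):
        names += [
            f"{family}.{platform}.{coloring}"
            for platform, coloring in product(PLATFORMS, COLORINGS)
        ]
    for layout in ("vstatmetrics", "hstatmetrics"):
        names += [f"{layout}.{method}" for method in ("ttest", "order")]
    return names


names = plot_set_names()
print(len(names))  # plot sets per dithering quantum
```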