Windyland serving my blog

The Null Block Device

The Linux kernel has provided the null_blk module since 3.13. It can be used to benchmark various block-layer implementations, and it also works as a dummy device for diagnosing storage system issues.

Check that your kernel has the null_blk module

  • grep CONFIG_BLK_DEV_NULL /boot/config-$(uname -r)
  • or zgrep CONFIG_BLK_DEV_NULL /proc/config.gz

For example, CentOS 7 has:

CONFIG_BLK_DEV_NULL=m

Use null_blk

Usage is much like zram; just load the module (modprobe resolves the module by name, while insmod needs the full path to the .ko file):

modprobe null_blk

You will find that two 250 GB block devices have been added:

$ lsblk
...
nullb0      251:0    0   250G  0 disk
nullb1      251:1    0   250G  0 disk
...

Then run some benchmarks on them:

  • dd if=/dev/zero of=/dev/nullb0 bs=4k oflag=direct
  • dd if=/dev/nullb0 of=/dev/null bs=4k iflag=direct
  • hdparm -tT /dev/nullb0
  • aio-stress -O -s 64m -r 256k -i 1024 -b 1024 /dev/nullb0
  • fio --ioengine=libaio --readwrite=randread --bs=4k --filename /dev/nullb0 --name journal_test --thread --norandommap --numjobs=200 --iodepth=64 --runtime=30 --time_based --group_reporting
  • fio --ioengine=libaio --readwrite=randwrite --bs=4k --filename /dev/nullb0 --name journal_test --thread --norandommap --numjobs=200 --iodepth=64 --runtime=30 --time_based --group_reporting

If you turn on scsi_mq, you will see incredible IOPS in your fio benchmarks, such as 5M IOPS for randread on my machine.

Advanced Usage

The null_blk module provides many parameters that you can tune:

  • submit_queues: the number of submission queues
  • gb: size in GiB
  • bs: block size in bytes
  • nr_devices: number of devices to register
  • completion_nsec: time in ns to complete a request in hardware
  • hw_queue_depth: queue depth for each hardware queue

For example, on the Debian backported 4.6.0 kernel:

sudo modinfo null_blk
filename: /lib/modules/4.6.0-0.bpo.1-amd64/kernel/drivers/block/null_blk.ko
license:        GPL
author:         Jens Axboe <jaxboe@fusionio.com>
depends:
intree:         Y
vermagic:       4.6.0-0.bpo.1-amd64 SMP mod_unload modversions
parm:           submit_queues:Number of submission queues (int)
parm:           home_node:Home node for the device (int)
parm:           queue_mode:Block interface to use (0=bio,1=rq,2=multiqueue)
parm:           gb:Size in GB (int)
parm:           bs:Block size (in bytes) (int)
parm:           nr_devices:Number of devices to register (int)
parm:           use_lightnvm:Register as a LightNVM device (bool)
parm:           irqmode:IRQ completion handler. 0-none, 1-softirq, 2-timer
parm:           completion_nsec:Time in ns to complete a request in hardware.  Default: 10,000ns (ulong)
parm:           hw_queue_depth:Queue depth for each hardware queue. Default: 64 (int)
parm:           use_per_node_hctx:Use per-node allocation for hardware context queues. Default: false (bool)

These parameters are also described in the null_blk kernel documentation.

Reference

Some notes on atomic operations

Atomic operations are quite useful in concurrent programming, notably in implementations of lock-free algorithms. When lock-based algorithms become the performance bottleneck, atomic operations often come to the rescue.

Atomic ordering

  • NotAtomic (regular load and store)
  • Unordered (matches the Java memory model for shared variables)
  • Monotonic (or memory_order_relaxed)
  • Acquire (or memory_order_acquire and memory_order_consume)
  • Release (or memory_order_release)
  • AcquireRelease (or memory_order_acq_rel)
  • SequentiallyConsistent (or memory_order_seq_cst)
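
To make the Release/Acquire pairing from the list concrete, here is a minimal C++11 sketch of message passing between two threads. The names (payload, ready, producer, consumer, run_once) are made up for illustration, and this is only one common way to use these orderings, not a complete treatment.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                    // plain (NotAtomic) data
std::atomic<bool> ready{false};     // flag that publishes the payload

void producer() {
    payload = 42;                                  // ordinary store
    ready.store(true, std::memory_order_release);  // Release: makes the payload visible
}

int consumer() {
    // Acquire: pairs with the release store above.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;  // the write to payload happens-before this read
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_once() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);  // Monotonic is enough for the reset
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Calling run_once() from main should return 42. If both orderings were weakened to memory_order_relaxed, the flag itself would still be atomic, but the read of payload would no longer be ordered after the write and would become a data race.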

Platforms implementations

X86

All atomic loads generate a MOV. SequentiallyConsistent stores generate an XCHG; other stores generate a MOV.

ARM (before v8) and MIPS

Acquire, Release and SequentiallyConsistent semantics require barrier instructions for every such operation. Loads and stores generate normal instructions.

Language Standard and Compiler, library implementations

  • The new C++11 atomic header and C11 stdatomic.h header
  • Java-style volatile variables (match SequentiallyConsistent)
  • gcc __sync_* builtins (match SequentiallyConsistent)

If you want to use atomic operations without moving to the new standards, there are compiler built-in functions and libraries available that emit the proper assembly instructions for these operations.

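GCC atomic built-in functions (GCC 4.7 and later; the load, store, exchange and compare-exchange built-ins also come in an _n variant that takes values instead of pointers):

__atomic_thread_fence
__atomic_signal_fence
__atomic_is_lock_free
__atomic_store
__atomic_load
__atomic_exchange
__atomic_compare_exchange (strong or weak, selected by its weak argument)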
__atomic_fetch_add
__atomic_fetch_sub
__atomic_fetch_and
__atomic_fetch_or
__atomic_fetch_xor
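
As a rough illustration of how these built-ins are called, the sketch below uses __atomic_fetch_add together with the value-taking _n forms that GCC documents (__atomic_load_n and __atomic_compare_exchange_n). It assumes GCC 4.7+ or Clang; the helper names and the global counter are hypothetical.

```cpp
#include <cassert>

// Hypothetical helpers; they rely on GCC/Clang built-ins and are not portable.
long counter = 0;

// Adds n to the counter and returns the value it held before the addition.
long add_relaxed(long n) {
    return __atomic_fetch_add(&counter, n, __ATOMIC_RELAXED);
}

// Reads the counter with acquire ordering.
long read_counter() {
    return __atomic_load_n(&counter, __ATOMIC_ACQUIRE);
}

// Strong compare-and-swap: the fourth argument (weak) is false.
// Returns true if *slot held `expected` and was replaced by `desired`.
bool try_claim(int* slot, int expected, int desired) {
    return __atomic_compare_exchange_n(slot, &expected, desired, false,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}
```

The memory order constants (__ATOMIC_RELAXED, __ATOMIC_ACQUIRE, __ATOMIC_ACQ_REL and so on) correspond to the C++11 memory_order values, so code written this way maps fairly directly onto std::atomic later.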

A Small Benchmark

I wrote a small program to benchmark the performance of atomic operations against mutexes and spinlocks. It is hosted on GitHub. You can clone the repository and run make. It requires a modern C++ compiler with C++11 support.

This sample output is from my 4-core late-2013 MacBook Pro:

jobs:  1 total time:  614932094 ns average time:  6149 ns (Mutex)
jobs:  1 total time:    5132226 ns average time:    51 ns (SpinLock)
jobs:  1 total time:    4172785 ns average time:    41 ns (Atomic)
jobs:  2 total time:  746401303 ns average time:  7464 ns (Mutex)
jobs:  2 total time:   15983439 ns average time:   159 ns (SpinLock)
jobs:  2 total time:    9356120 ns average time:    93 ns (Atomic)
jobs:  4 total time:   42609222 ns average time:   426 ns (SpinLock)
jobs:  4 total time:   13734551 ns average time:   137 ns (Atomic)
jobs:  8 total time:  107958834 ns average time:  1079 ns (SpinLock)
jobs:  8 total time:   30681228 ns average time:   306 ns (Atomic)
jobs: 16 total time:  213277915 ns average time:  2132 ns (SpinLock)
jobs: 16 total time:   52189737 ns average time:   521 ns (Atomic)
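
For readers who do not want to clone the repository, the following is a minimal sketch of how such a comparison could be structured. It is not the benchmark program itself: SpinLock, Result, time_jobs, bench_lock and bench_atomic are hypothetical names, and the workload (incrementing one shared counter) is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Minimal test-and-set spinlock (hypothetical, for illustration).
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock()   { while (flag_.test_and_set(std::memory_order_acquire)) {} }
    void unlock() { flag_.clear(std::memory_order_release); }
};

struct Result {
    std::uint64_t count;   // final counter value
    long long total_ns;    // wall-clock time for the whole run
};

// Runs `jobs` threads, each calling `increment` `iters` times; returns elapsed ns.
template <typename Increment>
long long time_jobs(int jobs, int iters, Increment increment) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int j = 0; j < jobs; ++j) {
        threads.emplace_back([&] {
            for (int i = 0; i < iters; ++i) increment();
        });
    }
    for (auto& t : threads) t.join();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Counter protected by a lock type (std::mutex or SpinLock).
template <typename Lock>
Result bench_lock(int jobs, int iters) {
    Lock lock;
    std::uint64_t counter = 0;
    long long ns = time_jobs(jobs, iters, [&] {
        std::lock_guard<Lock> guard(lock);
        ++counter;
    });
    return {counter, ns};
}

// Counter implemented as a single atomic variable.
Result bench_atomic(int jobs, int iters) {
    std::atomic<std::uint64_t> counter{0};
    long long ns = time_jobs(jobs, iters, [&] {
        counter.fetch_add(1, std::memory_order_relaxed);
    });
    return {counter.load(), ns};
}
```

A driver would call bench_lock<std::mutex>, bench_lock<SpinLock> and bench_atomic for each job count and divide total_ns by the number of operations to get an average. The absolute numbers depend heavily on the machine and on contention, so they will not necessarily match the output above.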

Further Reading

C++ 11 in Action

C++11 is being adopted in Chromium.

In short, the following C++11 features are currently allowed:

  • auto
  • range-based for loops
  • static_assert
  • variadic templates
  • nullptr
  • override
  • final
  • >> in template arguments (no space needed between the closing angle brackets)

  • stdint.h and inttypes.h
  • variadic macros
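
Here is a small sketch that exercises most of the language features listed above in one place. The types and functions (Shape, Triangle, total_sides, count_args, make_table) are invented for the example and have nothing to do with Chromium's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// static_assert with a fixed-width type from <cstdint> (the C++ face of stdint.h).
static_assert(sizeof(std::int32_t) == 4, "int32_t must be 4 bytes");

// Variadic template: returns the number of arguments it was given.
template <typename... Args>
std::size_t count_args(const Args&...) {
    return sizeof...(Args);
}

struct Shape {
    virtual ~Shape() {}
    virtual int sides() const { return 0; }
};

// final: no class may derive from Triangle.
struct Triangle final : Shape {
    // override: the compiler checks that this really overrides a virtual.
    int sides() const override { return 3; }
};

int total_sides(const std::vector<const Shape*>& shapes) {
    int total = 0;
    for (const auto* shape : shapes) {          // auto + range-based for
        if (shape != nullptr) {                 // nullptr instead of NULL or 0
            total += shape->sides();
        }
    }
    return total;
}

// ">>" closes two template argument lists without a separating space.
std::map<std::string, std::vector<int>> make_table() {
    std::map<std::string, std::vector<int>> table;
    table["primes"].push_back(2);
    table["primes"].push_back(3);
    return table;
}
```

Compiled with -std=c++03, almost every construct here is rejected or draws a warning, which makes it a quick check that a toolchain really accepts these features.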

Visit the chromium-cpp page for more.

A similar effort is being made by Mozilla.

Compiler Support on Modern C++

Foreword

In my previous post, I listed the C++11 and C++14 status of some common compilers. Here I make a more practical and complete study of C++11 (and C++14 where possible).

Okay, let’s begin.

Compilers

Notably, GCC C++, Clang C++, IBM C++, and Microsoft C++ already implement most or all C++11 features.

In addition, their accompanying standard libraries:

  • libstdc++ status
  • libc++: the status page was removed because support is complete

C++11 Features

Complete C++11 support has two parts: compiler support and library support.

A detailed table of C++11 feature support across C++ compilers is kept at wiki.apache.org (last updated in 2013); cpprocks has a similar one.

C++11 STL Support

  • libstdc++ 4.8 has an incomplete and buggy std::regex, which was completed in 4.9.
  • libc++ has been complete for quite a long time (about a year).

  • to be completed

Projects in C++11

  • CAF https://github.com/actor-framework
  • boost https://github.com/boostorg/boost

Projects that work under both C++03 and C++11

  • boost https://github.com/boostorg/boost
  • Intel TBB

C++14 and its successor, C++17

C++14

A detailed table of C++14 feature support across C++ compilers is also available; see the italiancpp.org one or the cpprocks one.

Although clang 3.4 is C++14 feature-complete, and some C++14 users claim that it fully supports C++14, it has several C++14-related bugs that were fixed in 3.5.

C++17

clang exposes its experimental C++17 support under the name c++1z (-std=c++1z), and gcc follows suit.

  • ISO Roadmap https://isocpp.org/std/status

Some words on c++11

You might want to use -std=c++11 instead of -std=c++0x for real C++11 support (gcc 4.8+, clang 3.3+, etc.).

You might want to use -std=c++14 instead of -std=c++1y for real C++14 support (gcc 5.0+, clang 3.5+, etc.).
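
One quick way to see which mode a compiler flag actually selected is to inspect the standard __cplusplus macro, which is 201103L for C++11 and 201402L for C++14. The sketch below wraps that check in two hypothetical helpers (standard_name and current_standard).

```cpp
#include <cassert>
#include <string>

// Maps a __cplusplus value to a readable name (hypothetical helper).
std::string standard_name(long value) {
    if (value >= 201703L) return "C++17 or later";
    if (value >= 201402L) return "C++14";
    if (value >= 201103L) return "C++11";
    return "pre-C++11";
}

// Name of the standard this translation unit was compiled as.
std::string current_standard() {
    return standard_name(__cplusplus);
}
```

Printing current_standard() from a tiny program built once with each flag shows what the compiler really enabled. In-progress modes such as -std=c++1y may report an intermediate value on older compilers, and MSVC keeps reporting 199711L unless /Zc:__cplusplus is passed, so treat the result as a hint and not as proof of full support.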

TL;DR

For complete C++11 feature support, please use:

  • gcc 4.8.1 or later, (for regex library please use gcc 4.9)
  • clang 3.3 or later
  • MSVC 2014 CTP1 or later
  • Intel C++ 15.0 or later

How to start up with a project

  1. Take a glimpse at READMEs, HACKINGs, documents and wikis
  2. Grep the bug list and see if anything interests you. You can also look for easy diagnostic bugs. Pick one or two, learn how to reproduce the bug, understand it and try to fix it.

If a bug has been assigned and you’re interested, I suggest contacting the assignee. They might be happy to offload the work and assist you with the fix.

  3. Read through header files, test cases and then source files

Thanks to the discussion on cfe-dev.