
Why writing software is hard.

As DHH argues in his blog, writing software is hard, and I could not agree more.

First of all, I should disclose that I am 28. Obviously that is not the age of a senior software engineer, but I am one of the people who started using software very early, at 10, in 1999.

Coding is hard

What I most want to talk about is coding. DHH makes it clear that no super-duper easiness will save you from the difficulty of coding. Some problems get solved by adopting new techniques such as new languages, frameworks, or libraries, but not all of them.

Coding is still hard. Writing good code is not simply a matter of applying cool techniques or some super-cool “best practices”.

If the code doesn’t solve the users’ problems, it means nothing.

If the code causes more problems after it is merged, it means nothing.

If the code breaks the system it belongs to, it means nothing.

You really want to produce a good piece of code, so you learn many different things: computer science theory, algorithms, language rules, different libraries, other programmers’ code, and much more. But none of it makes you great if your code breaks things.

Coding is about language

Even if you work hard to learn many techniques and apply them in your code, that alone won’t get it right. So there is still a gap between your code and good code.

Let’s think a bit further. You write your code in computer languages, and you may think such a language is used only between you and computers. But you are WRONG.

In fact, coding is not just coding for the computer. The final audience you write for is not the computer but human beings. I am not only talking about the customers, but about everyone involved, including you and your team.

Maybe you are confused by this, so let me make it simpler. When you begin to write code and ship it to customers, your team designs the product, writes the code, and reviews and verifies it before delivery. Then your customers use your product and give feedback about it. You all get involved. Ideas and thoughts are communicated via the software, or more simply, the code.

The Final Goal

So that is what the code finally serves: not the computer, but the human beings around it.

PS: Happy New Year!

The Null Block Device

The Linux kernel has provided the null_blk module since 3.13. It can be used to benchmark various block-layer implementations, and also as a dummy device for diagnosing storage-system issues.

Check that your kernel has the null_blk module

  • grep CONFIG_BLK_DEV_NULL /boot/config-$(uname -r)
  • or zgrep CONFIG_BLK_DEV_NULL /proc/config.gz (if CONFIG_IKCONFIG_PROC is enabled)

For example, CentOS 7 has:


Use null_blk

The usage is much like zram; just load the module:

modprobe null_blk

You will find two 250 GB block devices added:

nullb0      251:0    0   250G  0 disk
nullb1      251:1    0   250G  0 disk

and you can run some benchmarks on them:

  • dd if=/dev/zero of=/dev/nullb0 bs=4k oflag=direct
  • dd if=/dev/nullb0 of=/dev/null bs=4k iflag=direct
  • hdparm -tT /dev/nullb0
  • aio-stress -O -s 64m -r 256k -i 1024 -b 1024 /dev/nullb0
  • fio --ioengine=libaio --readwrite=randread --bs=4k --filename /dev/nullb0 --name journal_test --thread --norandommap --numjobs=200 --iodepth=64 --runtime=30 --time_based --group_reporting
  • fio --ioengine=libaio --readwrite=randwrite --bs=4k --filename /dev/nullb0 --name journal_test --thread --norandommap --numjobs=200 --iodepth=64 --runtime=30 --time_based --group_reporting

If you turn on scsi_mq (the multiqueue block layer), you will see incredible IOPS in your fio benchmarks, such as 5M IOPS for randread on my machine.

Advanced Usage

The null_blk module provides many parameters that users can tune:

  • submit_queues: the number of submission queues
  • gb: size in GiB
  • bs: block size in bytes
  • nr_devices: number of devices to register
  • completion_nsec: time in ns to complete a request in hardware
  • hw_queue_depth: queue depth for each hardware queue
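As a sketch of how these parameters fit together (the parameter names are from the list above; the values here are arbitrary), loading the module with custom settings looks like:

```shell
# Create one 4 GiB multiqueue device with 4 KiB blocks and 4 submission queues
sudo modprobe null_blk queue_mode=2 nr_devices=1 gb=4 bs=4096 submit_queues=4

# Inspect the result, then unload the module when done
lsblk | grep nullb
sudo modprobe -r null_blk
```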

For example, on the Debian backports 4.6.0 kernel:

sudo modinfo null_blk
filename:       /lib/modules/4.6.0-0.bpo.1-amd64/kernel/drivers/block/null_blk.ko
license:        GPL
author:         Jens Axboe <>
intree:         Y
vermagic:       4.6.0-0.bpo.1-amd64 SMP mod_unload modversions
parm:           submit_queues:Number of submission queues (int)
parm:           home_node:Home node for the device (int)
parm:           queue_mode:Block interface to use (0=bio,1=rq,2=multiqueue)
parm:           gb:Size in GB (int)
parm:           bs:Block size (in bytes) (int)
parm:           nr_devices:Number of devices to register (int)
parm:           use_lightnvm:Register as a LightNVM device (bool)
parm:           irqmode:IRQ completion handler. 0-none, 1-softirq, 2-timer
parm:           completion_nsec:Time in ns to complete a request in hardware.  Default: 10,000ns (ulong)
parm:           hw_queue_depth:Queue depth for each hardware queue. Default: 64
parm:           use_per_node_hctx:Use per-node allocation for hardware context queues. Default: false (bool)

These parameters are also described in the null_blk kernel documentation (Documentation/block/null_blk.txt).


Some notes on atomic operations

Atomic operations are quite useful in concurrent programming, notably in implementations of lock-free algorithms. Often, when a locking algorithm becomes the performance bottleneck, atomic operations come to the rescue.

Atomic ordering

  • NotAtomic (regular loads and stores)
  • Unordered (to match the Java memory model for shared variables)
  • Monotonic (or memory_order_relaxed)
  • Acquire (or memory_order_acquire and memory_order_consume)
  • Release (or memory_order_release)
  • AcquireRelease (or memory_order_acq_rel)
  • SequentiallyConsistent (or memory_order_seq_cst)

Platform implementations

on x86

All atomic loads generate a MOV; SequentiallyConsistent stores generate an XCHG, while other stores generate a plain MOV.

on ARM (before v8) and MIPS

Acquire, Release, and SequentiallyConsistent operations require barrier instructions around every such operation; the loads and stores themselves are normal instructions.

Language standards, compilers, and library implementations

  • The new C++11 atomic header and C11 stdatomic.h header
  • Java-style volatile variables (match SequentiallyConsistent)
  • gcc __sync_* builtins (match SequentiallyConsistent)

If you want to use atomic operations without committing to the new standards, there are compiler builtin functions and libraries available that emit the proper asm instructions for these atomic operations.

GCC provides atomic-operation built-in functions: the legacy __sync_* family and, since GCC 4.7, the __atomic_* family that maps onto the C++11 memory model.


A Small Benchmark

I wrote a small program to benchmark the performance of atomic operations against mutexes and spinlocks; it is hosted on GitHub. You can clone the repository and run make. It requires a modern C++ compiler with C++11 support.

This sample output is from my 4-core late-2013 MacBook Pro:

jobs:  1 total time:  614932094 ns average time:  6149 ns (Mutex)
jobs:  1 total time:    5132226 ns average time:    51 ns (SpinLock)
jobs:  1 total time:    4172785 ns average time:    41 ns (Atomic)
jobs:  2 total time:  746401303 ns average time:  7464 ns (Mutex)
jobs:  2 total time:   15983439 ns average time:   159 ns (SpinLock)
jobs:  2 total time:    9356120 ns average time:    93 ns (Atomic)
jobs:  4 total time:   42609222 ns average time:   426 ns (SpinLock)
jobs:  4 total time:   13734551 ns average time:   137 ns (Atomic)
jobs:  8 total time:  107958834 ns average time:  1079 ns (SpinLock)
jobs:  8 total time:   30681228 ns average time:   306 ns (Atomic)
jobs: 16 total time:  213277915 ns average time:  2132 ns (SpinLock)
jobs: 16 total time:   52189737 ns average time:   521 ns (Atomic)

Further Reading

C++ 11 in Action

C++11 is taking hold in Chromium.

In short, the following C++11 features are currently allowed:

  • auto
  • range
  • static_assert
  • variadic templates
  • nullptr
  • override
  • final
  • >> in template arguments

  • stdint.h and inttypes.h
  • variadic macros

Visit the chromium-cpp page for more.

A similar effort is being made by Mozilla.

Compiler Support on Modern C++


In my previous post, I listed the C++11 and C++14 status of some common compilers. Here I am going to do a more practical and complete study of C++11 (and C++14 where possible).

Okay, let’s begin.

Compilers

Notably, GCC C++, Clang C++, IBM C++, and Microsoft C++ already implement most or all C++11 features.

In addition, their accompanying libraries:

  • libstdc++ status
  • libc++: its status page has been removed because support is complete

C++11 Features

There are two parts to complete C++11 support: compiler support and library support.

A detailed table comparing C++11 feature support across C++ compilers is available (last updated in 2013); alternatively, visit the cpprocks one.

C++11 STL Support

  • libstdc++ 4.8 has an incomplete and buggy std::regex, which was completed in 4.9.
  • libc++ has been complete for quite a long time (about a year).


Projects in C++11

  • CAF
  • boost

Projects that work under both C++03 and C++11

  • boost
  • intel tbb

C++14 and its successor, C++17


A detailed table comparing C++14 feature support across C++ compilers is also available; alternatively, visit the cpprocks one.

Although some C++14 users claim that Clang 3.4 has full support for C++14, it has several C++14-related bugs that were fixed in 3.5 (view more).


Clang has named its experimental C++17 support c++1z, and GCC follows suit.

  • ISO Roadmap

Some words on C++11

You might want to use -std=c++11 instead of -std=c++0x for real C++11 support (gcc 4.8+, clang 3.3+, etc.).

You might want to use -std=c++14 instead of -std=c++1y for real C++14 support (gcc 5.0+, clang 3.5+, etc.).


For complete C++11 feature support, please use:

  • gcc 4.8.1 or later (for the regex library, use gcc 4.9 or later)
  • clang 3.3 or later
  • MSVC "14" CTP 1 or later
  • Intel C++ 15.0 or later