shengli's blog

Tick the World

Fast logging system in embedded system

Posted at — Oct 23, 2017


Introduction

FLOG is a lightweight logging system featuring high throughput, non-blocking and lock-less writes from multiple processes, and crash-safe operation in the field. It is suited to embedded systems running Linux.

Log vs. Trace

A log records activity in a subsystem. It gives the system administrator or operator an overview of that activity, such as the boot messages printed when the kernel starts or the plug-in and plug-out events reported by the kernel.


A trace records detailed information about a specific system. Its main purpose is to follow what happens inside that system, and programmers use its output to analyze what went wrong when a problem occurs. Typically it is only requested when the system is not working as expected, for example when the throughput of a wireless card does not reach its theoretical level.

Current status

Most embedded Linux products use syslogd or rsyslogd as their logging system. Those daemons, however, were designed for system logging, such as events or activity generated by subsystems, not for logs that require high throughput, such as traces of detailed internal activity.

What happens if you use them?

If you use syslogd or rsyslogd for tracing on the local host, each process writes its log through a local (Unix domain) socket. Even if you redirect the log to a remote logging workstation, it is sent over a UDP socket. If a trace point sits on an event that fires every TTI (1 ms), you need to be aware of the following aspects:

[Figure: cache-02]

In one wireless router project, the engineers used syslogd to dump the internal operations every TTI (1 ms). About five minutes later, the PHY-MAC could no longer meet the minimum sync requirement, and the whole system went down soon after.

What does FLOG aim at?

FLOG aims to address the problems above, and more.

Internal design

[Figure: cache-03]

no system call, user space operation

For simplicity and better performance, FLOG does its best NOT to use system calls. Processes running on an embedded system are normally signed by the publisher, and the binaries are verified by the system during boot, so we consider every binary running on the system reliable and safe. With this guarantee, we create a new character device named rdev, which is backed by a range of memory reserved through the boot arguments.

bootargs:
root=/dev/mtdblock8 rw rootfstype=jffs2 mem=512M rram_size=8M ...

Because the rdev memory has no hot spots for reads or writes, this memory range was, after a full evaluation, configured to bypass the L1/L2 caches.

Then, at boot time, we reserve this range with request_mem_region, followed by ioremap_nocache to create the page tables for this device.

The device driver accepts a configurable parameter for the size of this region; the default is 8M.

There is a small trick here. Normally, a user-space application would use mmap to map this region into its own address space, but it would be wasteful for every process that uses FLOG to do so, with each one spending virtual address space on the same physical range. To save address space in each process, the rdev device driver reassigns the page table permission attribute to USER_ACCESS, which means every process can read or write the contents directly through the same page tables. As mentioned above, security is not a concern here because we trust every process running on this system.
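For comparison, the conventional per-process mapping that FLOG avoids could look roughly like the sketch below. The helper name `map_log_region` is made up for illustration, and the post does not give the device node path, so `/dev/rdev` in the usage note is an assumption.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `size` bytes of the file or device at `path` into this process.
 * Returns the mapped address, or NULL on failure.
 * Hypothetical helper, not part of FLOG. */
void *map_log_region(const char *path, size_t size)
{
	assert(path != NULL && size > 0);

	int fd = open(path, O_RDWR);
	if (fd < 0)
		return NULL;

	/* MAP_SHARED makes writes visible to every process mapping the region. */
	void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd); /* the mapping stays valid after the descriptor is closed */

	return addr == MAP_FAILED ? NULL : addr;
}
```

With this approach every logging process would call something like `map_log_region("/dev/rdev", 8 * 1024 * 1024)` and carry its own 8M mapping, which is exactly the per-process cost the shared page tables remove.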


dual ring buffer design

FLOG does its best not to introduce any locking scheme into the design, because locks occasionally cause a dramatic drop in overall performance. It would rather discard some messages when too many flood in.

A dual ring buffer design is the answer to this challenge.

Multiple log producers use a lock-less list to update the internal structure of the working BANK, so they do not interfere with each other.

flog_write(msg)
{
	// Calculate the size of the formatted message (excluding the trailing NUL).
	int n = snprintf(NULL, 0, ...);

	// Read current_bank from the `segment header` and check whether its r/w
	// lock bit is set, i.e. whether rlog_thread is busy with that bank.
	// IF YES:
	//     atomically switch to the other bank and return the header of that bank;
	// ELSE:
	//     return the header of the current bank;
	// END
	current_bank = find_free_bank();

	// Reserve space for the message by advancing the bank's write pointer in
	// one atomic step, so that concurrent writers never get overlapping ranges.
	// Wrap-around handling is omitted here for simplicity.
	message_header_ptr = atomic_fetch_add(current_bank_header, align_address(n + 1, 4));

	// Mark the message as in progress while it is being written.
	atomic_set(message_header_rwlock);
	snprintf(message_header_ptr, n + 1, ...);
	atomic_clr(message_header_rwlock);

	// Inform the daemon process if the written size exceeds the configured size, e.g. 2MB.
	flog_update_bank_write_size();
}
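To make the reservation step concrete, here is a minimal producer-side sketch using C11 atomics. Everything in it (`struct bank`, `struct flog`, `BANK_SIZE`, `flog_put`) is an illustrative assumption rather than FLOG's actual layout, and it leaves out the consumer and the per-message in-progress bit from the pseudocode above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define BANK_SIZE 4096u /* assumed bank size, for illustration only */

struct bank {
	atomic_uint write_off; /* next free byte in data[] */
	atomic_bool flushing;  /* set while the consumer drains this bank */
	char data[BANK_SIZE];
};

struct flog {
	atomic_uint current;   /* bank that producers try first */
	struct bank banks[2];
};

void flog_init(struct flog *fl)
{
	atomic_init(&fl->current, 0u);
	for (int i = 0; i < 2; i++) {
		atomic_init(&fl->banks[i].write_off, 0u);
		atomic_init(&fl->banks[i].flushing, false);
	}
}

/* Copy msg into a free bank.
 * Returns 0 on success, -1 if the message is dropped. */
int flog_put(struct flog *fl, const char *msg, unsigned len)
{
	assert(fl != NULL && msg != NULL);
	if (len > BANK_SIZE)
		return -1;

	unsigned idx = atomic_load(&fl->current);
	if (atomic_load(&fl->banks[idx].flushing))
		idx ^= 1u; /* consumer is busy with this bank: use the other one */
	struct bank *b = &fl->banks[idx];
	if (atomic_load(&b->flushing))
		return -1; /* both banks busy: drop instead of waiting */

	/* Reserve a 4-byte-aligned slot with a compare-and-swap loop, so that
	 * concurrent producers never receive overlapping ranges. */
	unsigned need = (len + 3u) & ~3u;
	unsigned off = atomic_load(&b->write_off);
	do {
		if (need > BANK_SIZE - off)
			return -1; /* bank full: drop the message */
	} while (!atomic_compare_exchange_weak(&b->write_off, &off, off + need));

	memcpy(b->data + off, msg, len);
	return 0;
}
```

The design point is that a producer never waits: if no space can be reserved it returns immediately and the message is lost, which matches the choice to discard messages rather than take a lock.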

The elog daemon is the consumer process that dumps the raw messages to disk. It is started at system boot, like syslogd. It reads its configuration from the file /etc/flog.conf, and the following parameters take effect dynamically.

flog.dirty_background_ratio = 50  // percent; 50 means the message buffer is half full
flog.dirty_interval_centisecs = 500  //ms
flog.rolling_file_size = 4 // in megabytes
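As a rough illustration of how a daemon might read such lines, here is a small parser sketch. The `struct flog_conf` type and `flog_conf_parse_line` function are hypothetical, and the `key = value // comment` syntax is inferred only from the three lines shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of the three parameters shown above. */
struct flog_conf {
	int dirty_background_ratio;
	int dirty_interval_centisecs;
	int rolling_file_size;
};

/* Parse one "key = value  // comment" line.
 * Returns 1 if a known key was set, 0 otherwise. */
int flog_conf_parse_line(struct flog_conf *c, const char *line)
{
	assert(c != NULL && line != NULL);

	char key[64];
	int value;

	/* The key runs up to the first blank, tab or '='; anything after
	 * the integer value (such as a trailing comment) is ignored. */
	if (sscanf(line, " %63[^ =\t] = %d", key, &value) != 2)
		return 0;

	if (strcmp(key, "flog.dirty_background_ratio") == 0)
		c->dirty_background_ratio = value;
	else if (strcmp(key, "flog.dirty_interval_centisecs") == 0)
		c->dirty_interval_centisecs = value;
	else if (strcmp(key, "flog.rolling_file_size") == 0)
		c->rolling_file_size = value;
	else
		return 0;

	return 1;
}
```

A daemon could call this for each line returned by `fgets` and re-read the file on a signal to pick up changes without a restart; how FLOG actually applies changes dynamically is not described in this post.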

flog kernel module

The flog kernel module mainly changes the permissions of the page tables. In my initial version, the flush task was assigned to a kthread started in the kernel, but that has some drawbacks.

multi-core deployment

Multi-core processors are now common, so it is preferable to deploy flog on a different core from the one where the CPU-intensive tasks run. The final test results show that in a multi-core environment the number of discarded messages decreases greatly.
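One way to keep the flush work away from the CPU-intensive core on Linux is to pin it with `sched_setaffinity`. The sketch below is only an assumption about how this could be done; the post does not say which mechanism FLOG uses, and `pin_to_cpu` is a made-up helper name.

```c
#define _GNU_SOURCE /* needed for cpu_set_t and sched_setaffinity in glibc */
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (errno is set by the kernel). */
int pin_to_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < CPU_SETSIZE);

	cpu_set_t set;
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	/* A pid of 0 means "the calling thread". */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

The logging daemon would call `pin_to_cpu` once at startup with a core number that the real-time tasks do not use.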

Conclusion

In wireless devices, we often need to record the MAC scheduler information every TTI (1 ms), and this requirement poses a BIG challenge to the performance of the logging system. Experience shows that legacy logging tools such as syslogd and rsyslogd cannot meet this requirement.

After a period of evaluation, flog shows impressive performance compared with other solutions. Later I will give some comparisons between flog, syslogd, and rsyslogd.
