shengli's blog

Tick the World

Efficient data transfer through zero copy

Posted at — Jun 5, 2017

Zero-copy I/O practice in project

In recent weeks, I have been working on how to exploit the potential of the GMAC (1G Ethernet Media Access Controller) on the SoC on my desktop. This SoC is used in the telecommunications field and provides fast data transfer along with some interesting MAC features. In this article, I’d like to share some experience from this work on how to use this MAC to achieve better network throughput.

Preface

The following articles are worth a look before going further.

After reading these articles, you will know how java.nio.channels works. If you have more time, you can dig deeper into Kafka, the popular message queue framework originally developed at LinkedIn.

So, let’s first look at the capabilities of the PHY/MAC I have at hand.

The PHY side is provided by an Atheros AR8031 as the Ethernet transceiver and an Atheros AR8327 as the Ethernet switch.

Some advanced features that caught my attention were:

Do it

In this embedded project, no third-party organization or person is allowed to install or uninstall any package without authorization. This is guaranteed by the TrustZone area: every package is hashed and self-signed, and it must be verified against the public key installed in this area, otherwise the install or upgrade fails.

What I want to say is that we trust the software installed on the board. This means we don’t need to consider the security issues mentioned in the second article, so we can simplify our design.

Some highlighted features include:

Some aspects that need to be considered before starting the project are:

These limitations and requirements determine how we approach the work.

For better memory management between kernel space and user space, we set up a raw virtual device with a fixed I/O start address and length when the kernel starts. A user-space application can then use mmap() to map the I/O space into its own virtual address space, and the kernel module can access the same memory via the fixed offset.
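
As a rough illustration of the user-space side, the sketch below maps a fixed-length region from a device node and addresses it by offset. The device path, the region length and the helper names (pool_map, pool_at, pool_unmap) are hypothetical; the post does not give the real ones, and the kernel side is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical values: the real device name and region size are not given. */
#define POOL_DEVICE "/dev/pool0"
#define POOL_LENGTH (4u * 1024u * 1024u)

/* Map `length` bytes of `path`, starting at the page-aligned `offset`,
 * into this process. Returns NULL on failure. */
static void *pool_map(const char *path, off_t offset, size_t length)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *base = mmap(NULL, length, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, offset);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return base == MAP_FAILED ? NULL : base;
}

/* Both sides agree on fixed offsets inside the region, so a location is
 * always named as base + offset and never as a raw pointer. */
static void *pool_at(void *base, size_t length, size_t offset)
{
    assert(base != NULL && offset < length);
    return (unsigned char *)base + offset;
}

static void pool_unmap(void *base, size_t length)
{
    if (base != NULL)
        munmap(base, length);
}
```

A caller would do something like pool_map(POOL_DEVICE, 0, POOL_LENGTH) once at startup and then use pool_at() with the agreed offsets. MAP_SHARED is what makes writes visible to other mappings of the same region.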

This memory has some inherent access attributes, so it is necessary to test which cache scheme best suits our requirements.
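
One simple way to compare cache schemes is to time plain writes into the mapped region under each setting. The helper below is only a minimal sketch of such a measurement, not the test that was actually run; the function name write_mb_per_s is made up, and the cache attribute itself has to be selected on the kernel side, which is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Write `len` bytes to `buf` `passes` times and return the observed
 * throughput in MiB/s. `volatile` keeps the compiler from dropping
 * the stores. */
static double write_mb_per_s(volatile uint8_t *buf, size_t len, unsigned passes)
{
    assert(buf != NULL && len > 0 && passes > 0);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned p = 0; p < passes; p++)
        for (size_t i = 0; i < len; i++)
            buf[i] = (uint8_t)(i + p);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    double mib = (double)len * (double)passes / (1024.0 * 1024.0);
    return secs > 0.0 ? mib / secs : 0.0;
}
```

Running the same call against the region once per cache setting gives comparable numbers. A read loop and a mixed read/write loop would be needed for a fuller picture.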

[Figure: cache-02]

To achieve the best system efficiency:

So, let’s have a look at this picture:
[Figure: cache-02]

Design details

End

Actually, in our products, other modules and processes also use this POOL as hot, fast data storage. For example, we developed fsyslog, a logging framework that provides functions like syslogd(…) in Unix, but with a shared-memory-based logging scheme that is faster and non-blocking when logging from user applications.
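
To give a feel for why shared-memory logging can avoid blocking, here is a minimal ring-buffer sketch. It is not the fsyslog design, which this post does not describe; the struct layout, the sizes and the function names are all assumptions made for illustration. The idea is that a writer claims a slot with one atomic increment and copies the message in, without taking a lock or making a system call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define LOG_SLOTS   8u   /* hypothetical size, must be a power of two */
#define LOG_MSG_MAX 64u  /* hypothetical maximum message length */

struct log_slot {
    char msg[LOG_MSG_MAX];
};

/* In a real system this struct would live inside the shared region. */
struct log_ring {
    atomic_uint_fast32_t next;        /* number of messages written so far */
    struct log_slot slots[LOG_SLOTS];
};

static void log_ring_init(struct log_ring *r)
{
    memset(r->slots, 0, sizeof r->slots);
    atomic_init(&r->next, 0);
}

/* Never blocks: claims the next sequence number and overwrites the
 * oldest slot when the ring is full. Returns the sequence number.
 * A reader may see a half-written message; a production design would
 * add a per-slot commit flag or sequence check. */
static uint_fast32_t log_write(struct log_ring *r, const char *msg)
{
    uint_fast32_t seq =
        atomic_fetch_add_explicit(&r->next, 1, memory_order_relaxed);
    struct log_slot *s = &r->slots[seq & (LOG_SLOTS - 1u)];

    strncpy(s->msg, msg, LOG_MSG_MAX - 1u);
    s->msg[LOG_MSG_MAX - 1u] = '\0';
    return seq;
}

/* Return the message currently stored in the slot for `seq`. */
static const char *log_peek(const struct log_ring *r, uint_fast32_t seq)
{
    return r->slots[seq & (LOG_SLOTS - 1u)].msg;
}
```

The trade-off in this sketch is that old messages are silently overwritten when the reader falls behind, which is the price of never making the writer wait.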

Later I will show the comparison after these changes, which achieved an overall throughput boost of more than 30% and a 2% load decrease in our specific product.
