Wednesday, August 26, 2015

Notes on the bcachefs file system

While browsing the Linux mailing list today I came across a rather interesting post, so I'm noting it here to keep an eye on how it develops. It announces a new file system, apparently posted by its author (who previously worked at Google). According to the post, bcachefs offers performance and reliability comparable to ext4 and xfs while also having btrfs/zfs-style features. I should find time to look into this file system properly.

Back to the topic. Quoting the announcement:
bcachefs is a modern COW (copy-on-write) filesystem with checksumming, compression, multiple devices, caching,
and eventually snapshots and all kinds of other nifty features.


Next, the poster's description of bcachefs's features, quoted directly:

FEATURES:
 - multiple devices
   (replication is like 80% done, but the recovery code still needs to be
   finished).

 - caching/tiering (naturally)
   you can format multiple devices at the same time with bcacheadm, and assign
   them to different tiers - right now only two tiers are supported, tier 0
   (default) is the fast tier and tier 1 is the slow tier. It'll effectively do
   writeback caching between tiers.

 - checksumming, compression: currently only zlib is supported for compression,
   and for checksumming there's crc32c and a 64 bit checksum. There's mount
   options for them:
   # mount -o data_checksum=crc32c,compression=gzip

   Caveat: don't try to use tiering and checksumming or compression at the same
   time yet, the read path needs to be reworked to handle both at the same time.
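The tiering and checksum/compression options above can be sketched as a provisioning sequence. This is only a rough sketch: the device paths are placeholders, and the `--tier` flag syntax for bcacheadm is my assumption based on the post's description, not verified usage.

```shell
# Hypothetical sketch of setting up a two-tier bcachefs, based on the
# feature description above. Device paths and the --tier flag are
# assumptions, not verified bcacheadm syntax.

# Format an SSD as the fast tier (tier 0, the default) and an HDD as
# the slow tier (tier 1) in a single bcacheadm invocation:
bcacheadm format --tier 0 /dev/nvme0n1 --tier 1 /dev/sdb

# Mount with checksumming and compression (the options quoted in the post).
# Per the caveat above: do not combine these with tiering yet.
mount -t bcachefs -o data_checksum=crc32c,compression=gzip /dev/nvme0n1 /mnt
```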

PLANNED FEATURES:
 - snapshots (might start on this soon)
 - erasure coding
 - native support for SMR drives, raw flash


The poster also ran some performance tests of the file system, using dbench on a high-end PCIe flash device:

Here's some dbench numbers, running on a high end pcie flash device:

1 thread, O_SYNC:       Throughput              Max latency
bcache:                 225.812 MB/sec          18.103 ms
ext4:                   454.546 MB/sec          6.288 ms
xfs:                    268.81 MB/sec           1.094 ms
btrfs:                  271.065 MB/sec          74.266 ms

20 threads, O_SYNC:     Throughput              Max latency
bcache:                 1050.03 MB/sec          6.614 ms
ext4:                   2867.16 MB/sec          4.128 ms
xfs:                    3051.55 MB/sec          10.004 ms
btrfs:                  665.995 MB/sec          1640.045 ms

60 threads, O_SYNC:     Throughput              Max latency
bcache:                 2143.45 MB/sec          15.315 ms
ext4:                   2944.02 MB/sec          9.547 ms
xfs:                    2862.54 MB/sec          14.323 ms
btrfs:                  501.248 MB/sec          8470.539 ms

1 thread:               Throughput              Max latency
bcache:                 992.008 MB/sec          2.379 ms
ext4:                   974.282 MB/sec          0.527 ms
xfs:                    715.219 MB/sec          0.527 ms
btrfs:                  647.825 MB/sec          108.983 ms

20 threads:             Throughput              Max latency
bcache:                 3270.8 MB/sec           16.075 ms
ext4:                   4879.15 MB/sec          11.098 ms
xfs:                    4904.26 MB/sec          20.290 ms
btrfs:                  647.232 MB/sec          2679.483 ms

60 threads:             Throughput              Max latency
bcache:                 4644.24 MB/sec          130.980 ms
ext4:                   4405.16 MB/sec          69.741 ms
xfs:                    4413.93 MB/sec          131.194 ms
btrfs:                  803.926 MB/sec          12367.850 ms
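For reference, numbers like the ones above can be gathered with dbench roughly as follows. The flags shown (`-s` for O_SYNC file I/O, `-D` for the working directory, `-t` for runtime in seconds) are from dbench 4.x as I recall them; treat them as assumptions and check `dbench --help` before relying on them.

```shell
# Rough sketch of the benchmark runs above (dbench 4.x flags assumed):
# O_SYNC runs at 1, 20, and 60 clients against the mounted filesystem.
for clients in 1 20 60; do
    dbench -s -D /mnt -t 60 "$clients"
done

# The non-O_SYNC runs are the same invocation without -s:
for clients in 1 20 60; do
    dbench -D /mnt -t 60 "$clients"
done
```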



The post mentions a few points I found interesting:
"Where'd that 20% of my space go?" - you'll notice the capacity shown by df is
lower than what it should be. Allocation in bcachefs (like in upstream bcache)
is done in terms of buckets, with copygc required if no buckets are empty, hence
we need copygc and a copygc reserve (much like the way SSD FTLs work).


It's quite conceivable at some point we'll add another allocator that doesn't
work in terms of buckets and doesn't require copygc (possibly for rotating
disks), but for a COW filesystem there are real advantages to doing it this way.
So for now just be aware - and the 20% reserve is probably excessive, at some
point I'll add a way to change it.


In other words, users will notice that part of their capacity has been eaten!? That seems like something storage vendors would care a lot about?? After all, what ordinary users are most sensitive to is how much capacity their money actually buys XD.
The author says the current 20% reserve is probably excessive, and that a way to change its size may be added in the future!?
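As a back-of-the-envelope check of where that space goes, here is a tiny shell calculation. The device size is hypothetical; the 20% figure is the copygc reserve mentioned in the post.

```shell
# Hypothetical: a 1000 GB device with the ~20% copygc reserve described above.
raw_gb=1000
reserve_pct=20
usable_gb=$(( raw_gb * (100 - reserve_pct) / 100 ))
echo "df would report roughly ${usable_gb} GB of the ${raw_gb} GB device"
# → df would report roughly 800 GB of the 1000 GB device
```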

Mount times:
bcachefs is partially garbage collection based - we don't persist allocation
information. We no longer require doing mark and sweep at runtime to reclaim
space, but we do have to walk the extents btree when mounting to find out what's
free and what isn't.

(We do retain the ability to do a mark and sweep while the filesystem is in use
though - i.e. we have the ability to do a large chunk of what fsck does at
runtime).


Regarding this feature: bcachefs no longer needs to run mark-and-sweep at runtime to find reclaimable space; instead, it walks the extents btree at mount time to determine what is free and what isn't (while still retaining the ability to do mark-and-sweep on a mounted filesystem).

Below the announcement, an engineer who appears to work at HGST asked:
How do you imagine SMR drives support? How do you feel about libzbc
using for SMR drives support? I am not very familiar with bcachefs
architecture yet. But I suppose that maybe libzbc model can be useful
for SMR drives support on bcachefs side. Anyway, it makes sense to
discuss proper model.


Another question that caught my interest:
How do you imagine raw flash support in bcachefs architecture? Frankly
speaking, I am implementing NAND flash oriented file system. But this
project is proprietary yet and I can't share any details. However,
currently, I've implemented NAND flash related approaches in my file
system only. So, maybe, it make sense to consider some joint variant of
bcachefs and implementation on my side for NAND flash support. I need to
be more familiar with bcachefs architecture for such decision. But,
unfortunately, I suspect that it can be not so easy to support raw flash
for bcachefs. Of course, I can be wrong.


He mentions that he is implementing a NAND-flash-oriented file system, and judging from the way he asks, he seems to be developing a file system that does not need a flash controller. I have also been wondering: when will the flash controller become unnecessary? Setting aside the recently announced and much-hyped XPoint, will future products go back to shipping raw flash directly? And where would the performance bottleneck of such products be?






