Which Checksum Tool on Linux Is Faster?
Published: 2019-05-11



It is common practice to calculate checksums for files to check their integrity. For large files, the checksum computation is slow, and I am wondering why it is so slow and whether choosing another tool would be better. In this post, I try three common tools, md5sum, sha1sum and crc32, to compute checksums on a relatively large file to see which checksum tool on Linux is faster, to help decide which one to choose.

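For context, a typical generate-and-verify workflow with one of these tools looks like the following (a minimal illustration, not part of the benchmark below):

$ sha1sum wiki.txt > wiki.txt.sha1   # record the checksum in a file
$ sha1sum -c wiki.txt.sha1           # verify later; prints "wiki.txt: OK" if the file is intact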

The file to be checksummed is a 15GB text file:


$ ls -lha wiki.txt
-rw-r--r-- 1 zma zma 15G Jun 14 10:28 wiki.txt

The performance

Now, let's see how the three tools perform when computing the checksum of the file.


sha1sum speed

$ time sha1sum wiki.txt
251dcb5c08c6a2fabd258f2c8a9b95e15c0cc098  wiki.txt

real    1m21.143s
user    0m21.647s
sys     0m4.668s

crc32 speed

$ time crc32 wiki.txt
0080f7a1

real    1m21.051s
user    0m16.194s
sys     0m4.890s

md5sum speed

$ time md5sum wiki.txt
e2e649030c795ffa9f33a99bcb39dde7  wiki.txt

real    1m27.392s
user    0m25.563s
sys     0m3.936s

Summary

From the results, crc32 is the fastest, but it is only a tiny bit faster than sha1sum and md5sum. md5sum is the slowest, but only a little bit slower.


Why is there not much difference? To compute the checksums, the tools need to read the file and do the computation. Now, let's check how much time is needed just to read the file content.


$ time dd if=wiki.txt of=/dev/null bs=8192
1953039+1 records in
1953039+1 records out
15999296457 bytes (16 GB) copied, 80.4203 s, 199 MB/s

real    1m20.447s
user    0m0.202s
sys     0m7.091s

The I/O read speed is around 200MB/s. That's not bad for storage on a single magnetic disk.

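As a quick sanity check of that number, dividing the bytes copied by the elapsed time reported by dd gives the same figure (a back-of-the-envelope calculation, not part of the original run):

$ awk 'BEGIN{printf "%.0f MB/s\n", 15999296457/80.4203/1e6}'
199 MB/s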

So, almost all of the time is spent reading the file content. The algorithms and the tools themselves are not the limitation; the disk I/O speed is.


The conclusion is: use whichever tool works best for you (you may need to be aware of the collision weaknesses of these algorithms) without worrying much about speed (it still consumes time) on a relatively modern computer. If you want higher speed, improve your I/O speed first, until the CPU becomes the bottleneck (CPU usage reaches 100%).

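One rough way to see whether a run is CPU-bound or I/O-bound (a heuristic I am adding here, not from the original post) is to compare the user + sys CPU time to the real wall-clock time reported by time. For the sha1sum run above:

$ awk 'BEGIN{printf "CPU busy fraction: %.0f%%\n", (21.647 + 4.668)/81.143*100}'
CPU busy fraction: 32%

The CPU was busy only about a third of the elapsed time, so the run spent most of its time waiting on disk I/O.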

What if I/O was not the bottleneck

Pádraig said that we can avoid the I/O and measure the pure computational cost. I made a small change to the suggested command to checksum a file under /dev/shm/, since crc32 does not accept input from STDIN. The system is the same one on which I did the previous tests. It could only hold 3GB at the time I did this test. The results are as follows.


[zma@host:/dev/shm]$ head -c 3G /dev/zero >test
[zma@host:/dev/shm]$ for chk in crc32 md5sum sha1sum ; do echo $chk; time $chk test; done
crc32
480bbe37

real    0m3.411s
user    0m2.931s
sys     0m0.482s
md5sum
c698c87fb53058d493492b61f4c74189  test

real    0m5.103s
user    0m4.697s
sys     0m0.409s
sha1sum
6e7f6dca8def40df0b21f58e11c1a41c3e000285  test

real    0m4.451s
user    0m4.082s
sys     0m0.372s

To summarize the speeds, taking md5sum's speed as the baseline:


md5sum: 1.00x
crc32: 1.50x
sha1sum: 1.15x

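These ratios come straight from the real times above, for example (just the arithmetic, spelled out):

$ awk 'BEGIN{printf "crc32: %.2fx  sha1sum: %.2fx\n", 5.103/3.411, 5.103/4.451}'
crc32: 1.50x  sha1sum: 1.15x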

crc32 is the fastest here. It is a Perl 5 program using Archive::Zip::computeCRC32() to compute the crc32.

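If you want to confirm what crc32 is on your own system, something like the following works (output will vary by distribution; on Fedora the command comes from the perl-Archive-Zip package, as shown at the end of this post):

$ file "$(which crc32)"        # should report a Perl script
$ head -n 1 "$(which crc32)"   # the shebang line, e.g. #!/usr/bin/perl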

The throughput here for md5sum is above 600MB/s. That is a speed an SSD, or a RAID of SSDs, can easily exceed. On the system I tested, if the I/O were much improved, the computation would likely account for much of the time spent.

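For reference, the 600MB/s figure follows from the 3GiB test file (head -c 3G writes 3×1024³ bytes) and md5sum's real time above:

$ awk 'BEGIN{printf "%.0f MB/s\n", 3*1024^3/5.103/1e6}'
631 MB/s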

CPU model and versions of checksum tools used

Here are the CPU model and versions of the checksum tools used during the test.


$ lscpu | grep "Model name"
Model name:            Intel(R) Core(TM) i5-4460  CPU @ 3.20GHz
$ md5sum --version
md5sum (GNU coreutils) 8.23
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Ulrich Drepper, Scott Miller, and David Madore.
$ sha1sum --version
sha1sum (GNU coreutils) 8.23
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Ulrich Drepper, Scott Miller, and David Madore.
$ rpm -qf `which crc32`
perl-Archive-Zip-1.46-1.fc22.noarch


