1 Go 语言性能测试
写性能测试在 Go 语言中是很便捷的,go 自带的标准工具链就有完善的支持。
1.1 benchmark
写 benchmark 测试有如下约定:
- benchmark 也是测试,因此也是以
_test.go
结尾的文件; - 需要
import testing
; - 测试方法以
Benchmark
开始,并且拥有一个*testing.B
参数。
*testing.B
参数提供了大多数和*testing.T
类似的方法,同时也额外增加了一些性能测试相关的方法。此外,还提供了一个整型成员N
,用来指定被检测操作的执行次数。
举个例子,下边是一个连接字符串的操作:
bench_test.go
:
<code class="hljs go has-numbering"><span class="hljs-keyword">package</span> main <span class="hljs-keyword">import</span> <span class="hljs-string">"testing"</span> <span class="hljs-keyword">import</span> <span class="hljs-string">"strings"</span> <span class="hljs-keyword">func</span> BenchmarkStringJoin1(b *testing.B) { input := []<span class="hljs-typename">string</span>{<span class="hljs-string">"Hello"</span>, <span class="hljs-string">"World"</span>} <span class="hljs-keyword">for</span> i :=<span class="hljs-number"> 0</span>; i < b.N; i++ { result := strings.Join(input, <span class="hljs-string">" "</span>) <span class="hljs-keyword">if</span> result != <span class="hljs-string">"Hello World"</span> { b.Error(<span class="hljs-string">"Unexpected result: "</span> + result) } } }</code>
然后执行 benchmark 测试:
<code class="hljs lasso has-numbering">$ go test <span class="hljs-attribute">-bench</span><span class="hljs-subst">=</span><span class="hljs-built_in">.</span> <span class="hljs-attribute">-run</span><span class="hljs-subst">=</span><span class="hljs-literal">NONE</span></code>
-bench
可以根据正则表达式筛选要测试的方法,这里的.
表示匹配所有的测试方法;-run=NONE
表示忽略功能测试。
输出如下:
<code class="hljs avrasm has-numbering"><span class="hljs-label">goos:</span> linux <span class="hljs-label">goarch:</span> amd64 <span class="hljs-label">pkg:</span> hello/bench_test BenchmarkStringJoin1-<span class="hljs-number">8</span> <span class="hljs-number">50000000</span> <span class="hljs-number">36.6</span> ns/op PASS ok hello/bench_test <span class="hljs-number">1.871</span>s</code>
BenchmarkStringJoin1-8
表示测试了哪个方法,8
标识GOMAXPROCS
的值,通常等于 CPU 物理线程数;50000000
即进行了 50000000 次测试,基准测试运行器开始并不知道这个操作耗时长短,所以开始的时候它使用一个比较小的 N 值,然后根据运行情况推算出合适的 N 值;36.6 ns/op
表示每个操作耗时 36.6 纳秒。
如果希望报告更多信息,可以通过增加参数实现,如通过如下命令展示内存分配信息:
<code class="hljs avrasm has-numbering">$ go test -bench=. -benchmem <span class="hljs-label">goos:</span> linux <span class="hljs-label">goarch:</span> amd64 <span class="hljs-label">pkg:</span> hello/bench_test BenchmarkStringJoin1-<span class="hljs-number">8</span> <span class="hljs-number">30000000</span> <span class="hljs-number">37.2</span> ns/op <span class="hljs-number">16</span> B/op <span class="hljs-number">1</span> allocs/op PASS ok hello/bench_test <span class="hljs-number">1.159</span>s </code>
16 B/op
表示每次操作需要 16 字节的内存;1 allocs/op
表示每次操作会进行一次内存分配。
当然,也可以在测试代码中指定(这样就不需要-benchmem
参数了):
<code class="hljs livecodeserver has-numbering">func BenchmarkStringJoin1(b *testing.B) { b.ReportAllocs() <span class="hljs-comment"> // *testing.B 的 ReportAllocs 方法指定报告内存分配信息</span> input := []<span class="hljs-keyword">string</span>{<span class="hljs-string">"Hello"</span>, <span class="hljs-string">"World"</span>} <span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i < b.N; i++ { <span class="hljs-built_in">result</span> := strings.Join(input, <span class="hljs-string">" "</span>) <span class="hljs-keyword">if</span> <span class="hljs-built_in">result</span> != <span class="hljs-string">"Hello World"</span> { b.Error(<span class="hljs-string">"Unexpected result: "</span> + <span class="hljs-built_in">result</span>) } } } </code>
对于以上测试方法,优化如下(增加如下方法):
<code class="hljs applescript has-numbering">func BenchmarkStringJoin2(b *testing.B) { b.ReportAllocs() input := []<span class="hljs-type">string</span>{<span class="hljs-string">"Hello"</span>, <span class="hljs-string">"World"</span>} join := func(strs []<span class="hljs-type">string</span>, delim <span class="hljs-type">string</span>) <span class="hljs-type">string</span> { <span class="hljs-keyword">if</span> len(strs) == <span class="hljs-number">2</span> { <span class="hljs-command"> return</span> strs[<span class="hljs-number">0</span>] + delim + strs[<span class="hljs-number">1</span>]; } <span class="hljs-command"> return</span> <span class="hljs-string">""</span>; }; <span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i < b.N; i++ { <span class="hljs-constant">result</span> := join(input, <span class="hljs-string">" "</span>) <span class="hljs-keyword">if</span> <span class="hljs-constant">result</span> != <span class="hljs-string">"Hello World"</span> { b.Error(<span class="hljs-string">"Unexpected result: "</span> + <span class="hljs-constant">result</span>) } } }</code>
<code class="hljs avrasm has-numbering">$ go test -bench=. <span class="hljs-label">goos:</span> linux <span class="hljs-label">goarch:</span> amd64 <span class="hljs-label">pkg:</span> hello/bench_test BenchmarkStringJoin1-<span class="hljs-number">8</span> <span class="hljs-number">50000000</span> <span class="hljs-number">36.5</span> ns/op <span class="hljs-number">16</span> B/op <span class="hljs-number">1</span> allocs/op BenchmarkStringJoin2-<span class="hljs-number">8</span> <span class="hljs-number">100000000</span> <span class="hljs-number">20.2</span> ns/op <span class="hljs-number">0</span> B/op <span class="hljs-number">0</span> allocs/op PASS ok hello/bench_test <span class="hljs-number">3.909</span>s</code>
可见,优化后性能有较大提升,内存使用成本也有明显降低。
1.2 profiling
更多时候,我们不仅希望了解执行时长和内存占用情况,而是希望进行性能剖析,寻找关键代码优化点。Go 语言使用 pprof 工具来进行性能剖析。
Go 通过对执行过程进行采样,来获取 profiling 数据。具体来说支持如下几种性能指标:
- CPU profiling:用于识别出执行过程中需要 CPU 最多的函数。在每个 CPU 上面执行的线程每个几毫秒进行定期的中断,并在每次中断过程中记录一个性能剖析事件,然后恢复执行。
- heap profiling:用于识别出负责分配最多内存的语句。
- block profiling:用于识别出阻塞协程最久的操作,比如系统调用、通道发送、接收数据和获取锁等,性能分析库会在 goroutine 被这些操作阻塞的时候记录一个事件。
命令如下:
<code class="hljs vhdl has-numbering">$ go test -bench . -cpuprofile=cpu.<span class="hljs-keyword">out</span> -blockprofile=<span class="hljs-keyword">block</span>.<span class="hljs-keyword">out</span> -memprofile=mem.<span class="hljs-keyword">out</span></code>
这个时候该 pprof 出马了:
<code class="hljs r has-numbering">$ go tool pprof -text ./bench_test.test cpu.out File: bench_test.test Type: cpu Time: Jul <span class="hljs-number">26</span>, <span class="hljs-number">2018</span> at <span class="hljs-number">10</span>:17pm (CST) Duration: <span class="hljs-number">4.</span>10s, Total samples = <span class="hljs-number">3.</span>94s (<span class="hljs-number">96.01</span>%) Showing nodes accounting <span class="hljs-keyword">for</span> <span class="hljs-number">3.</span>85s, <span class="hljs-number">97.72</span>% of <span class="hljs-number">3.</span>94s total Dropped <span class="hljs-number">40</span> nodes (cum <= <span class="hljs-number">0.</span>02s) flat flat% sum% cum cum% <span class="hljs-number">1.</span>63s <span class="hljs-number">41.37</span>% <span class="hljs-number">41.37</span>% <span class="hljs-number">2.</span>93s <span class="hljs-number">74.37</span>% runtime.concatstrings <span class="hljs-number">0.</span>55s <span class="hljs-number">13.96</span>% <span class="hljs-number">55.33</span>% <span class="hljs-number">0.</span>68s <span class="hljs-number">17.26</span>% runtime.mallocgc <span class="hljs-number">0.</span>34s <span class="hljs-number">8.63</span>% <span class="hljs-number">63.96</span>% <span class="hljs-number">0.</span>34s <span class="hljs-number">8.63</span>% runtime.memmove <span class="hljs-number">0.</span>33s <span class="hljs-number">8.38</span>% <span class="hljs-number">72.34</span>% <span class="hljs-number">2.</span>01s <span class="hljs-number">51.02</span>% hello/bench_test.BenchmarkStringJoin2 <span class="hljs-number">0.</span>24s <span class="hljs-number">6.09</span>% <span class="hljs-number">78.43</span>% <span class="hljs-number">3.</span>17s <span class="hljs-number">80.46</span>% runtime.concatstring3 <span class="hljs-number">0.</span>23s <span class="hljs-number">5.84</span>% <span class="hljs-number">84.26</span>% <span class="hljs-number">0.</span>96s <span class="hljs-number">24.37</span>% runtime.rawstringtmp <span class="hljs-number">0.</span>21s <span class="hljs-number">5.33</span>% <span class="hljs-number">89.59</span>% <span class="hljs-number">1.</span>82s <span class="hljs-number">46.19</span>% hello/bench_test.BenchmarkStringJoin1 <span class="hljs-number">0.</span>12s <span class="hljs-number">3.05</span>% <span class="hljs-number">92.64</span>% <span class="hljs-number">1.</span>61s <span class="hljs-number">40.86</span>% strings.Join <span class="hljs-number">0.</span>05s <span class="hljs-number">1.27</span>% <span class="hljs-number">93.91</span>% <span class="hljs-number">0.</span>05s <span class="hljs-number">1.27</span>% runtime.memclrNoHeapPointers <span class="hljs-number">0.</span>05s <span class="hljs-number">1.27</span>% <span class="hljs-number">95.18</span>% <span class="hljs-number">0.</span>73s <span class="hljs-number">18.53</span>% runtime.rawstring <span class="hljs-number">0.</span>02s <span class="hljs-number">0.51</span>% <span class="hljs-number">95.69</span>% <span class="hljs-number">0.</span>02s <span class="hljs-number">0.51</span>% runtime.(*mspan).nextFreeIndex <span class="hljs-keyword">...</span> <span class="hljs-keyword">...</span></code>
可以看到占用 CPU 时间最多的函数。
还可以输出为图片格式(不过需要先安装 GraphViz,如sudo apt install graphviz
):
<code class="hljs avrasm has-numbering">go tool pprof -svg ./bench_test<span class="hljs-preprocessor">.test</span> cpu<span class="hljs-preprocessor">.out</span> > cpu<span class="hljs-preprocessor">.svg</span></code>
此外,还可以使用uber/go-torch
这个库生成火焰图,这个库使用了 FlameGraph。准备工作如下:
<code class="hljs ruby has-numbering">git clone --depth=<span class="hljs-number">1</span> <span class="hljs-symbol">https:</span>/<span class="hljs-regexp">/github.com/brendangregg</span><span class="hljs-regexp">/FlameGraph.git ~/</span>.flamegraph export <span class="hljs-constant">PATH</span>=<span class="hljs-variable">$PATH</span><span class="hljs-symbol">:~/</span>.flamegraph go get github.com/uber/go-torch</code>
然后生成火焰图:
<code class="hljs avrasm has-numbering">go-torch -b cpu<span class="hljs-preprocessor">.out</span> -f cpu<span class="hljs-preprocessor">.torch</span><span class="hljs-preprocessor">.svg</span></code>
2 fabric 性能剖析
关于 fabric 性能剖析可以参考该文档:https://github.com/hyperledger-archives/fabric/wiki/Profiling-the-Hyperledger-Fabric#other-profiling
fabric 性能剖析基于net/http/pprof包,使用这个包可以让 pprof 运行在一个 web 接口上。
fabric 的 peer 内置有 profile server,默认时运行在 6060 端口上的,并且默认关闭。可以通过将/etc/hyperledger/fabric/core.yaml
中的peer.profile.enabled
设置为true
来启用,或者设置环境变量CORE_PEER_PROFILE_ENABLED=true
。
这里我们借助caliper项目进行测试。
在 caliper 项目的测试环境的 docker-compose.yaml 文件中配置环境变量CORE_PEER_PROFILE_ENABLED=true
。仍然以smallbank
的测试为例,则在network/fabric/simplenetwork/docker-compose.yaml下增加一个 environment 变量。
然后启动性能测试,在 Caliper 项目根目录下执行:
<code class="hljs avrasm has-numbering">node benchmark/smallbank/main<span class="hljs-preprocessor">.js</span></code>
在环境准备好之后,测试开始之初,开启另一个终端,查看一下 peer 节点的 IP(如docker inspect $(docker ps | grep " peer0.org1.example.com" | awk '{print $1}')
命令),然后执行如下命令,收集 profiling 数据:
<code class="hljs ruby has-numbering"><span class="hljs-comment"># 默认采集 30 秒数据</span> go tool pprof <span class="hljs-symbol">http:</span>/<span class="hljs-regexp">/<peer-ip>:6060/debug</span><span class="hljs-regexp">/pprof/profile</span> <span class="hljs-comment"># 定义数据采集时长</span> go tool pprof <span class="hljs-symbol">http:</span>/<span class="hljs-regexp">/<peer-ip>:6060/debug</span><span class="hljs-regexp">/pprof/profile</span></code>
这个命令会默认采集 30 秒的数据,并进入交互模式,采集的数据默认的会在~/pprof
下创建名字是pprof.XXX.samples.cpu.NNN.pb.gz
的文件。
在交互模式下,分别键入web
指令、svg
指令、top10
指令等,可以生成报告或图片。
然后生成火焰图,更加直观:
<code class="hljs avrasm has-numbering">go-torch -b pprof<span class="hljs-preprocessor">.peer</span><span class="hljs-preprocessor">.samples</span><span class="hljs-preprocessor">.cpu</span><span class="hljs-number">.001</span><span class="hljs-preprocessor">.pb</span><span class="hljs-preprocessor">.gz</span> -f peer<span class="hljs-preprocessor">.cpu</span><span class="hljs-preprocessor">.torch</span><span class="hljs-preprocessor">.svg</span></code>
可以发现,CPU 大部分的工作都是在进行加解密和校验,如椭圆曲线算法。