
RAFT orderers SERVICE_UNAVAILABLE


Details

  • Type: Bug
  • Fix Version/s: v2.0.0
  • Component/s: fabric-orderer
  • Labels:

    • Steps to Reproduce:

      1. Create a blockchain network with 3 Raft orderers.
      2. We created 100 channels, but used only 3 for the traffic: testorgschannel1-testorgschannel3. All channels have the same 3 orgs, with 1 peer per org.
      3. We used 3*m (m = 1 … 10) threads per channel across the 3 channels, for a total of 9*m threads. Each thread targets 1 orderer in round-robin fashion, i.e., thread 1 to orderer1, thread 2 to orderer2, etc. (a minimal sketch of this mapping follows these steps). In our example we used PTE in Constant Mode (the next transaction is sent as soon as the TX response ACK for the previous one is received). We increased the number of threads per channel (m) across runs. Each thread sends 10,000 transactions.
      BatchSize = 500; BatchTimeout = 2s; SnapshotInterval = 100 MB
      Two orderers show "SERVICE_UNAVAILABLE" after sending 3,000 transactions in the case with 3 channels, 6 threads each (18 threads total).
      Note that the test with 3 channels, 3 threads each (9 threads total) succeeds.
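
      For illustration, here is a minimal Go sketch (a hypothetical helper, not the actual PTE client code) of the round-robin thread-to-orderer mapping described in step 3:

        package main

        import "fmt"

        // ordererFor maps a 1-based thread number to one of n orderers in
        // round-robin fashion: thread 1 -> orderer1, thread 2 -> orderer2, ...
        func ordererFor(thread, nOrderers int) int {
            return (thread-1)%nOrderers + 1
        }

        func main() {
            const nOrderers = 3
            const channels = 3
            const threadsPerChannel = 6 // the failing case: 3 channels * 6 threads = 18 threads
            for t := 1; t <= channels*threadsPerChannel; t++ {
                fmt.Printf("thread %d -> orderer%d\n", t, ordererFor(t, nOrderers))
            }
        }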

    Description

    The orderers report no Raft leader; see the attached orderer logs (orderer2nd-ordererorg-7.log and orderer3rd-ordererorg-7.log) and the PTE log (FAB-13760-ch3-th2-move-0213170638.log).

    2019-02-13 22:09:29.990 UTC [policies] Evaluate -> DEBU 4d724d == Done Evaluating *policies.implicitMetaPolicy Policy /Channel/Application/Writers
    2019-02-13 22:09:29.990 UTC [policies] Evaluate -> DEBU 4d724e Signature set satisfies policy /Channel/Writers
    2019-02-13 22:09:29.990 UTC [policies] Evaluate -> DEBU 4d724f == Done Evaluating *policies.implicitMetaPolicy Policy /Channel/Writers
    2019-02-13 22:09:29.990 UTC [orderer.common.broadcast] ProcessMessage -> WARN 4d7250 [channel: testorgschannel3] Rejecting broadcast of normal message from 10.188.208.4:13293 with SERVICE_UNAVAILABLE: rejected by Order: no Raft leader
    2019-02-13 22:09:29.990 UTC [orderer.common.server] func1 -> DEBU 4d7251 Closing Broadcast stream
    2019-02-13 22:09:29.990 UTC [comm.grpc.server] 1 -> INFO 4d7252 streaming call completed grpc.service=orderer.AtomicBroadcast grpc.method=Broadcast grpc.peer_address=10.188.208.4:13293 grpc.code=OK grpc.call_duration=1.111798ms
    2019-02-13 22:09:29.992 UTC [orderer.consensus.etcdraft] consensusSent -> DEBU 4d7253 Sending msg of 28 bytes to 1 on channel testorgschannel15 took 3.666µs
    2019-02-13 22:09:29.992 UTC [orderer.consensus.etcdraft] consensusSent -> DEBU 4d7254 Sending msg of 28 bytes to 3 on channel testorgschannel15 took 3.106µs

    PTE log shows:
    info: [PTE 2 main]: stdout: info: [PTE 2 exec][1550095769214 Nid:chan:org:id=0:testorgschannel3:org2:0 invoke_move_dist_evtBlock] failed to sendTransaction status:

    {"status":"SERVICE_UNAVAILABLE","info":"no Raft leader"}

    info: [PTE 2 main]: stdout: info: [PTE 2 exec][1550095769227 Nid:chan:org:id=0:testorgschannel3:org2:1 invoke_move_dist_evtBlock] failed to sendTransaction status:

    {"status":"SERVICE_UNAVAILABLE","info":"no Raft leader"}


    Activity

    Scott Zwierzynski added a comment –  – edited

    These log entries were also seen:

    2019-02-13 22:09:31.551014 W | wal: sync duration of 2.087949482s, expected less than 1s
    2019-02-13 22:09:31.552566 W | wal: sync duration of 7.750080345s, expected less than 1s
    2019-02-13 22:09:31.558517 W | wal: sync duration of 7.743392857s, expected less than 1s

    I understand that any orderer in a channel would start a new election term if it does not receive any heartbeats from the leader of the channel. With transactions in flight, an orderer leader change can cause transactions to be dropped, and also other transactions to be NACKed.
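
    As a rough illustration of that mechanism, here is a simplified sketch of a generic Raft follower's election timer (not Fabric's actual etcdraft code): every heartbeat resets the timer, and a timer expiry bumps the term and triggers an election, during which broadcasts are rejected with SERVICE_UNAVAILABLE.

        package raftsketch

        import "time"

        // follower models only the election-timer part of a Raft follower.
        type follower struct {
            term      uint64
            heartbeat chan struct{} // signalled whenever the leader's heartbeat arrives
        }

        func (f *follower) run(electionTimeout time.Duration) {
            timer := time.NewTimer(electionTimeout)
            defer timer.Stop()
            for {
                select {
                case <-f.heartbeat:
                    // Leader is alive: reset the election timer.
                    if !timer.Stop() {
                        <-timer.C
                    }
                    timer.Reset(electionTimeout)
                case <-timer.C:
                    // No heartbeat within electionTimeout: start a new term and election.
                    f.term++
                    f.startElection()
                    timer.Reset(electionTimeout)
                }
            }
        }

        func (f *follower) startElection() {
            // Request votes from the other nodes (omitted in this sketch).
        }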

    We have some questions. Please help us document what happened:

    1. Why was a leader election triggered at all? What caused the delays? Jay mentioned that we were waiting for disk IO to complete. Can we dig any deeper?
    2. Why was a leader election triggered in only one of the channels?
    3. How can an operator tune their network configuration, node hardware, or environment variables to prevent this from happening as much as possible?
    Jay Guo added a comment – 

    1. Why was a leader election triggered at all? What caused the delays? Jay mentioned that we were waiting for disk IO to complete. Can we dig any deeper?

    On Linux, it's this call:

    syscall.Fdatasync(int(f.Fd()))
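
    To see how long that call takes in practice, one option is to time it directly. Below is a minimal sketch (a hypothetical helper, Linux-only) that mirrors the 1s threshold behind the "expected less than 1s" warnings quoted above:

        package walsketch

        import (
            "log"
            "os"
            "syscall"
            "time"
        )

        // fdatasyncTimed issues the same fdatasync syscall and warns when it
        // takes longer than one second, similar to the WAL warnings in the logs.
        func fdatasyncTimed(f *os.File) error {
            start := time.Now()
            err := syscall.Fdatasync(int(f.Fd()))
            if d := time.Since(start); d > time.Second {
                log.Printf("wal: sync duration of %v, expected less than 1s", d)
            }
            return err
        }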
    

    2. Why was a leader election triggered in only one of the channels?

    Because channels are independent of each other, a delayed file sync on the leader of channel A doesn't affect the leaders of other channels.

    3. How can an operator tune their network configuration, node hardware, or environment variables to prevent this from happening as much as possible?

    • perhaps find out wal sync performance by running wal benchmark test
    • set `ElectionTimeout` accordingly
    • trace syscall to spot delay
    Scott Zwierzynski added a comment – 

    The answers make sense. However, upon further analysis, we noticed that something affected transactions in two (and maybe all 3) channels at the same time. It seems we should be looking for the root cause of a specific interruption that caused disk-write delays (and lost transactions) across all the channels. This has happened twice out of many high-volume traffic tests.

    We clarified the timeline and added our detailed observations in the attachment `lost_transactions.txt`.

    Scott Zwierzynski added a comment – 

    We are rerunning the test case (FAB-13760) again tonight, with the logging enabled that was agreed in direct-message conversations with developers; if the problem occurs, we should have data available for review in about 16 hours.

    Dongming Hwang added a comment – 

    Re-ran the same test twice with the latest image 5937ae4 from 2/18/19 with logging level
    FABRIC_LOGGING_SPEC=info:orderer.consensus.etcdraft=debug:orderer.common.cluster=debug.
    The issue was not reproduced, though it was observed three times with the same test in the past. However, TPS degraded by about 1/4 (from 496 to 369) compared to the orderer image 55869bc59 from Jay on 2/12/19. We do not know whether this is caused by the debug logging or something else. We will run with logging level "info" today to see whether the logging level impacts TPS.

    Scott Zwierzynski added a comment – 

    Using INFO logs instead of DEBUG logs with yesterday's images did not get the TPS back up to the rates we were seeing last week with Jay's private images. It looks like the additional time being consumed shows up as increased endorsement response time as well as increased broadcast-transaction response time. Investigation is ongoing; we are considering network latency, the test client software, and the test machine as possible causes, since any of these would impact both of those measurements.

    Kostas Christidis added a comment – 

    The observation on TPS going down is useful. Let's keep this in the back of our minds as/when we do throughput tests.

    The suspicion behind this one is a slow disk. If WAL syncing takes more than 5 seconds, the FSM is blocked, hence the leader re-election.

    If we want to confirm this suspicion, we'll need to re-run and take note of all the warning-level logs that pop up whenever WAL syncing takes more than 1 second.
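
    A throwaway way to collect them afterwards is to scan the orderer log for the WAL warning pattern; here is a minimal sketch (the scanner is hypothetical; the file name is one of the attached logs):

        package main

        import (
            "bufio"
            "fmt"
            "log"
            "os"
            "strings"
        )

        func main() {
            // Print every slow-WAL-sync warning so it can be correlated with
            // leader elections and SERVICE_UNAVAILABLE rejections.
            f, err := os.Open("orderer2nd-ordererorg-7.log")
            if err != nil {
                log.Fatal(err)
            }
            defer f.Close()

            sc := bufio.NewScanner(f)
            for sc.Scan() {
                if strings.Contains(sc.Text(), "wal: sync duration of") {
                    fmt.Println(sc.Text())
                }
            }
            if err := sc.Err(); err != nil {
                log.Fatal(err)
            }
        }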

    Jason Yellick added a comment – 

    Any updates on this?

    Scott Zwierzynski added a comment – 

    Yes, in past runs we have seen WAL warning logs (sync delays over 5 secs) when leadership changes occurred; both the warnings and the leadership changes disappeared when we increased our hosts' IOPS per GB from 2 to 10. Therefore we agree this is the likely cause of the leadership changes (and it is consistent with the analysis provided in the attachment lost-transactions.txt), even though it is difficult to obtain comprehensive and conclusive data showing that it is the only cause, or that those warning logs occurred every time. We will hold this bug open for now and keep an eye open during other testing.

    Scott Zwierzynski added a comment –  – edited

    Regarding suggestions from Jay Guo:

    •  perhaps find out wal sync performance by running wal benchmark test  <– Dongming Hwang: Let's try this on hosts with 2 IOPS and again on hosts with 10 IOPS.
    •  set `ElectionTimeout` accordingly  <– Do you mean increasing ElectionTick (10) and/or TickInterval (500ms) to allow more than 5 secs before a follower times out waiting for heartbeats and calls for a new election (see the sketch just after this list)? OK, we previously discussed this and think it might help the network accommodate these occasional delays. However, we would prefer to look for another solution, because it comes with the drawback that it would take longer for the network to recognize a true outage of the channel's leader orderer.
    • trace syscall to spot delay <– Please provide more specifics on what/when/how to do this.
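
    For reference, the follower election timeout is roughly ElectionTick × TickInterval; here is a small sketch using the values quoted above (ElectionTick = 10, TickInterval = 500ms), i.e. about 5 seconds of missed heartbeats before a follower calls a new election:

        package main

        import (
            "fmt"
            "time"
        )

        func main() {
            // Values quoted in the comment above. Raising either one gives the
            // disk more slack before a follower calls an election, at the cost
            // of slower detection of a genuinely failed leader.
            tickInterval := 500 * time.Millisecond
            electionTick := 10

            electionTimeout := time.Duration(electionTick) * tickInterval
            fmt.Println("election timeout:", electionTimeout) // prints 5s
        }
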
    Dongming Hwang added a comment – 

    We tested with IOPS=10 and did not see the issue.

    Dongming Hwang added a comment – 

    Closing the issue.

