# R&D prospect at KEK

#### S. Yamada

2017 TRG/DAQ session at NTU

#### Web site for the upgrade

#### https://confluence.desy.de/display/BI/Upgrade+of+the+Belle+II+Readout+Subsystem

Upgrade of the Belle II Readout Subsystem (Satoru Yamada posted on 13. Apr. 2017 04:36h - last edited by Satoru Yamada on 24. Aug. 2017 05:01h)

- Institutes / people who are interested in the upgrade
  - I. Koronov (TUM)
  - KEK Belle II DAQ group
  - Tao Luo (Fudan University)
  - A. Bozek and W. Ostrowicz (INP, Krakow)
  - L. Wood (PNNL)
  - M. Andrew, L. Macchiarulo, and G. Varner (U. Hawaii)
  - Belle II LAL group
- Past meetings
  - B2GM in Jun. 2017
    - Plan for readout upgrade : KEK (S.Yamada/M.Nakao)
  - Belle II regular DAQ meeting in Apr. 2017
    - Timeline for Upgrade (R. Itoh)

# 1. Motivation of the upgrade

### Role of readout system in the Belle II DAQ system

- Read data via Belle2link (from FEE) and send them over Ethernet (to readout PCs)
- Event-building of data from 4 FEEs, which correspond to 4 FINESSE slots on a COPPER
- Data formatting (adding header and trailer)
- Fast control (e.g. send BUSY signal to FTSW when COPPER FIFO is almost full)
- Slow control (configure FEE through Belle2link)
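The fast-control item above can be pictured with a toy model. The following Python sketch assumes a hypothetical FIFO depth and almost-full threshold (neither value is taken from the COPPER design) and only illustrates the idea: BUSY is raised once the fill level reaches the threshold, and triggers are vetoed while it is raised.

```python
from collections import deque


class ReadoutFifo:
    """Toy readout FIFO with an almost-full BUSY flag.

    depth and almost_full are illustrative values, not COPPER parameters.
    """

    def __init__(self, depth=16, almost_full=12):
        self.depth = depth
        self.almost_full = almost_full
        self.events = deque()

    @property
    def busy(self):
        # In the real system this flag would be sent to the FTSW
        # so that further triggers are held off.
        return len(self.events) >= self.almost_full

    def push(self, event):
        if len(self.events) >= self.depth:
            raise OverflowError("FIFO overflow: BUSY was ignored")
        self.events.append(event)

    def pop(self):
        return self.events.popleft()


fifo = ReadoutFifo()
accepted = 0
for trigger in range(20):
    if fifo.busy:      # trigger is vetoed while BUSY is asserted
        continue
    fifo.push(trigger)
    accepted += 1

print(accepted, fifo.busy)
```

Because nothing drains the FIFO in this loop, triggers stop being accepted as soon as the threshold is reached; reading one event out with `fifo.pop()` releases BUSY again.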



## Issues to be considered for the Belle II DAQ system

Difficulty in maintenance during the entire Belle-II experiment period

- The number of discontinued parts is increasing.
  - e.g. chipset on a PrPMC card, FIFO and LAN controller on COPPER III
  - For the older COPPER II, replacing parts is basically difficult according to the manufacturer.
  - Four different types of boards (COPPER, TTRX, PrPMC, HSLB) should be taken care of.

Limitation in the improvement of performance of DAQ

- A. Bottlenecks of the current COPPER readout system
  - CPU usage
    - About 60% of the COPPER CPU is used at "30kHz L1 trigger rate with 1kB event size/COPPER" (= Belle II DAQ target value)
  - Data transfer speed
    - 1GbE/COPPER
- B. Bottleneck due to network output of ROPC
- We need to upgrade the readout system when
  - luminosity of SuperKEKB exceeds expectations.
  - a lower L1 trigger threshold is used or trigger-less DAQ is realized.
- Depending on throughput, network and HLT farms also need to be upgraded.
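As a rough cross-check of the two COPPER bottlenecks listed above, the numbers from the slide can be turned into headroom estimates. This is only a back-of-the-envelope sketch: it assumes 1 kB = 1000 bytes, the raw 1GbE line rate with no protocol overhead, and CPU usage that scales linearly with trigger rate. None of these assumptions is stated in the talk.

```python
L1_RATE_HZ = 30e3            # Belle II DAQ target L1 trigger rate (from the slide)
EVENT_SIZE_B = 1e3           # 1 kB per event per COPPER (from the slide)
CPU_USAGE_AT_TARGET = 0.60   # about 60% COPPER CPU at the target point (from the slide)
GBE_BYTES_PER_S = 1e9 / 8    # raw 1GbE line rate, overhead ignored (assumption)


def copper_output_rate(l1_rate_hz, event_size_b):
    """Output data rate of one COPPER in bytes/s."""
    return l1_rate_hz * event_size_b


def max_rate_cpu_limited(l1_rate_hz, cpu_usage):
    """Trigger rate at which the CPU saturates, assuming linear scaling."""
    return l1_rate_hz / cpu_usage


def max_rate_link_limited(event_size_b, link_bytes_per_s):
    """Trigger rate at which the output link saturates."""
    return link_bytes_per_s / event_size_b


print(copper_output_rate(L1_RATE_HZ, EVENT_SIZE_B) / 1e6, "MB/s per COPPER")
print(max_rate_cpu_limited(L1_RATE_HZ, CPU_USAGE_AT_TARGET) / 1e3, "kHz (CPU limit)")
print(max_rate_link_limited(EVENT_SIZE_B, GBE_BYTES_PER_S) / 1e3, "kHz (1GbE limit)")
```

Under these assumptions the CPU saturates well before the 1GbE link does. This agrees with the appendix slide on COPPER bottlenecks, which names CPU usage as the limiting factor.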

# 2. Possible options and firmware development

### Boundary condition



Basic framework of belle2link (Rocket-IO based serial link) should be the same. Otherwise FEE's FW/HW update might be needed.

Upgrade like GbE -> 10GbE will be possible, if we upgrade switches.

## <u>Throughput</u>

#### From DAQ Twiki @ 2014 (SVD : 3samples/hit) : (maybe obsolete)

|       | occupancy [%] | # of links | flow/link [MB/s] | daq ovh | total flow [MB/s] | flow/board [MB/s] (4 inputs/board) | # of RO boards (4 inputs/board) | flow/board [MB/s] (10 inputs/board) | # of RO boards (10 inputs/board) | flow/board [MB/s] (20 inputs/board) | # of RO boards (20 inputs/board) | flow/board [MB/s] (30 inputs/board) | # of RO boards (30 inputs/board) | flow/board [MB/s] (40 inputs/board) | # of RO boards (40 inputs/board) |
|-------|------|-----|------|---|-----|------|-----|------|----|-------|----|-------|----|-------|----|
| SVD   | 1.7  | 48  | 8.9  |   | 428 | 35.7 | 12  | 85.6 | 5  | 142.7 | 3  | 214.0 | 2  | 214.0 | 2  |
| CDC   | 10   | 302 | 0.6  |   | 175 | 2.3  | 76  | 5.6  | 31 | 10.9  | 16 | 15.9  | 11 | 21.9  | 8  |
| TOP   | 2.5  | 64  | 1.5  |   | 96  | 6.0  | 16  | 13.7 | 7  | 24.0  | 4  | 32.0  | 3  | 48.0  | 2  |
| ARICH | 1.5  | 90  | 1.1  |   | 84  | 3.7  | 23  | 9.3  | 9  | 16.8  | 5  | 28.0  | 3  | 28.0  | 3  |
| ECL   | 33   | 52  | 7.7  |   | 360 | 27.7 | 13  | 60.0 | 6  | 120.0 | 3  | 180.0 | 2  | 180.0 | 2  |
| BKLM  | 1    | 24  | 9.7  |   | 60  | 10.0 | 6   | 20.0 | 3  | 30.0  | 2  | 60.0  | 1  | 60.0  | 1  |
| EKLM  | 2    | 36  | 15.9 |   | 42  | 4.7  | 9   | 10.5 | 4  | 21.0  | 2  | 21.0  | 2  | 42.0  | 1  |
| sum   |      |     |      |   |     |      | 155 |      | 65 |       | 35 |       | 24 |       | 19 |

Data flow per b2link is not so large.

-> If the number of inputs per board is increased from the current 4 HSLB/COPPER, we can largely reduce the # of RO boards.

-> In that case, some of the outputs will become larger than the GbE limit. We need to use 10GbE or reduce the # of inputs per RO board for some sub-detectors.

-> The # of input channels affects the selection of FPGA.
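The board counts in the table follow from simple arithmetic: the number of RO boards is the number of links divided by the inputs per board, rounded up, and the flow per board is the total flow divided by that count. A minimal Python sketch, using the link counts and total flows from the table and assuming a raw 1GbE limit of 125 MB/s (protocol overhead ignored):

```python
import math

# (number of links, total flow [MB/s]) per sub-detector, copied from the table
DETECTORS = {
    "SVD": (48, 428), "CDC": (302, 175), "TOP": (64, 96), "ARICH": (90, 84),
    "ECL": (52, 360), "BKLM": (24, 60), "EKLM": (36, 42),
}
GBE_LIMIT_MB_S = 125.0  # raw 1GbE line rate (assumption: no overhead)


def boards_and_flow(links, total_flow, inputs_per_board):
    """Return (# of RO boards, data flow per board in MB/s)."""
    boards = math.ceil(links / inputs_per_board)
    return boards, total_flow / boards


def summarize(inputs_per_board):
    """Return (total # of RO boards, sub-detectors exceeding the 1GbE limit)."""
    total_boards = 0
    over_limit = []
    for name, (links, flow) in DETECTORS.items():
        boards, per_board = boards_and_flow(links, flow, inputs_per_board)
        total_boards += boards
        if per_board > GBE_LIMIT_MB_S:
            over_limit.append(name)
    return total_boards, over_limit


for n in (4, 10, 20, 30, 40):
    print(n, summarize(n))
```

With these inputs, the per-board flow first exceeds the 1GbE limit for SVD at 20 inputs/board, and for both SVD and ECL at 30 and 40 inputs/board. These are the sub-detectors for which 10GbE or fewer inputs per board would be needed.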





### Some of the key factors lie in FPGA firmware development

- # of inputs
  - ~10 B2links (GPT) on one board
- Event-building and formatting
- Output protocol
  - Ethernet : 1GbE or 10GbE
- 10GbE output by FPGA ((u)ATCA) or PC (PCIe option)
  - FPGA : which IP core will be used? How to deal with network congestion?
- Long term support for maintenance
  - The board/firmware development team needs to closely watch the system for years after it starts working in Belle II.



#### <u>To push things forward, we need firmware work experience for a more concrete discussion</u>

- Available hardware for test/firmware development was discussed at the last B2GM
- Evaluation board (no budget at KEK this fiscal year?)
- DHPCle
  - Prof. Kuhn would ask Igor-san to provide one board to KEK.
- Higuchi-board
  - I borrowed one board used in the DEPFET project from Konno-san
  - Konno-san is working on providing resources (DEPFET firmware/driver) to the DAQ group.
  - -> Those boards are not available at KEK now.



Current Data processing by COPPER and readout PC to be covered by new RO board



- Not so complicated operation, which could be done by firmware.
- But some data checks and error handling need to be done by software
  - Keep the readout PCs, or the HLT may be able to do those detailed checks

### Start to play with an HSLB board for firmware development



#### Event # = 0x200c

#### Itoh-san's previous talk about discussion items :

Example of task sharing (discussion item)

1. Detector interface : Belle2link (and FTSW?)
   - Need to implement HSLB firmware in the new readout card
   - Revisiting the sender firmware might also be necessary
   - Update the FTSW-related firmware together?
   - High-density implementation (>20 optical inputs/board)
2. Porting of COPPER data processing software to FPGA firmware
   - Data formatting
   - Event building
   - Data reduction
3. Output implementation
   - Possibly a PCI-e interface to be connected to a readout PC
   - Option : direct 10GbE output with some Ethernet core
   - Readout PC software is a part of the coverage
4. Hardware development for board mass-production
   - Evaluation of the latest FPGA
   - High-density implementation of optical fiber receiver
   - PCI-e interface
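To make item 2 more concrete, the event-building and data-formatting step can be sketched in software before it is ported to firmware. The Python example below is a sketch only: the marker words, the field layout and the function name `build_event` are hypothetical and do not reproduce the actual COPPER raw-data format. It merges fragments from several FEE slots that carry the same event number, then wraps them with a header and a trailer.

```python
import struct

# Hypothetical marker words, chosen only for this illustration.
MAGIC_HEADER = 0x7F7F0000
MAGIC_TRAILER = 0x7FFF0000


def build_event(event_number, fragments):
    """Combine FEE fragments into one formatted event.

    fragments: list of (slot, event_number, payload_bytes), one per input link.
    """
    body = b""
    for slot, frag_event, payload in sorted(fragments):
        if frag_event != event_number:
            # Event-number mismatch between links: a typical data check.
            raise ValueError(f"slot {slot}: event {frag_event:#x} != {event_number:#x}")
        # Per-fragment sub-header: slot id and payload length in bytes.
        body += struct.pack(">HH", slot, len(payload)) + payload
    header = struct.pack(">III", MAGIC_HEADER, event_number, len(body))
    trailer = struct.pack(">I", MAGIC_TRAILER)
    return header + body + trailer


fragments = [(1, 0x200C, b"\x01\x02"), (0, 0x200C, b"\xaa")]
event = build_event(0x200C, fragments)
print(len(event), event.hex())
```

The mismatch check is an example of the data checks and error handling that the talk suggests may remain in software on the readout PCs or the HLT.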



# <u>Summary</u>

- Even though we have not started the Belle II experiment, it is useful to start thinking about possible options for a future upgrade of the Belle II readout system, because
  - Repairing broken COPPER boards will become difficult.
  - We need to handle an unexpected increase of event rate or event size.
- Hardware spec. is still open.
  - 'Input = belle2link' and 'output = Ethernet or PC server' will be the boundary conditions.
- Firmware in the new RO board should do the data processing currently done by COPPER and the readout PC.
- Start playing with available hardware for the firmware development.



Example of a board sketch






#### Comparison of setups

|             | RO boards | # of<br>PCs | Output to<br>HLT    | Data-handling |
|-------------|-----------|-------------|---------------------|---------------|
| COPPER-like | 20-50 <sup>1)</sup> | 20-50 | 1GbE <sup>2)</sup>  | Software ☺    |
| PCIe        | 20-50     | 20-50       | 1GbE                | Software      |
| 2 step      | 20-50     | 0           | 10GbE <sup>3)</sup> | firmware ☹    |
| 1 step      | 20-50     | 0           | 1GbE                | firmware      |

We still have time to decide what to choose.

- Information on the event size in actual data-taking will be obtained in the phase II run.
- Estimating the processing and I/O capability (implementing many b2link cores and data-processing functions) using a test board will be very useful in the R&D phase.
- Hopefully, better/cheaper 'commercial off-the-shelf' products will come.
  - FPGA
  - Servers, NIC, switch, PCIe

# ALICE RUN2 readout board



# Former candidate for LHCb Run3

# MiniDAQ1 hardware

AMC40 mezzanine + AMCTP carrier

#### AMC40

- Stratix5 FPGA
- 3 MiniPOD AFBR-811VxyZ (Tx)
- 3 MiniPOD AFBR-821VxyZ (Rx)
- Up to 24 GBT/WB/GWT
- Up to 12 10GBASE-R Ethernet

#### AMCTP

- Local 40/80 MHz oscillator
- External clock input
- COM Express Module
- PCI Express x1 to FPGA
- GbE to LAN





22/05/2017

TIPP 2017 - PAOLO DURANTE - MINIDAQ1


# Current candidate for LHCb Run3

# MiniDAQ2 hardware (PCIe40)



- PCI express add-in card
  - Full-length, full-height
- Arria10 FPGA
  - 2x the resources of Stratix5
  - 24 links: resource usage goes from 85% on S5 to 46% on A10
- High-density optical IO
  - Up to 48 bidirectional links
- PCIe Gen3.0 interface to Event Builder
  - Custom 100 Gb/s DMA engine
- Design has been validated
  - Full board self-test
- Initial production started
- Collaboration institutes have started to receive first devices






# PXD DHH/ COMPASS

- Details are probably covered in Igor-san's talk

<u>UT3/4</u>

### Universal Trigger module 3 (UT3)

- FPGA : Virtex-6 HXT
  - FF1923 package : 3 FPGA choices
  - VHX380T ... 14 modules
  - VHX565T ... 14 modules
    - GDL: 4 (2 spares)
    - CDC : 18 (2 spares)
    - KLM : 2 (1 spare)
    - For test bench : 4
- IO
  - Main board
    - Clock: 1 in, 1 out
    - NIM : 2 in, 2 out
    - 24 GTH (11 Gbps x 24)
    - LVDS : 64(32x2) in/out
  - GTX daughter board (optional)
    - 40 GTX (6.25 Gbps x 40)
  - General IO board (optional)
    - Clock : 2 out
    - NIM : 8 in, 8 out
    - RJ-45 for Belle2Link : 4



#### Other motivation for faster readout system ?

From b2note : "L1 Trigger Menu for Low Multiplicity Physics"

https://d2comp.kek.jp/search?ln=en&cc=Belle+II+Notes+%3A+Physics&sc=1&p=&f=&action_search=Search

Physics related to low-multiplicity events

- Bhabhas, $e^+e^- \rightarrow \gamma\gamma$, $e^+e^- \rightarrow \mu^+\mu^-$
  - luminosity, calibration, QED physics topics
- single photon
  - dark matter search: $e^+e^- \rightarrow \gamma A' (\rightarrow \chi\chi)$ : $A'$ = dark photon, $\chi$ = dark matter
- Initial State Radiation (ISR) : $e^+e^- \rightarrow \gamma\pi^+\pi^-$
  - important for muon g-2 measurement
- tau 1 vs 1 final states :
  - each $\tau$ has one charged track
  - $\tau \rightarrow \mu\gamma$ etc.
- pi0 transition form factor
  - two photon -> pi0 production
- Y di-pion transition
  - Y(2,3S) -> $\pi^+\pi^-$ Y(1S) and Y(1S) -> $\nu\bar{\nu}$ or $\chi\chi$
- $\gamma\gamma \rightarrow \pi^0\pi^0$

|                | Processes                     | T1:2trk | T2:1trk1mu | T3:1mu | T4:1trk1c | T1:bbc | T2:3g | T3:3t | Combine |
|----------------|-------------------------------|---------|------------|--------|-----------|--------|-------|-------|---------|
| $\epsilon(\%)$ | $B^0\bar{B^0}$                | -       | 96.5       | 50.0   | 82.9      | 44.8   | 93.4  | 99.4  | > 99.9  |
| $\epsilon(\%)$ | $B^+B^-$                      | -       | 96.5       | 51.7   | 84.1      | 46.2   | 92.6  | 99.5  | > 99.9  |
| $\epsilon(\%)$ | ccbar                         | -       | 96.8       | 65.9   | 89.4      | 52.1   | 84.8  | 98.0  | > 99.9  |
| $\epsilon(\%)$ | uds                           | -       | 96.5       | 68.0   | 89.1      | 50.0   | 81.1  | 97.2  | > 99.9  |
| $\epsilon(\%)$ | $\tau \rightarrow$ generic    | 51.0    | 60.0       | 57.2   | 62.6      | 28.1   | 55.6  | 29.1  | 94.2    |
| $\epsilon(\%)$ | $\tau\tau$ (1v1)              | 81.0    | 58.1       | 61.8   | 61.3      | 27.9   | 47.4  | -     | 97.3    |
| $\epsilon(\%)$ | $\tau \rightarrow e\gamma$    | 80.0    | 55.1       | 56.0   | 91.7      | 52.3   | 85.7  | -     | 99.0    |
| $\epsilon(\%)$ | $\tau \rightarrow \mu\gamma$  | 76.1    | 48.1       | 46.2   | 87.7      | 57.9   | 82.2  |       | 97.1    |
| $\epsilon(\%)$ | $\pi\pi(\gamma)$              | 67.9    | 51.9       | 67.4   | 80.0      | 43.4   | 42.5  |       | 97.4    |
| $\epsilon(\%)$ |                               | 66.7    | 49.4       | 66.3   | 79.1      | 43.0   | 38.6  |       | 97.2    |
| $\epsilon(\%)$ |                               | 11.1    | 83.4       | 35.4   |           | 92.4   | 17.0  | 81.7  | > 99.9  |
| $\epsilon(\%)$ |                               | 98.9    | 94.5       | 99.7   | 50.5      | 52.4   | 17.0  | 01.7  | 55.5    |
| $\epsilon(\%)$ |                               |         |            |        | -         | -      | -     | -     | > 99.5  |
| $\sigma$(nb)   |                               | 2.2     | 0.1        | 0.1    |           | 0.8    |       | 0.1   | 3.4     |
| $\sigma$(nb)   |                               | 2.6     |            |        |           |        |       | 0.1   | 3.3     |
| $\sigma$(nb)   | $ee(\gamma)$                  | 7.2     | 7.3        | 10.5   | 11.1      | 13.1   | 2.9   | 0.6   | 32.2    |

TABLE VIII: Efficiencies and Cross section after triggers

- If there are some trigger modes with low efficiency, lowering the threshold with a reinforced RO system may help improve the efficiency.

- But this is not straightforward for the Belle II experiment, where the trigger efficiency is already high.

#### A. Bottlenecks of the current COPPER readout system: 1HSLBs/COPPER (SVD)



https://confluence.desy.de/display/BI/DAQ+EventSizeOfEachSubDetector

|      | #ch    | occ [%]  | #link     | /link [MB/s] | #CPR      | ev sz [kB] | total [MB/s] | /CPR [MB/s] |
|------|--------|----------|-----------|--------------|-----------|------------|--------------|-------------|
| PXD  | 8      | 2        | 40        | 455          | -         | 800        | 1820         |             |
| SVD  | 223744 | 1.7(5.5) | 48        | 8.9(33.8)    | 48        | 14.9       | 428          | 8.9(33.8)   |
| CDC  | 14336  | 10       | 302       | 0.6          | 76        | 6          | 175          | 2.3         |
| BPID | 8192   | 2.5      | 64        | 1.5          | 16        | 3.2        | 96           | 8           |
| EPID | 65664  | 1.5      | ~~90~~ 72 | 1.1          | ~~23~~ 18 | 2.8        | 84           | 4.2         |
| ECL  | 8736   | 33       | 52        | 7.7          | 26        | 12         | 360          | 15          |
| BKLM | 19008  | 1        | 24        | 9.7          | 6         | 2          | 60           | 10          |
| EKLM | 16800  | 2        | 16        | 35.8         | ~~9~~ 4   | 1.4        | 42           | 4.7         |
| TRG  |        |          | 19        |              | 10        |            |              |             |

COPPER CPU usage will be the bottleneck.

#### B. Bottlenecks of the readout PC



Throughput is saturated due to the limit of output GbE bandwidth.

#### If the update of FEE sides is allowed …



Better Debugging/Maintenance ? :

Firmware debugging seems to take more than 10 times longer than software debugging for non-experts like me …

Difficult

#### But

- A lot of firmware updates on the FEE side
- Probably, there is not enough buffer on the FEE boards
  - Busy signal to FTSW like COPPER ?