## Status of GRL firmware

Yun-Tsung Lai

**KEK** 

yuntsung@post.kek.jp

Belle II TRG/DAQ workshop 2017 August 23, 2017





#### Outline

• Status and plan of GRL operation in E-Hut

- Development of GRL core logic: matching algorithm in VHDL
  - ISIM results
  - Generator interface
  - Resource usage

• Summary & To do

### Status and plan of GRL operation in E-Hut

- In the current E-Hut setup, the CDCTRG data flow reaches GRL. GRL is receiving tracking results from the 4 2D boards.
  - In addition, the links from ETM and to GDL were built.
  - In GCRT, we use GRL with chipscope to monitor 2D output (tracking result and corresponding TSF info)
- GRL can provide the summary of 2D tracking results. Track counting within a 500 ns window is also implemented.
- Plan in near future: After 3D boards are installed and core logic is online, GRL will be responsible to provide various trigger signals about CDCTRG summary information. e.g. # of tracks



### Matching algorithm – 2D

21st B2GM

- 2D tracker output from TRGCDC:
  - center of track helix:  $\vec{r} = (r, \phi)$
  - $-p_t$
  - Track is extrapolated as a circle/helix inside ECL.
- Output from TRGECL:
  - cluster position on ECL:  $\vec{R} = (x, y) = (R, \theta)$  (Cluster position is with **15cm** depth to the inner surface.)
  - Deposit Energy



- Solve the expect hit position for a track on ECL:
  - $\overrightarrow{R'} = (x', y') = (R, \theta'),$ where  $\theta' = \cos^{-1}(R/2r) + \phi$  or  $\phi - \cos^{-1}(R/2r)$
- $\Delta r \equiv |\vec{R} \vec{R'}|$ : the deviation between cluster position and expected hit position.

## Matching algorithm -Z

### 21st B2GM

- Z output from TRGCDC:
  - $-z_0$ ,  $tan\theta$ ,  $p_z$
  - $\tan\theta = \frac{p_z}{p_t} = \frac{z}{\ell}$
- Z output from TRGECL:
  - cluster z position



- Solve the expect hit position for a track on ECL:
  - $-z'=z_0+\ell tan\theta$ , where  $\ell=2rsin^{-1}(R/2r)$
  - $p_z = p_t tan\theta$
  - $p = \sqrt{p_t^2 + p_z^2}$
- $\Delta z \equiv z z'$ : the deviation between cluster z position and expected z position

### Matching algorithm with FPGA

- GRL input:
  - 2D: 6\*4 tracks' info  $(p_t, \phi)$
  - 3D: 6\*4 tracks' info  $(z_0, \tan \theta)$
  - ETM: ECL clusters' info (Hit position in Cartesian coordinate, energy)
- Matching between a track and a cluster: a single VHDL module.
  - Totally  $N_{track}^{total} * N_{cluster}^{total}$  matching modules in parallel in firmware.
  - Resource usage issue: either trying to reduce the resource usage of a single module, or applying energy threshold to the cluster input to reduce the number.
- The VHDL module is developed by the JSignal class provided by Jae-Bak. Special arithmetic operations are done by using LUT. e.g. sin, cos, sqrt, 1/x (for division), etc
- The logic uses 127 MHz sysclk to reduce latency.

### ISIM result: 2D matching

- CDC input:  $\frac{1}{r}(p_t)$ ,  $\phi$
- ECL input: x, y position
- Latency: 22 sysclk ~ 173 ns

- A multiplication takes 1 clk.
- A LUT procession takes 4 clks.



#### ISIM result: Z matching

- CDC input:  $\frac{1}{r}(p_t)$ ,  $z_0$ ,  $tan\theta$
- ECL input: x, y, z position
- Latency: 13 sysclk ~ 102 ns



## ISIM result: E/p

- CDC input:  $\frac{1}{r}(p_t)$ ,  $tan\theta$
- ECL input: energy
- Latency: 11 sysclk ~ 87 ns



#### Interface of the generator in TSIM

- The VHDL and LUT generator is implemented in TSIM TRGGRLMatch.
- The interface is parametrized to fit in modifications in the future:
  - Max, min, and bit width of the input variables, and LUT IO (depth and width).

```
void
TRGGRLMatch::VHDL constant(std::map<std::string, int> & m mInt, std::map<std::string, double> & m mDouble)
 // Input signals: determined by sub-trigger systems
 m mDouble["ECL X max"] = 130.0; m mDouble["ECL X min"] = -130.0;
 m mDouble["ECL Y max"] = 130.0; m mDouble["ECL Y min"] = -130.0;
 m mDouble["ECL Z max"] = 116.5; m mDouble["ECL Z min"] = -75.5;
 m mDouble["ECL E max"] = 10.0; m mDouble["ECL E min"] = 0.0;
 m mDouble["CDC r max"] = 3000.0; m mDouble["CDC r min"] = 0.0;
 m mDouble["CDC r inv max"] = 0.035; m mDouble["CDC r inv min"] = 0.0;
 m mDouble["CDC phi max"] = M PI; m mDouble["CDC phi min"] = -M PI;
 m mDouble["CDC slope max"] = 1.2; m mDouble["CDC slope min"] = -1.2;
 m mDouble["CDC z0 max"] = 30.0; m mDouble["CDC z0 min"] = -30.0;
 m mInt["ECL X bitsize"] = 10;
 m mInt["ECL Y bitsize"] = 10;
 m mInt["ECL Z bitsize"] = 10;
 m mInt["ECL E bitsize"] = 10;
 m mInt["CDC r bitsize"] = 7;
 m mInt["CDC r inv bitsize"] = 7;
 m mInt["CDC phi bitsize"] = 7;
 m mInt["CDC slope bitsize"] = 10;
 m mInt["CDC z0 bitsize"] = 10;
```

#### Resource usage issue

• The resource usage of a single module:  $\sim 0.6\%$  of LUT. At most 136 parallel modules  $\rightarrow N_{track}^{total} * N_{cluster}^{total}$  is limited.

- Both the LUT IO bit width (depth and width) were set to be 10 bits for simplicity. More detailed estimation on the signals' resolution is necessary to reduce the LUT size effectively.
  - LUT info parameters are also adjustable:

- The total number of ECL clusters would be large (5120 bits for ETM→GRL).

  To reduce the number:
  - Energy threshold.
  - Pick up a certain number of clusters with the highest energy.

```
// Bitsize of LUT input : Used in the constraint be
m_mInt["ECL_R_sq_c_bitsize"] = 10;
m_mInt["ECL_D_sq_c_bitsize"] = 10;
m_mInt["theta_in_c_bitsize"] = 10;
m_mInt["theta0_c_bitsize"] = 10;
m_mInt["theta1_c_bitsize"] = 10;
// same as theta0_
m_mInt["dr_sq_c_bitsize"] = 10;
m_mInt["p_sq_c_bitsize"] = 10;
// Bitsize of LUT output : Determined by yourself,
m_mInt["ECL_R_bitsize"] = 10;
m_mInt["ECL_D_bitsize"] = 10;
m_mInt["theta_bitsize"] = 10;
```

# Summary & To do – Operation in E-Hut

#### • Summary:

CDCTRG data flow reaches GRL for the current GCRT.
 We use GRL to monitor the tracking result from the 4 2D boards and to summarize the track information.

#### • To do:

- Provide 2D track trigger in near future.
- After 3D boards are ready in E-Hut with core logic. GRL will summarize the CDCTRG information and prepare various types of triggers related to tracking.

# Summary & To do – Core logic

#### • Summary:

- The VHDL matching module is implemented by using Jae-Bak's class, which can output dr, dz and E/p.
- Overall latency is 22 sysclk

#### • To do:

- Try to reduce the resource usage by different approaches.
- Make the delay and persistor modules between CDC and ECL inputs.
- Test the logic after ETM provides clustering output.