
Bottleneck in 072 and 074 fuzzers #1214

acomodi opened this issue Jan 28, 2020 · 9 comments
acomodi commented Jan 28, 2020

The data dumping process for fuzzers 072 and 074 takes up a large share of the run-time, especially for big parts (e.g. the Artix-7 200T).

For fuzzer 074, the run-time to dump the data is split between the tiles job and the nodes job:

Vivado start time: 10:48:25
Tiles Job start time: 10:48:34
Tiles Job end time / Nodes Job start time: 11:32:58
Nodes Job end time: 11:36:14
Vivado end time / reduction start time: 11:36:14

The above numbers are for the zynq7010 part.

This is an issue, as it prevents scaling to bigger parts.
We need a more efficient way to dump all the data required by the reduction step.

@litghost

Looks like the nodes job time is pretty small compared to the tiles job time, so the tiles job is where we should look for issues.

acomodi commented Jan 28, 2020

@litghost Yep, my thought is that we may be producing too much data in the tiles.json5 temporary files. We could use an ROI to shrink the data produced, since tiles with the same name are ~99% identical, apart from some missing wires.

We could extract the tiles from only 1 or 2 clock regions, keeping the run-time constant as the part changes. I need to verify whether this is doable though.

@litghost

> We could extract the tiles from only 1 or 2 clock regions, keeping the run-time constant as the part changes. I need to verify whether this is doable though.

This is a fragile solution, for a number of reasons. There are "weird" tiles around the following areas:

  • Hard blocks
  • Top and bottom of the grid
  • Near the clock regions

As a result, identifying all the "weird" stuff would be a fairly manual process.

My baseline assumption right now is that we are doing something "expensive" in the tiles loop, e.g. a linear lookup, that needs to be fixed.

I suggest bisecting the work that jobtiles.tcl does until the runtime drops.

As a concrete example: if jobtiles.tcl only outputs the wires in the tiles, does it still take as long? Etc.

acomodi commented Jan 28, 2020

@litghost Right, I'll profile the run-time in more detail to find exactly where the bottleneck is.

acomodi commented Jan 28, 2020

@litghost I think I have identified the problem and where the bottleneck is.

After disabling the pip loop that extracts all pips related to a tile, the run-time dropped from ~44 minutes to ~8 minutes for the zynq7010.

Moreover, I am inclined to think that the issue lies in the INT tiles. They are the most common tiles, and each of them contains hundreds of pips, resulting in INT.json5 files of more than 100k lines.

@litghost

> After disabling the pip loop that extracts all pips related to a tile, the run-time dropped from ~44 minutes to ~8 minutes for the zynq7010.

Try dropping anything that uses lookup_speed_model_index or its children. If that speeds things up, that code is important, but also recent; it might need a refactor.

acomodi commented Jan 28, 2020

@litghost That was the right call; the run-time is now ~13 minutes for the tiles job:

 Tiles Job start time:
 2020-01-28 17:47:59.772677
 Tiles Job end time
 2020-01-28 18:00:31.769572

@litghost

> That was the right call; the run-time is now ~13 minutes for the tiles job.

OK, so rather than writing out the full timing info, just write the speed index. Then merge all the tile jsons (i.e. merging the speed indices), and finally create a tcl script to back-annotate the speed indices with the timing data originally dumped by the tcl script.
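
A minimal sketch of that merge step in Python, assuming each per-tile dump stores a speed index per pip (the file layout and the `pips`/`speed_index` key names are illustrative, not the actual fuzzer format):

```python
# Sketch only: collapse per-pip timing data down to unique speed indices, so
# the heavy timing information is back-annotated once per index instead of
# being dumped for every pip of every tile.
import glob
import json
from collections import defaultdict

def merge_speed_indices(dump_dir):
    """Map each speed index to the set of (tile_type, pip_name) pairs using it."""
    users = defaultdict(set)
    for path in glob.glob('{}/*.json5'.format(dump_dir)):
        with open(path) as f:
            tile = json.load(f)  # real dumps are json5; plain json keeps the sketch short
        for pip in tile.get('pips', []):
            users[pip['speed_index']].add((tile['type'], pip['name']))
    return users

if __name__ == '__main__':
    merged = merge_speed_indices('tiles')
    # The back-annotation tcl script then needs only one timing query per
    # unique speed index.
    print('{} unique speed indices to back-annotate'.format(len(merged)))
```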

marzoul commented Apr 24, 2022

Hi, the last comments may be a bit old, but the issue is still real ;-)
This is especially true for 074; I have not looked at the results of 072. These observations come from experimenting with Virtex-7, on the 330T chip, the smallest of that series.

Disk usage of 074 is 82 GB, nearly all of it in the 174k very tiny json5 files. I ran an experiment: concatenate all of these and compress with lz4 at its fastest setting => the result is a single 4 GB file (to be compared with 40+ GB of file contents and 82 GB of actual disk usage). So roughly a 20x reduction in actual disk usage.
Given the very low CPU usage during most of 074 (1-3%, with peaks at 8% of one CPU), I think that at least one of the issues is disk access. Yes, I have a spinning HDD, so this is exacerbated, but at least the issue is revealed ;-)

Looking casually into the Python code, it looks like these json files are accessed in bulk, with processing interleaved, so packed + compressed storage could make sense for CPU usage too. Everything would fit cached in RAM as well :-)
Perhaps use one compressed file per type of FPGA element (slice, SDP, PIP, etc.) if that better fits how the code accesses the data; no problem either way.
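
A rough sketch of the packed + compressed storage idea, assuming the python-lz4 package (lz4.frame); the record framing and paths are made up for the example:

```python
# Sketch only: pack the ~174k tiny json5 dumps into one lz4-compressed stream,
# so the reduction step does a single sequential read instead of hundreds of
# thousands of small-file accesses.
import glob
import json
import lz4.frame

def pack(dump_dir, out_path):
    with lz4.frame.open(out_path, mode='wb') as out:
        for path in sorted(glob.glob('{}/*.json5'.format(dump_dir))):
            with open(path, 'rb') as f:
                data = f.read()
            # Length-prefixed records so the archive can be unpacked later.
            header = json.dumps({'name': path, 'size': len(data)}) + '\n'
            out.write(header.encode('utf-8'))
            out.write(data)

def unpack(in_path):
    """Yield (name, raw bytes) records back out of the packed archive."""
    with lz4.frame.open(in_path, mode='rb') as f:
        while True:
            line = f.readline()
            if not line:
                break
            meta = json.loads(line)
            yield meta['name'], f.read(meta['size'])
```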

Another scalability issue: I monitored RAM usage => up to 66.5 GB of virtual memory.
To my eyes, given the raw number of FPGA elements and configuration bits, this is excessive. Casually looking into the Python code again, I think the issue lies in how the database representation is implemented in Python.
Don't hesitate to tell me if I'm wrong, but generic maps indexed by strings are very costly in Python in terms of RAM usage (and, indirectly, speed too). Converting these computations to C++ could be appropriate, though of course only after evaluating packed + compressed disk storage.
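
To illustrate the point about string-keyed maps, a tiny sketch (the record layout is invented for the example; exact sizes vary by Python version, but the dict container is consistently several times larger than a plain tuple, before even counting the key strings repeated per record):

```python
# Sketch only: the same per-pip record stored as a dict with string keys
# versus a compact tuple.  Every dict instance carries its own hash table,
# which adds up quickly over millions of records.
import sys

as_dict = {'tile': 'INT_L_X10Y20', 'pip': 'SOME_PIP_NAME', 'speed_index': 1234}
as_tuple = ('INT_L_X10Y20', 'SOME_PIP_NAME', 1234)

print(sys.getsizeof(as_dict))   # container overhead of the dict
print(sys.getsizeof(as_tuple))  # container overhead of the tuple
```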

EDIT: Fuzzer 074 took 20 days to finish xD
There was a bit of swapping involved, hence my focus on 074.

What do you think of these observations?
