parse_pdf_modules.py seems to be not working #1243

Open
rodrigomelo9 opened this issue Dec 18, 2019 · 16 comments

@rodrigomelo9

Hi @mjasperse I see that you work on xc7/libraries/parse_pdf_modules.py.

I prepared the conda environment as described in the main README, downloaded the latest UG953 (2019.2), and ran:

symbiflow-arch-defs/xc7/libraries$ ../../build/env/conda/bin/python3 parse_pdf_modules.py

The output was:

<xml source="ug953-vivado-7series-libraries.pdf" processed="2019-12-18T19:44:40.751514"/>

I tried some different versions. With 2017.4 and 2018.3 the same happens. With 2015.4 I get:

Processing BSCANE2...
Processing BUFG...
Processing BUFGCE...
Processing BUFGCE_1...
Processing BUFGCTRL...
Processing BUFGMUX...
Processing BUFGMUX_CTRL...
Processing BUFH...
Processing BUFHCE...
Processing BUFIO...
Processing BUFMR...
Processing BUFMRCE...
Processing BUFR...
Processing CAPTUREE2...
Traceback (most recent call last):
  File "parse_pdf_modules.py", line 454, in <module>
    xml = process_specs(args.input, args.modules)
  File "parse_pdf_modules.py", line 420, in process_specs
    for A in process_attributes(device):
  File "parse_pdf_modules.py", line 294, in process_attributes
    tbl.process_table()
  File "parse_pdf_modules.py", line 114, in process_table
    leftcol = self.heads[1][0]
IndexError: list index out of range

Am I doing something wrong? Do you remember which version you used? (It could be part of the XML information :-P)

Thanks.

@rodrigomelo9
Author

I was thinking that it is probably enough to know which version of UG953 was used (it could be noted in a README).

In a mature product (in this case the user guide), changes to the primitives are not expected. I mean, maybe the first versions are incomplete, but once they are complete no more primitives are added.

@rodrigomelo9
Author

I have been studying parse_pdf_modules.py and I found something else related to this.

There are 79 modules in cells_xtra.xml. The script found 80 with the user guide version 2015_2 (BUFGCE and BUFGCE_1 are found but only BUFGCE_1 is in the XML file). There are 80 Port Descriptions sections for primitives in the user guide (OK), but there are more than 100 primitives (I manually counted 115). Many of them don't contain the searched string (the Port Descriptions sub-section is not there).

Am I missing something? @mithro, any idea about that?

Under each primitive sub-section the string "Primitive: " is found. Maybe we could use it when thinking about how to handle the missing port declarations.
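
As a rough illustration of that idea (not code from parse_pdf_modules.py), the sketch below scans already-extracted page text for a line starting with "Primitive: ". The function name and the sample page strings are made up for the example; the real script works on PDF text objects rather than plain strings.

```python
import re

# Matches a line of the form "Primitive: <description>".
PRIMITIVE_RE = re.compile(r"^Primitive:\s*(.+)$", re.MULTILINE)


def find_primitive_pages(pages):
    """Return the indices of pages whose text contains a 'Primitive: ' line.

    `pages` is a list of plain-text strings, one per page (an assumption;
    the real extraction step is not shown here).
    """
    return [i for i, text in enumerate(pages) if PRIMITIVE_RE.search(text)]


# Made-up sample text standing in for extracted PDF pages.
sample_pages = [
    "ENTITY_A\nPrimitive: first example description\nIntroduction",
    "Design Entry Method\nInstantiation: Yes",
    "ENTITY_B\nPrimitive: second example description\nPort Descriptions",
]
primitive_pages = find_primitive_pages(sample_pages)
```

Here `primitive_pages` lists the pages where a primitive sub-section starts, whether or not a Port Descriptions sub-section follows.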

@mjasperse
Contributor

Hi. So it looks like the version of UG953 I used at the time was 2014.2. I have no idea why that version in particular TBH! It seems very outdated. The UG version should definitely be included in the generated output!
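
One way the version could be recorded, sketched under assumptions: the root element below mirrors the `<xml source=... processed=.../>` output shown earlier in the thread, but the `version` attribute name, the `make_root` helper, and the idea of guessing the version from the file name are all hypothetical, not something the script currently does.

```python
import datetime
import re
import xml.etree.ElementTree as ET


def make_root(pdf_name, ug_version=None):
    """Build the XML root and record which UG release it was generated from."""
    root = ET.Element(
        "xml",
        source=pdf_name,
        processed=datetime.datetime.now().isoformat(),
    )
    if ug_version is None:
        # Fall back to a "2014_1" / "2014.1" style tag in the file name, if any.
        match = re.search(r"(20\d\d)[._](\d)", pdf_name)
        ug_version = f"{match.group(1)}.{match.group(2)}" if match else "unknown"
    root.set("version", ug_version)  # hypothetical attribute name
    return root


root = make_root("2014_1_ug953-vivado-7series-libraries.pdf")
```

Passing the version explicitly (for example from a command-line option) would be more reliable than guessing it from the file name.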

The reason the script doesn't work on newer versions of UG953 is because the PDF outline ("TOC") has changed structure. In 2014.2 each entity has indexed subheadings in the TOC - namely Port Descriptions and Available Attributes - that the script uses to figure out which subset of the PDF pages to parse. The newer versions of UG953 do not have this same PDF structure so the script doesn't find anything that matches what it expects, and returns an empty XML.
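
The failure mode can be shown with a simplified model of the outline. This is only a sketch: the outline is represented as a flat list of `(level, title, page)` tuples, and the entity names and page numbers are invented, so it does not reflect how the script actually reads the PDF.

```python
def find_sections(outline, wanted=("Port Descriptions", "Available Attributes")):
    """Map each top-level entity to the pages of its wanted subheadings.

    `outline` is a list of (level, title, page) tuples, a simplified stand-in
    for a PDF outline.
    """
    sections = {}
    entity = None
    for level, title, page in outline:
        if level == 1:
            entity = title
        elif level == 2 and entity is not None and title in wanted:
            sections.setdefault(entity, {})[title] = page
    return sections


# Older layout: subheadings are indexed under each entity.
old_style = [
    (1, "ENTITY_A", 30),
    (2, "Introduction", 30),
    (2, "Port Descriptions", 30),
    (2, "Available Attributes", 31),
]
# Newer layout: only the entity itself appears in the outline.
new_style = [(1, "ENTITY_A", 30), (1, "ENTITY_B", 32)]

old_result = find_sections(old_style)
new_result = find_sections(new_style)
```

With the newer layout nothing matches, so `new_result` is empty, which corresponds to the empty XML output reported above.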

Obviously it is undesirable to be relying on an old version of the UG. The script should be updated to parse the PDF without needing the TOC, so that different versions can be used. I don't think it is a lot of work, but it is definitely non-trivial. I should have some time these holidays though.

Sorry for the confusion!

@rodrigomelo9
Author

Ok, don't worry.

Just FYI, I tried with versions 2014.2 and 2014.1; both give the same result:

.../symbiflow-arch-defs/xc7/libraries$ ../../build/env/conda/bin/python3.7 parse_pdf_modules.py -i ignore/2014_1_ug953-vivado-7series-libraries.pdf 
Processing BSCANE2...
Processing BUFG...
Processing BUFGCE_1...
Processing BUFGCTRL...
Processing BUFGMUX...
Processing BUFGMUX_CTRL...
Processing BUFH...
Processing BUFHCE...
Processing BUFIO...
Processing BUFMR...
Processing BUFMRCE...
Processing BUFR...
Processing CAPTUREE2...
Traceback (most recent call last):
  File "parse_pdf_modules.py", line 454, in <module>
    xml = process_specs(args.input, args.modules)
  File "parse_pdf_modules.py", line 420, in process_specs
    for A in process_attributes(device):
  File "parse_pdf_modules.py", line 294, in process_attributes
    tbl.process_table()
  File "parse_pdf_modules.py", line 114, in process_table
    leftcol = self.heads[1][0]
IndexError: list index out of range

Anyway, what worries me most is what I said about the number of primitives:

There are 79 modules in cells_xtra.xml. The script found 80 with the user guide version 2015_2 (BUFGCE and BUFGCE_1 are found but only BUFGCE_1 is in the XML file). There are 80 Port Descriptions sections for primitives in the user guide (OK), but there are more than 100 primitives (I manually counted 115). Many of them don't contain the searched string (the Port Descriptions sub-section is not there).

I checked 2014.2 and there are also 115 primitives with 79 Port Descriptions sections (transceivers do not have a port description, nor do most LUTs, etc.).

I will think about alternatives :P Let me know if I can help.

mjasperse added a commit to mjasperse/symbiflow-arch-defs that referenced this issue Dec 20, 2019
…xports entities with no port descriptions (issue f4pga#1243)

Signed-off-by: Martijn Jasperse <146605+mjasperse@users.noreply.github.com>

@mjasperse
Contributor

It's very strange you get the IndexError exception on 2014_2 when I do not. Regardless, I think I know what causes this and have updated the script to work around it. As a bonus, it can now parse 2015_4 without issue and I have updated the XML file.

The initial scope of the script was to just extract the port descriptions, which is why any primitive without ports was excluded. But I don't see why it shouldn't just read out all the elements, so I removed that requirement and the script now extracts all 115 primitives instead of 79.

@rodrigomelo9
Author

Great :-D I will check your commit.

The initial scope of the script was to just extract the port descriptions, which is why any primitive without ports was excluded.

The thing is that the primitives without a Port Descriptions section do have ports; only the section is missing. Again, I will check what is generated now ;-)

@mjasperse
Contributor

Looks like with the changes the script can parse up to 2018_2 without issue. They changed the PDF structure in 2018_3, so the script cannot process newer than that without a significant rewrite.

Fair point about the port description section being missing sometimes despite having ports. In that sense the script's approach is limited because it relies on that documentation existing. I suppose you could parse the instantiation template instead, but that would also be a potentially significant rewrite.

@rodrigomelo9
Author

Now, as you said, the 115 modules are there :-D but see what happens with, for example, LUT1 to LUT4:

  <module name="LUT1">
    <attribute name="INIT" type="HEX" default="2'h0" values="Any 2-Bit Value"/>
  </module>
  <module name="LUT2">
    <attribute name="INIT" type="HEX" default="4'h0" values="Any 4-Bit Value"/>
  </module>
  <module name="LUT3">
    <attribute name="INIT" type="HEX" default="8'h00" values="Any 8-Bit Value"/>
  </module>
  <module name="LUT4">
    <attribute name="INIT" type="HEX" default="16'h0000" values="Any 16-Bit Value"/>
  </module>

But check LUT1, for example, and you will see that it has ports:
[Images: LUT1 module declaration and instantiation template]

This is the current problem. But there are two pieces of good news :-D It now works for me with version 2015.4, and I tried it with UG615 (Spartan-6 Libraries Guide for HDL Designs) and it almost works :P It failed with:

Processing BSCAN_SPARTAN6...
Processing BUFG...
Processing BUFGCE...
Processing BUFGCE_1...
Processing BUFGMUX...
Traceback (most recent call last):
  File "../../../other-symbiflow/xc7/libraries/parse_pdf_modules.py", line 458, in <module>
    xml = process_specs(args.input, args.modules)
  File "../../../other-symbiflow/xc7/libraries/parse_pdf_modules.py", line 424, in process_specs
    for A in process_attributes(device):
  File "../../../other-symbiflow/xc7/libraries/parse_pdf_modules.py", line 313, in process_attributes
    if x['Type'] == 'STRING' and x[
KeyError: 'Type'

I assume that is easy to fix, for you... Haha.

In fact, I think that supporting version 2015.4 is enough; additionally, it could also work for Spartan 6 and solve #1246.
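
For reference, a `KeyError` of this kind is usually avoided by looking the column up with `dict.get`. The sketch below is generic: the traceback truncates the original condition, so the row layout, the column names other than 'Type', and the `string_attributes` helper are assumptions rather than the script's actual code.

```python
def string_attributes(rows):
    """Return the rows whose 'Type' column is STRING.

    Rows that lack a 'Type' column (as apparently happens for some tables)
    are skipped instead of raising KeyError.
    """
    return [row for row in rows if row.get("Type", "").upper() == "STRING"]


# Hypothetical table rows; the second one has no 'Type' column.
rows = [
    {"Attribute": "SRTYPE", "Type": "STRING", "Allowed Values": "SYNC, ASYNC"},
    {"Attribute": "INIT", "Allowed Values": "0, 1"},
]
string_rows = string_attributes(rows)
```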

@mjasperse
Contributor

The script doesn't read the instantiation templates, so I guess those particular entities that lack the descriptions table have to be handled manually. The intention was to parse the documentation instead of the code template to avoid any potential licensing issues. Extracting the text is not entirely trivial - the PDF doesn't contain text in paragraphs or even sentences - sometimes a text object is just an individual letter. So text must be transformed and combined by algorithm, which can be error-prone.
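
To make the text-merging problem concrete, here is a minimal sketch of one possible approach, not the script's actual algorithm. It assumes each text object is a tuple `(x0, x1, y, text)` in PDF coordinates (y grows upward), groups objects that share a baseline, and inserts a space where the horizontal gap exceeds a threshold. The tolerances and the sample coordinates are invented.

```python
def merge_glyphs(glyphs, y_tol=1.0, gap=2.0):
    """Combine small text objects into lines of text.

    `glyphs` is a list of (x0, x1, y, text) tuples. Objects whose baselines
    differ by at most `y_tol` are treated as one line; a space is inserted
    when the gap between neighbours exceeds `gap`.
    """
    lines = []  # list of (baseline_y, [glyphs on that line])
    for glyph in sorted(glyphs, key=lambda g: (-g[2], g[0])):
        if lines and abs(lines[-1][0] - glyph[2]) <= y_tol:
            lines[-1][1].append(glyph)
        else:
            lines.append((glyph[2], [glyph]))

    merged = []
    for _, items in lines:
        items.sort(key=lambda g: g[0])  # left to right
        text = items[0][3]
        for prev, cur in zip(items, items[1:]):
            if cur[0] - prev[1] > gap:
                text += " "
            text += cur[3]
        merged.append(text)
    return merged


# Individual letters, as they might come out of a PDF (made-up coordinates).
glyphs = [
    (0, 5, 100, "I"), (5, 10, 100, "N"), (10, 15, 100, "I"), (15, 20, 100, "T"),
    (30, 35, 100, "H"), (35, 40, 100, "E"), (40, 45, 100, "X"),
    (0, 5, 88, "D"), (5, 10, 88, "Q"),
]
merged_lines = merge_glyphs(glyphs)
```

Fixed thresholds like these are exactly where such merging becomes error-prone: a table with unusual spacing or extra separator rows can split or join text in the wrong place.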

Annoyingly there are lots of inconsistencies with how the tables are laid out so there need to be rules that correct for certain special cases. In particular, the attributes table for PLL_BASE in UG615 contains stupid separator rows that throw off the text-merging and break the extraction. Otherwise, with some minor fixes I was able to parse UG615 with the latest commit.

I did notice some issues though that will need to be addressed with special cases (e.g. ICAP_SPARTAN6). Maybe you can check which ones are problematic and we can resolve in #1246.

@mithro
Contributor

mithro commented Dec 20, 2019

I had a couple of thoughts;

  • I'm wondering if we should extract this tool into its own repository? We only really need the XML output in other tools.
  • For a similar style of project (extracting data from a PDF) we ended up maintaining a set of "patches" which were applied to the output data. It might make sense to do the same here?
  • Maybe we should run it against all versions of the PDF we can find to see how the XML output evolves over time?
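
A minimal sketch of what such a patch layer could look like, assuming the generated XML keeps the `<module>` / `<attribute>` layout shown earlier in this thread. The patch format (a list of dicts with `module`, `attribute`, `action` and `fields` keys) and the `apply_patches` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET


def apply_patches(root, patches):
    """Apply manual corrections to the generated XML tree in place."""
    for patch in patches:
        module = root.find(f"./module[@name='{patch['module']}']")
        if module is None:
            continue  # patch targets a module this UG version does not have
        for attr in module.findall("attribute"):
            if attr.get("name") != patch["attribute"]:
                continue
            if patch["action"] == "remove":
                module.remove(attr)
            elif patch["action"] == "set":
                attr.attrib.update(patch["fields"])
    return root


generated = ET.fromstring(
    '<xml><module name="ROM256X1">'
    '<attribute name="INIT" type="HEX"/>'
    '<attribute name="ROM256X1:" type="256 x 7 Series"/>'
    "</module></xml>"
)
# Hypothetical patch: drop an attribute that is really a parsing artifact.
patches = [{"module": "ROM256X1", "attribute": "ROM256X1:", "action": "remove"}]
patched = apply_patches(generated, patches)
names = [a.get("name") for a in patched.iter("attribute")]
```

Keeping the patches as data separate from the parser means they can be reviewed and versioned independently of the extraction code.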

@rodrigomelo9
Author

  • Yep, another repo for this kind of tool could be a good option (symbi-extractors? :P). Another option is to put it in the utils directory, maybe in a subdirectory?
  • It seems that patches must be applied, because the port order is not guaranteed in the instantiation example.
  • I will check the evolution of the XML, but I suppose there are no changes after the first versions (when the guide is still incomplete).

@rodrigomelo9
Author

I checked with several versions (not all). The first available is 2012.2. The last one where the script works is 2018.2. So, I downloaded all the 201x.2 releases.

From 2012.2 to 2014.1 (previously downloaded) the script didn't work on my machine. The first version where the script works is 2014.2.

I found something between 2014.2 and 2015.2... The port order of two components (SRL16E and SRLC32E) changed.

[Images: Port Descriptions tables from the 2014.2 and 2015.2 guides]

I am thinking that the port order in the Port Descriptions tables is not necessarily the real order... It must be checked.

Between 2015.2 and 2016.2 an attribute disappeared:

   <module name="ROM256X1">
     <attribute name="INIT" type="HEX" default="256'h0000000000000000000000000000000000000000000000000000000000000000" values="Any 256-Bit Value"/>
-    <attribute name="ROM256X1:" type="256 x 7 Series" default="(LUT)" values="1 Asynchronous Distributed"/>
   </module>

Between 2016.2 and 2017.2 an attribute changed:

     <port name="S" type="input" width="1"/>
     <attribute name="DDR_CLK_EDGE" type="STRING" default="OPPOSITE_EDGE" values="OPPOSITE_EDGE, SAME_EDGE"/>
-    <attribute name="INIT" type="INTEGER" default="1" values="0, 1"/>
+    <attribute name="INIT" type="INTEGER" default="0" values="0, 1"/>
     <attribute name="SRTYPE" type="STRING" default="SYNC" values="SYNC, ASYNC"/>
   </module>
   <module name="ODELAYE2">

Between 2017.2 and 2018.2, two None changed to NONE.
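
A comparison like this can also be automated. The sketch below is one possible way to do it, assuming the XML layout shown in the diffs above; the `attribute_map` and `diff_versions` helpers and the `EXAMPLE` module name are hypothetical.

```python
import xml.etree.ElementTree as ET


def attribute_map(xml_text):
    """Index every attribute by (module name, attribute name)."""
    root = ET.fromstring(xml_text)
    return {
        (module.get("name"), attr.get("name")): dict(attr.attrib)
        for module in root.iter("module")
        for attr in module.iter("attribute")
    }


def diff_versions(old_xml, new_xml):
    """Return (added, removed, changed) attribute keys between two outputs."""
    old, new = attribute_map(old_xml), attribute_map(new_xml)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


old_xml = (
    '<xml><module name="EXAMPLE">'
    '<attribute name="INIT" type="INTEGER" default="1" values="0, 1"/>'
    "</module></xml>"
)
new_xml = (
    '<xml><module name="EXAMPLE">'
    '<attribute name="INIT" type="INTEGER" default="0" values="0, 1"/>'
    "</module></xml>"
)
added, removed, changed = diff_versions(old_xml, new_xml)
```

This only covers attributes; ports would need the same treatment, plus an order-sensitive comparison to catch changes like the SRL16E one.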

@rodrigomelo9
Author

rodrigomelo9 commented Dec 21, 2019

The two Port Descriptions tables in the previous post are for SRL16E... Here is the port order of this primitive:

module SRL16E (Q, A0, A1, A2, A3, CE, CLK, D);
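
If a declaration like this is available as text, the port order can be pulled out with a small regular expression. This is only a sketch for simple, single-line, non-ANSI declarations such as the one above; the `port_order` helper is hypothetical, and whether reading such source is acceptable given the licensing concern raised earlier in the thread is a separate question.

```python
import re

# Matches "module NAME (port, port, ...);" with a plain list of port names.
DECL_RE = re.compile(r"module\s+(\w+)\s*\(([^)]*)\)\s*;")


def port_order(declaration):
    """Return (module name, ordered list of port names) from a declaration."""
    match = DECL_RE.search(declaration)
    if match is None:
        raise ValueError("no module declaration found")
    name, ports = match.groups()
    return name, [p.strip() for p in ports.split(",") if p.strip()]


name, ports = port_order("module SRL16E (Q, A0, A1, A2, A3, CE, CLK, D);")
```

The resulting list could then be compared against the order extracted from the Port Descriptions table to flag mismatches.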

@rodrigomelo9
Author

Hi @mjasperse, your last commit works with 2018.2, which I think is enough. Moreover, it works with the latest Spartan 6 document and probably with others, such as Spartan 3 (all), Virtex 4, 5 and 6, because the documents are similarly structured. There are still problems, but they are related to the available resources (the PDF docs).

Hi @mithro, I have been studying the primitives in ISE and Vivado. There are several problems with the PDF docs:

  • They are incomplete (missing port tables).
  • The order of the ports is not specified.
  • Some entries seem not to be primitives of the devices; instead they are in the retarget library (for compatibility with legacy devices).
  • Some primitives in the devices.lib are missing from the docs (I need to check whether they are really supported, but there are primitives specified there that are not documented in the user guides).

I am interested in this topic and I really want to contribute, but I think that another approach must be taken. Let me know your ideas about that.

Regards

@mithro
Contributor

mithro commented Dec 27, 2019

@rodrigomelo9 Any chance you are at 36c3, if so you should come say Hi!

The idea with the PDF parsing is that it is a first step. It is supposed to generate a baseline of modules which need to be supported to be compatible with the Xilinx tools. It's supposed to be a cross-checking / validation tool to make sure that we haven't missed modules and similar.

Once we have the baseline, we will then need to manually create the mapping between the Xilinx names and the names / functionality we use in Project X-Ray (which are similar but not 100% the same).
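
The cross-check described here could start as a simple set difference. In the sketch below, the baseline uses the `<module name="...">` layout of the generated XML shown earlier, while the `missing_modules` helper and the `supported` set are hypothetical stand-ins for whatever list of implemented cells the project maintains.

```python
import xml.etree.ElementTree as ET


def missing_modules(baseline_xml, supported):
    """Return documented module names that are not in the supported set."""
    root = ET.fromstring(baseline_xml)
    documented = {module.get("name") for module in root.iter("module")}
    return sorted(documented - set(supported))


baseline_xml = (
    '<xml><module name="LUT1"/><module name="LUT2"/><module name="BUFG"/></xml>'
)
supported = {"LUT1", "BUFG"}  # made-up support list for the example
missing = missing_modules(baseline_xml, supported)
```

Because the Xilinx names and the Project X-Ray names are not identical, a real check would first need the manual name mapping mentioned above.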

@rodrigomelo9
Author

rodrigomelo9 commented Dec 27, 2019

@rodrigomelo9 Any chance you are at 36c3, if so you should come say Hi!

No :-( I am actually so far from there, but I will try to be at OSDA in March.

The idea with the PDF parsing is that it is a first step. It is supposed to generate a baseline...

OK, the current state of the @mjasperse branch seems great to me in this direction, and it can also be used to get modules for Spartan 6 and other families. Maybe the script could be moved to utils?

As I said, I have been studying the ISE and Vivado primitives these days (where they are located, which are primitives and which are retargeted wrappers, etc.). I think it could be useful to have a place with all the cells, a library for each device pointing to the respective primitives, and a method (scripts) to test the support status.

Anyway, these are things that should probably be followed up in another place or issue (please let me know the best option). I suppose that there are people working on that (to support it in Yosys and nextpnr). Are they in sync somewhere?
