#10 ddc_index(tabs): relax 'first line in tab-formatted file must be a document delimiter ("%$DDC:meta.file_=")' requirement

1.0
closed
None
2019-11-08
2019-11-07
No

error message from ddc_index (tab-mode, multiple-files):
Indexing file `dump/import/../1.tabs'
the first line in tab-formatted file must be a document delimiter ("%$DDC:meta.file_=")
14:15:58.100450353 (ddc_index) > cannot index the project!
all tab-import lines beginning with "%%" which are not "special" commands ("%%$DDC:") should be ignored
even tab-import lines beginning with "%%$DDC:" which aren't "special" commands right now should be ignored (we might later want/need to introduce "%%$DDC:feature_foo=bar" and to compile that index with older binaries, e.g. during bug-hunting)
maybe just scan lines until you find a doc-delimiter and ignore everything until you find one?
bonus points: count the number of dropped lines and report it when you do find a doc-delimiter
in either case, blank lines, lines beginning with "%%", and lines consisting only of whitespace can and should be allowed (and ignored, and not counted) even before the first doc-delimiter in a file.
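
The scanning behavior requested above can be sketched roughly as follows. This is only an illustration in Python, not the ddc_index implementation: the function name `skip_to_first_delimiter` and the constant `DOC_DELIMITER` are hypothetical, and the delimiter is assumed to be a line starting with "%%$DDC:meta.file_=".

```python
# Hypothetical sketch of the requested behavior; not taken from ddc_index.
DOC_DELIMITER = "%%$DDC:meta.file_="  # assumed form of the document delimiter


def skip_to_first_delimiter(lines):
    """Scan lines until the first document delimiter.

    Returns (index, dropped): index is the position of the first
    delimiter line (or None if there is none), and dropped is the
    number of content lines that were skipped before it.
    """
    dropped = 0
    for i, raw in enumerate(lines):
        line = raw.rstrip("\r\n")
        if line.startswith(DOC_DELIMITER):
            return i, dropped
        # Blank lines, whitespace-only lines, and any line beginning
        # with "%%" (comments, unknown "%%$DDC:" commands) are ignored
        # and not counted.
        if not line.strip() or line.startswith("%%"):
            continue
        # Anything else before the first delimiter is dropped and counted.
        dropped += 1
    return None, dropped
```

A caller could report the `dropped` count as a warning once the delimiter is found, instead of aborting the whole indexing run.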

Discussion

  • Alexey Sokirko

    Alexey Sokirko - 2019-11-07

    Bryan, in https://sourceforge.net/p/ddc-concordance/tickets/4/ I already implemented skipping of empty lines and of unknown commands such as %%$DDC:feature_foo=bar. But maybe I do not understand you.
    Right now %%$DDC:meta.file_=somefilename.txt is a file border, so if you write

    %%$DDC:meta.scan_=scan_bibl_stub
    %%$DDC:meta.file_=some_file_name.txt

    then scan_bibl_stub goes to the previous file. Is this the behavior you want to change?

     

    Last edit: Alexey Sokirko 2019-11-07
  • Bryan Jurish

    Bryan Jurish - 2019-11-08

    in single-file tab-input mode ("streaming mode"), yes: scan_=scan_bibl_stub should go to the previous file in this case. in multi-file tab-input mode, it can either be ignored or assigned to the current file. The lines I'm actually seeing in the output of ddc_dump -f -t before %%$DDC:meta.file_= are:

    %%$DDC:tokid.begin=100567
    %%$DDC:tokid.end=267265
    %%$DDC:meta.n_=1
    %%$DDC:meta.file_=test/lambert_organon01_1764.ddc.xml
    
     
  • Alexey Sokirko

    Alexey Sokirko - 2019-11-08

    Ok, the lines before the first "%%$DDC:meta.n=" are always assigned to the first file. That must be true in the current source code.

     
  • Alexey Sokirko

    Alexey Sokirko - 2019-11-08
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,4 +1,3 @@
    -
     error message from ddc_index (tab-mode, multiple-files):
     Indexing file `dump/import/../1.tabs'
     the first line in tab-formatted file must be a document delimiter ("%$DDC:meta.file_=")
    
    • status: open --> closed
     
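
The assignment rule discussed in this thread (a "%%$DDC:meta.file_=" line acts as a file border, a metadata line belongs to the file whose border precedes it, and metadata seen before the first border goes to the first file) might be sketched like this. It is a simplified Python illustration under those assumptions, not the ddc_index source; `group_meta`, `FILE_KEY` and `META_PREFIX` are hypothetical names.

```python
# Hypothetical sketch of the file-border rule; not taken from ddc_index.
FILE_KEY = "%%$DDC:meta.file_="  # assumed file-border command
META_PREFIX = "%%$DDC:meta."     # assumed prefix of metadata commands


def group_meta(lines):
    """Group metadata commands by the file border that precedes them."""
    docs = []     # one dict of metadata per file, in input order
    pending = {}  # metadata seen before the first file border
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(FILE_KEY):
            doc = {"file_": line[len(FILE_KEY):]}
            if not docs:
                # Metadata before the first border goes to the first file.
                doc.update(pending)
                pending = {}
            docs.append(doc)
        elif line.startswith(META_PREFIX) and "=" in line:
            key, _, value = line[len(META_PREFIX):].partition("=")
            # After a border, metadata attaches to the most recent file,
            # i.e. the "previous" file when another border follows.
            target = docs[-1] if docs else pending
            target[key] = value
        # Token lines and all other commands are not handled here.
    return docs
```

With this rule, a "%%$DDC:meta.scan_=..." line written just before the second border ends up in the first file's metadata, which is the streaming-mode behavior described in the comments above.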
