ddc_index error message:
$ ddc_index PROJECT.opt
...
MaxTokenCountInOnePeriod = 100000
MaxInputLoadIndexSize = 400000
max load index size is supposed to be less than one period, update ddc parameters!
13:34:15.772078472 (ddc_index) > cannot index the project!
please explain what the new UserMaxInputLoadIndexSize opt-file parameter does
maybe document it in doc/ddc_opt.pod (perl POD format)
if at all possible, please turn exception in ConcIndexatorInvoker::IndexFiles() into a warning (ddcLogWarn(Format(...))) and force parameter(s) to compatible values (pick one that makes sense, no preprocessor macros PERIOD_SIZE_WINS etc. should actually be needed):
if (Indexator.GetMaxInputLoadIndexSize() > Indexator.GetMaxTokenCountInOnePeriod()) {
#if PERIOD_SIZE_WINS
//-- either period-size always wins (easy)
ddcLogInfo("WARNING: inconsistent MaxInputLoadIndexSize=%u > MaxTokenCountInOnePeriod=%u ; forcing MaxInputLoadIndexSize=%u",
Indexator.GetMaxInputLoadIndexSize(), Indexator.GetMaxTokenCountInOnePeriod(), Indexator.GetMaxTokenCountInOnePeriod());
Indexator.m_UserMaxInputLoadIndexSize = Indexator.GetMaxTokenCountInOnePeriod();
#elif LOAD_SIZE_WINS
//-- ... or max-input-size always wins (also easy)
ddcLogInfo("WARNING: inconsistent MaxInputLoadIndexSize=%u > MaxTokenCountInOnePeriod=%u ; forcing MaxTokenCountInOnePeriod=%u",
Indexator.GetMaxInputLoadIndexSize(), Indexator.GetMaxTokenCountInOnePeriod(), Indexator.GetMaxInputLoadIndexSize());
Indexator.m_UserMaxTokenCountInOnePeriod = Indexator.GetMaxInputLoadIndexSize();
#else
//-- ... or we try to be "smart" and dispatch depending on what the user actually specified
if (m_bUserMaxTokenCountInOnePeriod && m_UserMaxInputLoadIndexSize==0) {
//-- user specified only period-size: force MaxInputLoadIndexSize
ddcLogInfo(Format("INFO: forcing MaxInputLoadIndexSize=%u\n", Indexator.GetMaxTokenCountInOnePeriod()));
m_UserMaxInputLoadIndexSize = Indexator.GetMaxTokenCountInOnePeriod();
}
else if (!m_bUserMaxTokenCountInOnePeriod && m_UserMaxInputLoadIndexSize!=0) {
//-- user specified only MaxInputLoadIndexSize: force period-size
ddcLogInfo(Format("INFO: forcing MaxTokenCountInOnePeriod=%u\n", Indexator.GetMaxInputLoadIndexSize()));
m_UserMaxTokenCountInOnePeriod = Indexator.GetMaxInputLoadIndexSize();
}
else {
//-- user specified both period-size and load-size, but inconsistently
throw CExpc(Format("ERROR: can't resolve UserMaxInputLoadIndexSize=%u > UserMaxTokenCountInOnePeriod=%u - adjust opt-file parameters!",
Indexator.GetMaxInputLoadIndexSize(), Indexator.GetMaxTokenCountInOnePeriod()));
}
#endif
//... stuff happens
}
in order for any of the above to work, the size parameters (m_UserMaxInputLoadIndexSize, m_UserMaxTokenCountInOnePeriod) either need to be made public, or ConcIndexatorInvoker needs to get write access… so maybe it would be better to put this kind of logic into opt-file loading CConcordance::LoadOptionsFromString() or some other kind of initialization and sanity-checking routine between option-loading and actual indexing.
however we do it, I would like these inconsistencies to result in warnings rather than errors, and to be automatically massaged into "reasonable" defaults if the user sets only one of them explicitly (and maybe even if the user sets them both inconsistently)
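A minimal sketch of such a sanity-checking step, using the "period-size always wins" option from the snippet above. It is not DDC code: the `IndexSizeOptions` struct and the `ReconcileIndexSizes()` function are hypothetical names invented for illustration, and the warning is returned as a string instead of being passed to ddcLogWarn() so that the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the two opt-file size parameters discussed above.
struct IndexSizeOptions {
    std::size_t maxTokenCountInOnePeriod;
    std::size_t maxInputLoadIndexSize;
};

// "Period-size wins": if the load-index size exceeds the period size, clamp it
// down to the period size. Returns the warning text, or an empty string if the
// options were already consistent.
std::string ReconcileIndexSizes(IndexSizeOptions& opts) {
    std::string warning;
    if (opts.maxInputLoadIndexSize > opts.maxTokenCountInOnePeriod) {
        const std::string period = std::to_string(opts.maxTokenCountInOnePeriod);
        warning = "inconsistent MaxInputLoadIndexSize="
                + std::to_string(opts.maxInputLoadIndexSize)
                + " > MaxTokenCountInOnePeriod=" + period
                + " ; forcing MaxInputLoadIndexSize=" + period;
        opts.maxInputLoadIndexSize = opts.maxTokenCountInOnePeriod;
    }
    // Postcondition: the condition that used to raise the exception cannot hold.
    assert(opts.maxInputLoadIndexSize <= opts.maxTokenCountInOnePeriod);
    return warning;
}
```

Calling this once between option loading and indexing would mean the indexer itself never needs write access to the size members.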
Bryan, there was always a hard-coded constant:
./ConcordLib/ConcIndexator.cpp: const size_t MaxInputIndexSize = 400000;
I made this constant configurable in the opt file so that indexing can be tested in a toy environment, as if we had almost no memory.
Tokens first go to "InputLoadIndex". Once InputLoadIndex is big enough (MaxInputIndexSize), its tokens are moved to "MemoryLoadIndex". Once MemoryLoadIndex is big enough (MaxTokenCountInOnePeriod), it is written to disk (and that is one "Period").
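The two-stage flow described above can be pictured with a toy model. This is only a sketch under assumptions: it tracks token counts rather than real index data, the `ToyIndexer` class and its members are hypothetical names, and treating "big enough" as `>=` is a guess, not something taken from ConcIndexator.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the staged buffering: counts only, no real tokens or files.
class ToyIndexer {
public:
    ToyIndexer(std::size_t maxInputLoadIndexSize, std::size_t maxTokenCountInOnePeriod)
        : m_maxInput(maxInputLoadIndexSize), m_maxPeriod(maxTokenCountInOnePeriod) {
        assert(m_maxInput > 0 && m_maxPeriod > 0);
    }

    void AddToken() {
        ++inputLoad;                        // stage 1: token lands in the input load index
        if (inputLoad >= m_maxInput) {      // input index is "big enough"
            memoryLoad += inputLoad;        // stage 2: move its tokens to the memory load index
            inputLoad = 0;
            if (memoryLoad >= m_maxPeriod) {  // memory index is "big enough"
                ++periodsWritten;             // stage 3: written to disk as one period
                memoryLoad = 0;
            }
        }
    }

    std::size_t inputLoad = 0;       // tokens currently in the input load index
    std::size_t memoryLoad = 0;      // tokens currently in the memory load index
    std::size_t periodsWritten = 0;  // number of periods flushed so far

private:
    std::size_t m_maxInput;
    std::size_t m_maxPeriod;
};
```

With `ToyIndexer t(4, 10)` and 25 calls to `AddToken()`, this model writes 2 periods and leaves 1 token in the input load index.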
These parameters can be tuned inside the source code or in the opt-file, and they are very much internal. The exception "throw CExpc("max load index size is supposed to be less than one period, update ddc parameters!");" acts like an assert. I do not want to compute compatible values; it is too complicated. If you insist, I would prefer to delete the test (indexing under different memory setups) and make this constant static again.
ok, thanks for the explanation. under the circumstances, it sounds to me like the PERIOD_SIZE_WINS strategy makes the most sense. If you don't want to implement it, please mark this ticket as wont-fix and I'll work it in later.
fixed in delwin-merge2 r1376