/ddc/trunk Commit Log


Commit Date  
[r1166] by mukau

v2.1.24-rc1
* use new ddcCorpusList::rfind() for CConcIndexator::LoadMaskedFiles()
- string-based file-masking only masks the LAST occurrence of filename key (rather than the FIRST)
- backwards-compatible for old-style unique-filename corpora
- should also Do The Right Thing for "replaced" files snuck in by dstar-ddc-update.perl
- more precise control of file-masking possible with binary CORPUS._masked_ids
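
A minimal sketch of the "mask the LAST occurrence" lookup described above. The names rfind_file and kNotFound are hypothetical stand-ins, not the actual ddcCorpusList::rfind() signature, and the corpus file list is assumed to be a plain vector of filename strings.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sentinel for "filename not present in the list".
constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Return the index of the LAST entry equal to `name`, or kNotFound.
// Scanning from the back means a "replaced" file appended later in the
// list is the one that gets masked; for unique filenames the result is
// the same as a forward search.
std::size_t rfind_file(const std::vector<std::string>& files,
                       const std::string& name) {
    for (std::size_t i = files.size(); i-- > 0; ) {
        if (files[i] == name) return i;
    }
    return kNotFound;
}
```

With a list such as {"a.xml", "b.xml", "a.xml"}, looking up "a.xml" yields index 2 rather than 0.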

2019-01-16 10:02:10
[r1165] by mukau

+ fixes for clean compilation under gcc v7.3.0 (e.g. ubuntu 18.04.1 LTS)

2019-01-15 10:30:52
[r1164] by mukau

* new typedef CFileNo = DWORD
* new typedef CMaskedFileSet = set<CFileNo> for CConcIndexator::m_MaskedFiles
* propagate CConcIndexator::m_MaskedFiles correctly through ddc_union, ddc_split using new SaveMaskedFileIds() method
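
One way the two typedefs could be used when merging sub-corpora, as a rough sketch only. DWORD is modelled here as std::uint32_t, and union_masked with its per-part file counts is a hypothetical helper; how ddc_union and SaveMaskedFileIds() actually renumber files is not stated in this log.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

typedef std::uint32_t DWORD;               // assumption: 32-bit unsigned stand-in
typedef DWORD CFileNo;                     // file number, as in the commit
typedef std::set<CFileNo> CMaskedFileSet;  // set of masked file numbers

// Hypothetical merge: each part numbers its files from 0, so masked ids
// are shifted by the number of files in all preceding parts.
// Precondition: parts.size() == fileCounts.size().
CMaskedFileSet union_masked(const std::vector<CMaskedFileSet>& parts,
                            const std::vector<CFileNo>& fileCounts) {
    CMaskedFileSet merged;
    CFileNo offset = 0;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        for (CFileNo id : parts[i]) merged.insert(offset + id);
        offset += fileCounts[i];
    }
    return merged;
}
```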

2019-01-14 15:28:54
[r1163] by mukau

* propagate CORPUS._masked_ids through ddc_union

2019-01-14 15:10:56
[r1162] by mukau

* added support for CORPUS._masked_ids : binary variant of CORPUS._masked
- quick and dirty implementation uses existing resident set<size_t> CConcIndexator::m_MaskedFiles
* added support for CConcIndexator::m_MaskedFiles in CConcIndexator::DumpIndex() (ddc_dump)

2019-01-10 15:32:39
[r1160] by mukau

v2.1.23: bug-fix for suffix search: the leftmost suffix character wasn't getting tested correctly.
The search (*aufzug) in ND found "Aufzug"; the culprit was a bogus comparison ("ufzug" < "aufzug").
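
The intended behaviour can be illustrated with a small suffix test that compares every suffix character, including the leftmost one. This is a sketch of the expected semantics only; has_suffix is a hypothetical helper and not the code that was patched.

```cpp
#include <string>

// True if `token` ends with `suffix`, compared byte-wise and
// case-sensitively over the full length of `suffix`.
bool has_suffix(const std::string& token, const std::string& suffix) {
    if (suffix.size() > token.size()) return false;
    return token.compare(token.size() - suffix.size(),
                         suffix.size(), suffix) == 0;
}
```

Under this test "Aufzug" matches the suffix "ufzug" but not "aufzug", because the leftmost character 'A' differs from 'a'.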

2018-12-13 12:29:59
[r1159] by mukau

v2.1.22: doc updates

2018-12-13 07:44:22
[r1157] by mukau

v2.1.22: bug-fixes for new prefix- and suffix-search introduced in v2.1.21 (mantis #33072 [bams|zeit])
* speed improvements too
- prefix searches are 7-26% faster than ddc-v2.1.18
- suffix search performance gains from v2.1.21 remain; especially for highly specific suffixes:
+ *haus : ca. 32x faster (S=32.07)
+ *erde : ca. 74x faster (S=74.49)

2018-12-12 10:31:32
[r1155] by mukau

+ ddc-2.1.21 release

2018-12-10 15:14:18
[r1153] by mukau

v2.1.21-rc1:
* added suffix-query optimization via new index files CORPUS._suffix_TOKATTR
- new low-level evaluation method CStringIndexSet::QueryTokenListWithRightTruncation()
- new suffix-sorted index CIndexSetForQueryingStage::m_rIndex
- can result in drastically improved suffix query-times compared to old regex-based vocabulary scan:
+ *en : 24% faster (Amdahl S=1.24)
+ *chen : 169% faster (Amdahl S=3.69)
+ *ber : 432% faster (Amdahl S=5.32)
* added ddc_(index|union|split) code to create CORPUS._suffix_TOKATTR for all indices at build-time
- very small memory and disk footprint (<1% total index size, growth O(NTypes))
* suffix-indices are still optional: if not present, old regex-based vocabulary scan will be used
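
The general idea behind a suffix-sorted index can be sketched as follows: store each vocabulary type reversed, sort the reversed strings, and answer a suffix query with a binary search for the reversed suffix as a prefix. SuffixIndex is a hypothetical toy class; it says nothing about the on-disk layout of CORPUS._suffix_TOKATTR or the internals of QueryTokenListWithRightTruncation(), and it reverses bytes, so it is only correct for single-byte (e.g. ASCII) text.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy suffix index over a vocabulary (hypothetical, in-memory only).
class SuffixIndex {
public:
    explicit SuffixIndex(const std::vector<std::string>& vocab) {
        for (const auto& w : vocab)
            reversed_.emplace_back(w.rbegin(), w.rend());
        std::sort(reversed_.begin(), reversed_.end());
    }

    // Return all types ending in `suffix`, in reversed-string sort order.
    std::vector<std::string> query(const std::string& suffix) const {
        const std::string key(suffix.rbegin(), suffix.rend());
        std::vector<std::string> hits;
        // All reversed types that start with `key` form one contiguous
        // run beginning at the lower bound of `key`.
        auto it = std::lower_bound(reversed_.begin(), reversed_.end(), key);
        for (; it != reversed_.end() &&
               it->compare(0, key.size(), key) == 0; ++it)
            hits.emplace_back(it->rbegin(), it->rend());
        return hits;
    }

private:
    std::vector<std::string> reversed_;  // reversed types, sorted byte-wise
};
```

The lookup costs O(log NTypes) plus the number of hits, instead of a regex scan over the whole vocabulary, and the extra storage grows with the number of types rather than the number of tokens.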

2018-12-10 15:00:54