1.5.0 | 2023-03-13
* The default (recommended) range value has been changed. All invocations
without the range parameter set are affected, as are all calls to the
GetRange function without the k parameter set. The new default
(recommendation) is less conservative and fixes an issue with small,
imbalanced datasets, where the old recommended range could fail to
capture the effective split. The new default requires only k=3 objects
(as opposed to k=5), and the estimation is based on the total number of
objects (as opposed to only objects of the less numerous class).
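The changed default can be sketched as follows; this assumes the GetRange signature (k, n, dimensions, divisions) as documented in the package, and the exact argument names may differ between versions:

```r
library(MDFS)  # sketch only; GetRange signature assumed from the docs

n <- 250  # now the total number of objects, not just the smaller class
# New default: only k = 3 objects are mandated.
range.new <- GetRange(k = 3, n = n, dimensions = 2)
# Old, more conservative recommendation, for comparison:
range.old <- GetRange(k = 5, n = n, dimensions = 2)
```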
1.4.0 | 2023-01-30
* Contrast variables are now handled separately to achieve the correct
null hypothesis distribution. Each analysed tuple has at most one
contrast variable, and tuples containing one contribute only to
contrast scores. The max IG of actual variables is computed only among
actual variables. This change also raises the default number of
contrast variables from 10% of the actual variables to their full
number (but still at least 30).
* Removed the option to output minimum IGs - they were not well defined
and needlessly complicated the implementation. Users relying on this
feature are advised to switch to the functions outputting tuples
instead, as they give a usefully broad view of the IG distribution.
* CPU parallelism efficiency is greatly improved.
* Discretisations are now also parallelised on the CPU.
* AddContrastVariables utility function is deprecated in favor of
GenContrastVariables which returns only the contrast data.
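The migration can be sketched as below; the argument name n.contrast and the shape of the returned value are assumptions based on the package documentation:

```r
library(MDFS)  # sketch; n.contrast and the return shape are assumptions

data <- matrix(rnorm(100 * 10), nrow = 100)
# GenContrastVariables returns only the contrast data (permuted copies
# of input variables), unlike the deprecated AddContrastVariables,
# which returned them bound together with the input.
contrast <- GenContrastVariables(data, n.contrast = 30)
```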
* The value for the divisions parameter is no longer guessed for 1D.
Instead, it always defaults to 1 for consistency.
* Add the capability to output tuples' entropy and variation of
information (in the decisionless case), the decision's entropy
conditioned on tuples, and a decision measure analogous to the
variation of information but conditional on the tuples (defined as a
sum of information gains). Variation of information measures are
defined only in 2D.
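For reference, the 2D variation of information underlying these outputs can be sketched in plain base R with the textbook plug-in estimator; the package itself uses its own discretisation and pseudo-count machinery, so this only illustrates the definitions:

```r
# Plain plug-in estimates; the package's own estimator (pseudo-counts,
# randomised discretisation) will differ in the details.
entropy <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }

x <- c(1, 1, 2, 2, 1, 2, 1, 1)
y <- c(1, 2, 2, 2, 1, 2, 1, 2)
pxy <- table(x, y) / length(x)   # joint distribution of the 2D tuple
hx  <- entropy(rowSums(pxy))     # H(X)
hy  <- entropy(colSums(pxy))     # H(Y)
hxy <- entropy(as.vector(pxy))   # H(X, Y)
mi  <- hx + hy - hxy             # mutual information I(X; Y)
vi  <- hxy - mi                  # VI(X; Y) = H(X|Y) + H(Y|X)
```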
* Optimised the 2D version of information gain computation. Both the
decisionful and the decisionless paths are now optimised for 2D (1D
was already optimal). This cuts computation time by about 30% in a
scenario with 100 objects and 5000 variables.
* Expose the GetRange function used to calculate the probable optimal value
of the range parameter.
* Add capability to output values averaged instead of maximised over multiple
discretisations.
* Disable the warning from the K-S test during p-value fitting. It was
too strict and triggered on perfectly fine fits.
1.3.0 | 2022-04-19
* Adds a discrete variant which works (as the name suggests) on
discrete data (as opposed to continuous data) and skips the randomised
discretisation procedure entirely. Dedicated
ComputeMaxInfoGainsDiscrete and ComputeInterestingTuplesDiscrete
functions are provided for this purpose. The current limitation is
that all variables must have the same cardinality.
* Adds a decisionless mode, allowing entropy and mutual information to
be computed using the same engine. To compute these, leave the
decision parameter NULL.
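A minimal sketch of the decisionless mode; the remaining arguments and their defaults are assumptions based on the package documentation:

```r
library(MDFS)  # sketch; argument defaults assumed per the docs

data <- matrix(rnorm(100 * 10), nrow = 100)
# With decision left NULL, the same engine computes entropy and mutual
# information instead of the decision's information gain.
result <- ComputeMaxInfoGains(data, decision = NULL,
                              dimensions = 2, divisions = 1)
```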
* Restores the ability to leave I.lower unset in ComputeInterestingTuples,
and thus also to search for higher-dimensional interesting tuples.
* Removes legacy undocumented fallback to "guessed" pc.xi (when it was
explicitly set to NULL) as it conflicted with our recommendation.
* Improves validation of input.
1.2.0 | 2021-02-09
* An IG threshold of 0 (the new default value) or below means that no
threshold filtering is applied.
Some IG results might be negative due to logarithm rounding; these are
now passed to the user unchanged and unfiltered.
Any positive IG threshold filters normally, as before.
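The new default can be sketched as below; the parameter name ig.thr is taken from ComputeInterestingTuples, the other arguments are assumptions, and a package version where I.lower may be left unset (1.3.0+) is assumed:

```r
library(MDFS)  # sketch; ig.thr and other argument names assumed per docs

data <- matrix(rnorm(100 * 10), nrow = 100)
decision <- sample(0:1, 100, replace = TRUE)
# ig.thr = 0 (the new default) applies no filtering: tuples whose IG
# came out slightly negative from logarithm rounding are returned too.
tuples <- ComputeInterestingTuples(data, decision, dimensions = 2,
                                   ig.thr = 0)
```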
* Unfiltered tuple IG computation now uses an optimised procedure which
is considerably faster.
The speedup ratio grows logarithmically with the number of variables.
This brings tuple IG computation's performance close to that of the
max IG computation.
* The optimised tuple IG computation procedure now allows returning
the IG matrix directly.
1.1.1 | 2021-01-21
* Fix compilation on GCC 11.
1.1.0 | 2021-01-06
* The interesting tuples functionality was largely redone. It is now
limited to 2D and requires the 1D IG as input. However, the results
are now interpretable.
* Ability to output the minimum IG per tuple instead of only the maximum.
NB the outputted IG is still maximized over discretizations.
* Minor optimizations.
1.0.5 | 2019-11-10
* fix CUDA failure (issue in linking)
1.0.4 | 2019-10-27
* add function to discretize interesting variables for inspection
* "return.tuples" parameter now also controls returning of relevant
discretization numbers
* change optional parameter "pseudo.count" name to "pc.xi" to better reflect
its effect
* fix "interesting.vars" parameter to not require being sorted
* add "require.all.vars" parameter
* optimize IG computation on CPU (up to 4x speedup)
1.0.3 | 2018-11-07
* add note about recommended FDR control method
* fix a possible memory error introduced in 1.0.2 (mismatched new/delete[] operators)
1.0.2 | 2018-10-31
* improve default range estimation
* allow overriding pseudo.count in MDFS function
* set default p.adjust.method to R's p.adjust default ("holm")
* set default level to suggested FWER level (0.05)
* fix CUDA version to not abruptly exit R on error
* fix use.CUDA=T in MDFS function
1.0.1 | 2018-06-26
* fix factors as input decision in ComputeMaxInfoGains,
ComputeInterestingTuples and MDFS
* fix adjusted.p.value in MDFS
* return statistic and p.value in MDFS