Helpers for specifying nodes in simulations
Mix two variables together. The output will have the specified R-squared with the signal variable and the specified variance (one by default).
Evaluate an expression separately for each case
Usage
categorical(n = 5, ..., exact = TRUE)
cat2value(variable, ...)
bernoulli(n = 0, logodds = NULL, prob = 0.5, labels = NULL)
mix_with(signal, noise = NULL, R2 = 0.5, var = 1, exact = FALSE)
each(ex)
block_by(block_var, levels = c("treatment", "control"), show_block = FALSE)
random_levels(n, k = NULL, replace = FALSE)
Arguments
- n
The symbol standing for the number of rows in the data frame to be generated by datasim_run(). Just use n as a symbol; don't assign it a value. (That will be done by datasim_run().)
- exact
If TRUE, make R-squared or the target variance exactly as specified.
- variable
A categorical variable.
- logodds
Numeric vector of log odds used to generate Bernoulli trials. Values can be any real numbers.
- prob
An alternative to logodds. Values must be in [0,1].
- labels
Character vector: names for categorical levels, also used to replace 0 and 1 in bernoulli().
- signal
The part of the mixture that will be correlated with the output.
- noise
The rest of the mixture. This will be uncorrelated with the output only if you specify it as pure noise.
- R2
The target R-squared.
- var
The target variance.
- ex
an expression potentially involving other variables.
- block_var
Which variable to use for blocking
- levels
Character vector giving names to the blocking levels
- show_block
Logical. If TRUE, put the block number in the output.
- k
Number of distinct levels.
- replace
If TRUE, use resampling on the set of k levels.
- ...
Assignments of values to the names in variable.
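The logodds and prob arguments describe the same quantity on two scales. The base-R sketch below is an illustration only, not the package's code: it shows the logistic transform that links them, which is why logodds may be any real number while prob must lie in [0,1]. The variable names (back, trials, labelled) are made up for the example.

```r
# Illustrative only: the relationship between log odds and probability.
logodds <- c(-2, 0, 1.5)        # log odds can be any real numbers
prob <- plogis(logodds)         # logistic transform: 1 / (1 + exp(-logodds))
back <- qlogis(prob)            # the inverse transform recovers the log odds

# Hand-rolled Bernoulli draws (0/1) with these probabilities, then relabelled
# in the way the labels argument is described: 0 becomes "no", 1 becomes "yes".
set.seed(1)
trials <- rbinom(length(prob), size = 1, prob = prob)
labelled <- c("no", "yes")[trials + 1]
```

A log odds of 0 corresponds to a probability of 0.5, which matches the default prob = 0.5 shown in Usage.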
Value
A numerical or categorical vector which will be assembled into a data frame by datasim_run().
Details
datasim_make() constructs a simulation which can then be run with datasim_run(). Each argument to datasim_make() specifies one node of the simulation using an assignment-like syntax such as y <- 3*x + 2 + rnorm(n). The datasim helpers documented here are for use on the right-hand side of the specification of a node. They simplify potentially complex operations such as blocking, creation of random categorical variables, translation from categorical to numerical values, etc.
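To make the node idea concrete, here is a hand-written base-R analogue of a two-node simulation. It is a sketch under assumptions, not how datasim_make() or datasim_run() work internally: the node x (pure noise) and the name sim_data are invented for the illustration, and n is set by hand, whereas in a real specification n is left as a symbol.

```r
# Hand-written analogue of a two-node simulation (illustration only).
# In the package, n is supplied by datasim_run(); here it is fixed by hand.
set.seed(101)
n <- 200
x <- rnorm(n)                  # hypothetical node x: pure noise
y <- 3 * x + 2 + rnorm(n)      # node y: depends on the earlier node x

# Each node becomes one column of the resulting data frame.
sim_data <- data.frame(x = x, y = y)
```

Each node may refer only to nodes defined before it, just as y here can use x because x already exists.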
The target R-squared and variance are achieved exactly only if exact = TRUE; otherwise they are approached only as the sample size goes to infinity.
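The difference between the exact and the large-sample behaviour can be seen in a small base-R sketch. The function mix_sketch() below is hypothetical and is only one possible construction, not the implementation of mix_with(): it standardizes signal and noise, and, when exact = TRUE, first removes any in-sample correlation between them so that the target R-squared and variance hold exactly rather than approximately.

```r
# Hypothetical helper, for illustration only (not the package's mix_with()).
mix_sketch <- function(signal, noise, R2 = 0.5, var = 1, exact = FALSE) {
  if (exact) {
    # Remove any in-sample correlation between noise and signal.
    noise <- resid(lm(noise ~ signal))
  }
  s <- as.numeric(scale(signal))   # in-sample mean 0, variance 1
  e <- as.numeric(scale(noise))
  # Weights chosen so that the signal carries a fraction R2 of the variance.
  sqrt(var) * (sqrt(R2) * s + sqrt(1 - R2) * e)
}

set.seed(42)
x <- rnorm(50)
approx_w <- mix_sketch(x, rnorm(50), R2 = 0.75)                # close to target
exact_w  <- mix_sketch(x, rnorm(50), R2 = 0.75, exact = TRUE)  # hits target

# Compare the achieved R-squared values with the target of 0.75.
achieved <- c(approx = cor(x, approx_w)^2, exact = cor(x, exact_w)^2)
```

With a small sample the non-exact version typically misses the target by a little, because the noise happens to be slightly correlated with the signal in that sample.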
Examples
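Demo <- datasim_make(
  # g: a categorical variable with levels a, b and c
  g <- categorical(n, a=2, b=1, c=0.5),
  # x: translate each level of g to a numerical value
  x <- cat2value(g, a=-1.7, b=0.1, c=1.2),
  # y: Bernoulli trials with log odds x, labelled "no"/"yes" instead of 0/1
  y <- bernoulli(logodds = x, labels=c("no", "yes")),
  # z: random levels drawn from k = 4 distinct levels
  z <- random_levels(n, k=4),
  # w: mix x with pure noise, targeting R-squared 0.75 and variance 1
  w <- mix_with(x, noise=rnorm(n), R2=0.75, var=1),
  # treatment: block on w, using the default levels "treatment" and "control"
  treatment <- block_by(w),
  # dice: evaluate the expression separately for each case
  dice <- each(rnorm(1, sd = abs(w)))
)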
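The last node above relies on each() evaluating its expression separately for each case. A base-R sketch of why that matters follows; it illustrates the idea as described on this page and is not the implementation of each(). The vector w and the names once and per_case are invented for the example.

```r
# Illustration only: one evaluation versus one evaluation per case.
set.seed(7)
w <- c(0.1, 1, 10, 100)

# Evaluated once: rnorm(1, ...) returns a single number and uses only
# the first element of sd.
once <- rnorm(1, sd = abs(w))

# Evaluated separately for each case: one fresh draw per row,
# each using that row's own sd.
per_case <- vapply(
  seq_along(w),
  function(i) rnorm(1, sd = abs(w[i])),
  numeric(1)
)
```

Here once has length 1, while per_case has one value per element of w.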