Helpers for specifying nodes in simulations

Mix two variables together. The output will have the specified R-squared with signal and the specified variance (var, 1 by default).

Evaluate an expression separately for each case

## Usage

categorical(n = 5, ..., exact = TRUE)

cat2value(variable, ...)

bernoulli(n = 0, logodds = NULL, prob = 0.5, labels = NULL)

mix_with(signal, noise = NULL, R2 = 0.5, var = 1, exact = FALSE)

each(ex)

block_by(block_var, levels = c("treatment", "control"), show_block = FALSE)

random_levels(n, k = NULL, replace = FALSE)

## Arguments

n

The symbol standing for the number of rows in the data frame to be generated by datasim_run(). Just use n as a symbol; don't assign it a value. (That will be done by datasim_run().)

exact

if TRUE, make R-squared or the target variance exactly as specified.

variable

a categorical variable

logodds

Numerical vector of log odds used to generate Bernoulli trials. Each value can be any real number.

prob

An alternative to logodds. Values must be in [0,1].

labels

Character vector: names for categorical levels, also used to replace 0 and 1 in bernoulli()

signal

The part of the mixture that will be correlated with the output.

noise

The rest of the mixture. This will be uncorrelated with the output only if you specify it as pure noise.

R2

The target R-squared.

var

The target variance.

ex

an expression potentially involving other variables.

block_var

Which variable to use for blocking

levels

Character vector giving names to the blocking levels

show_block

Logical. If TRUE, put the block number in the output.

k

Number of distinct levels

replace

if TRUE, use resampling on the set of k levels

...

assignments of values to the names in variable
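As a rough illustration of what the blocking arguments (block_var, levels) describe, the following base-R sketch ranks the cases by the blocking variable, groups neighbouring cases into blocks, and hands out the levels in random order within each block. The function block_sketch() is a hypothetical name used only here, and its algorithm is an assumption about what blocking means, not the package's block_by() implementation. The show_block argument is not modelled.

```r
# Hypothetical helper for illustration only; not the package's block_by().
block_sketch <- function(block_var, levels = c("treatment", "control")) {
  k <- length(levels)
  # Rank the cases by the blocking variable, then cut the ranking into
  # consecutive blocks of k cases each (the last block may be shorter).
  position <- rank(block_var, ties.method = "first")
  block <- (position - 1) %/% k + 1
  # Within each block, assign the levels in a random order.
  assignment <- character(length(block_var))
  for (b in unique(block)) {
    idx <- which(block == b)
    assignment[idx] <- sample(levels)[seq_along(idx)]
  }
  assignment
}

set.seed(2024)
w <- c(5, 1, 3, 2, 6, 4)
treatment <- block_sketch(w)
# Cases 2 and 4 hold the two smallest values of w, so they form one block
# and receive one "treatment" and one "control" between them.
treatment[c(2, 4)]
```

Because neighbouring values of the blocking variable share a block, the treatment and control groups end up balanced with respect to that variable.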

## Value

A numerical or categorical vector which will be assembled into a data frame by datasim_run()

## Details

datasim_make() constructs a simulation which can then be run with datasim_run(). Each argument to datasim_make() specifies one node of the simulation using an assignment-like syntax such as y <- 3*x + 2 + rnorm(n). The datasim helpers documented here are for use on the right-hand side of a node specification. They simplify potentially complex operations such as blocking, creating random categorical variables, and translating categorical values to numerical ones.
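To make two of these operations concrete, here is a minimal base-R sketch of translating categorical values to numbers and of drawing Bernoulli trials from log odds. The names cat2value_sketch() and bernoulli_sketch() are hypothetical and exist only for this illustration; they are not the package's cat2value() or bernoulli(), and the package may handle edge cases differently.

```r
# Hypothetical helpers for illustration only; not the package functions.

# Look up a number for each categorical value, e.g. a = -1.7, b = 0.1.
cat2value_sketch <- function(variable, ...) {
  lookup <- c(...)  # named numeric vector built from the assignments
  unname(lookup[as.character(variable)])
}

# Draw one 0/1 trial per case, with success probability given by the
# inverse logit of the log odds, then optionally relabel 0 and 1.
bernoulli_sketch <- function(logodds, labels = NULL) {
  prob <- plogis(logodds)  # 1 / (1 + exp(-logodds))
  draws <- rbinom(length(prob), size = 1, prob = prob)
  if (is.null(labels)) draws else labels[draws + 1]
}

set.seed(1)
g <- c("a", "b", "c", "a")
x <- cat2value_sketch(g, a = -1.7, b = 0.1, c = 1.2)
y <- bernoulli_sketch(x, labels = c("no", "yes"))
```

Here labels[1] stands in for 0 and labels[2] for 1, which matches the description of the labels argument.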

The target R-squared and variance are matched exactly only if exact=TRUE. Otherwise they hold only approximately in any finite sample, with the approximation improving as the sample size grows.
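The difference between exact and approximate targets can be sketched in base R. The function mix_sketch() below is a hypothetical stand-in, not the package's mix_with(): it assumes the mixture is a weighted sum of the standardised signal and noise, and that exact matching is obtained by first removing the sample correlation between noise and signal.

```r
# Hypothetical helper for illustration only; not the package's mix_with().
mix_sketch <- function(signal, noise, R2 = 0.5, var = 1, exact = FALSE) {
  if (exact) {
    # Residualise the noise so it is exactly uncorrelated with the signal
    # in this particular sample.
    noise <- resid(lm(noise ~ signal))
  }
  s <- as.numeric(scale(signal))  # sample mean 0, sample variance 1
  e <- as.numeric(scale(noise))
  # Weights chosen so the squared correlation with the signal is R2
  # and the total variance is var when s and e are uncorrelated.
  sqrt(var) * (sqrt(R2) * s + sqrt(1 - R2) * e)
}

set.seed(101)
x <- rnorm(50)
w_exact  <- mix_sketch(x, rnorm(50), R2 = 0.75, var = 1, exact = TRUE)
w_approx <- mix_sketch(x, rnorm(50), R2 = 0.75, var = 1, exact = FALSE)

# The exact version hits 0.75; the approximate one is only near it.
c(exact = cor(w_exact, x)^2, approx = cor(w_approx, x)^2)
```

With exact = FALSE the chance correlation between signal and noise in a finite sample shifts both the R-squared and the variance away from their targets, which is why the targets are reached only in the large-sample limit.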

## Examples

Demo <- datasim_make(
  g <- categorical(n, a=2, b=1, c=0.5),
  x <- cat2value(g, a=-1.7, b=0.1, c=1.2),
  y <- bernoulli(logodds = x, labels=c("no", "yes")),
  z <- random_levels(n, k=4),
  w <- mix_with(x, noise=rnorm(n), R2=0.75, var=1),
  treatment <- block_by(w),
  dice <- each(rnorm(1, sd = abs(w)))
)
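The dice node above evaluates its expression once per case. A base-R sketch of that idea, under the assumption that "each case" means "each row of the data frame being built", is shown below. The helper each_sketch() is hypothetical and is not how the package implements each().

```r
# Hypothetical helper for illustration only; not the package's each().
# Applies row_fun to one row at a time and collects one number per row.
each_sketch <- function(data, row_fun) {
  vapply(
    seq_len(nrow(data)),
    function(i) row_fun(data[i, , drop = FALSE]),
    numeric(1)
  )
}

set.seed(42)
cases <- data.frame(w = c(0, -0.5, 2, 10))
# One separate draw per case, each with its own standard deviation.
dice <- each_sketch(cases, function(row) rnorm(1, sd = abs(row$w)))
dice
```

The first case has w = 0, so its draw has standard deviation zero and equals the mean, 0; the other cases vary more as abs(w) grows.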