The partykit::ctree() function only gives the best separation at each node, i.e. one tree. This wrapper adds the following features:
- By setting recursive = T, all trees meeting the p-val cutoff are produced and saved. Each round of recursion removes the 1st splitting variable from the input data.frame and runs runCtree() again; the recursion stops when no splitting variable is found.
- The info and stats of each node of each tree are collected and summarized in an Excel file, which also contains URLs to each tree.
- Before running partykit::ctree(), rmNA() and rmNZV() are run to remove low-information columns and rows, which reduces computation and the adjustment applied to association p-vals.
- Cases leading to crashes of partykit::ctree() are handled, e.g. Inf and -Inf are converted to NA to avoid errors such as "'breaks' are not unique".
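The recursion and the Inf handling described above can be sketched roughly as follows. This is not the wrapper's actual code: the helper names inf_to_na(), root_split() and all_trees() are hypothetical, the procedure is written as a loop rather than as recursive calls to runCtree(), and mapping the p-val cutoff onto the alpha argument of ctree_control() is an assumption. The rmNA() and rmNZV() steps are omitted.

```r
library(partykit)

# Hypothetical helper: replace Inf / -Inf with NA in numeric columns.
inf_to_na <- function(df) {
  for (j in seq_along(df)) {
    if (is.numeric(df[[j]])) df[[j]][is.infinite(df[[j]])] <- NA
  }
  df
}

# Hypothetical helper: name and p-val of the split at node 1,
# or NULL when the tree has no split at all.
root_split <- function(tree) {
  root <- node_party(tree)
  if (is.terminal(root)) return(NULL)
  list(
    var  = names(tree$data)[varid_split(split_node(root))],
    pVal = unname(info_node(root)$p.value)
  )
}

# Fit a tree, drop its 1st splitting variable, and refit until no
# splitting variable is found. `y` is the response column name and is
# assumed to be a syntactically valid R name.
all_trees <- function(df, y, pCut = 0.05) {
  df <- inf_to_na(df)
  trees <- list()
  while (ncol(df) >= 2) {
    tree <- ctree(as.formula(paste(y, "~ .")), data = df,
                  control = ctree_control(alpha = pCut))  # assumption
    sp <- root_split(tree)
    if (is.null(sp)) break            # no significant split: stop
    trees[[length(trees) + 1]] <- tree
    df[[sp$var]] <- NULL              # remove the 1st splitting variable
  }
  trees
}

trees <- all_trees(iris, "Species")
```

Each element of trees is an ordinary party object, so plot(trees[[1]]) draws the first tree, and each later element is the tree obtained once the earlier root variables have been removed.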
Arguments
- df1
data.frame; columns are variables and rows are observations
- cohort
char; name of the observation cohort as an annotation in the drawn tree
- oDir
char; output directory for the tree plots and the summary Excel file.
One PDF file is written per tree, named as paste0(oDir,.Platform$file.sep, cohort,'.',yName,'.',gList$counter,'.pdf').
The Excel file holds the content of stats from the return value (see Value) and is named as paste0(oDir,.Platform$file.sep,cohort,'.xlsx').
- yi
int; index of y variable
- pCut
p-val for significant association; not adjusted.
- recursive
logical;
F: only produce the best tree
T: produce all trees meeting pCut
- getReturn
logical; if T, return the list described under Value; otherwise return nothing. It is also used to reduce the internal data transfer load when recursive = T.
- ctrlParas
list; parameters for partykit::ctree_control()
- naParas
list; parameters for rmNA(); set to NULL to skip this step.
- nzvParas
list; parameters for rmNZV(); set to NULL to skip this step.
- gList
a listenv list; it's for internal recursion tracking; users should ignore this argument.
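As an illustration of the output file naming described for oDir, the sketch below builds both paths with the same paste0() patterns. The function name tree_paths() and all argument values are hypothetical; counter stands in for gList$counter.

```r
# Hypothetical helper: build the output paths for one tree.
tree_paths <- function(oDir, cohort, yName, counter) {
  list(
    # one PDF per tree
    pdf  = paste0(oDir, .Platform$file.sep, cohort, '.', yName, '.', counter, '.pdf'),
    # one summary Excel file per cohort
    xlsx = paste0(oDir, .Platform$file.sep, cohort, '.xlsx')
  )
}

# Made-up values, for illustration only
paths <- tree_paths(tempdir(), cohort = "cohortA", yName = "y", counter = 1L)
basename(paths$pdf)
basename(paths$xlsx)
```

Because the tree counter is part of each PDF name but not of the Excel name, a recursive run yields several PDFs and a single summary file per cohort.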
Value
If getReturn is T, a list of the following items; nothing otherwise.
- df: cleaned df1; NA if df1 has only one column or < 10 rows with y values.
- stats: one of the following values:
  - NA, in either of these cases:
    - ctree() doesn't run, due to one of the following reasons:
      - only one column in df1
      - < 10 rows where y is not NA
      - y has low variance and is removed by rmNZV()
    - no tree fitting the pCut is found. In this case, try increasing pCut.
  - A data.frame with the following columns, one tree per row:
    - counter: the index of each tree drawing
    - cohort, y, pCutoff
    - spVar1, pVal1: the name and p-val of the splitting variable at node 1
    - nNode: number of nodes of the tree
    - spVars: a string containing the names and stats of all splitting variables. For each node, the format is "name,p-val,cut,gtOnRight"; nodes are separated by ';'.
    - plot: the path to the tree plot
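Since spVars packs all nodes into one string, a small parser can help when post-processing stats. The sketch below is a hypothetical helper, not part of the package. It assumes variable names contain no ',' or ';' and that the p-val field parses as a number; cut and gtOnRight are kept as character because their exact encoding is not documented here. The example string is made up.

```r
# Hypothetical helper: split one spVars string into one row per node.
parse_spvars <- function(spVars) {
  nodes <- trimws(strsplit(spVars, ";", fixed = TRUE)[[1]])
  nodes <- nodes[nzchar(nodes)]            # drop empty pieces
  if (length(nodes) == 0) return(NULL)
  fields <- lapply(strsplit(nodes, ",", fixed = TRUE), trimws)
  stopifnot(all(lengths(fields) == 4))     # name, p-val, cut, gtOnRight
  out <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
  names(out) <- c("name", "pVal", "cut", "gtOnRight")
  out$pVal <- as.numeric(out$pVal)         # assumption: numeric p-val text
  out
}

# Made-up input following the documented "name,p-val,cut,gtOnRight" format
nodes <- parse_spvars("Petal.Length,1e-28,1.9,TRUE;Petal.Width,3e-10,1.7,TRUE")
```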