crossvalidate/fitit.sthlp at main · wbuchanan/crossvalidate · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
{smcl}
{* *! version 0.0.8 22mar2024}{...}
{vieweralsosee "[R] estat classification" "mansection R estat_classification"}{...}
{vieweralsosee "" "--"}{...}
{viewerjumpto "Syntax" "fitit##syntax"}{...}
{viewerjumpto "Description" "fitit##description"}{...}
{viewerjumpto "Options" "fitit##options"}{...}
{viewerjumpto "Examples" "fitit##examples"}{...}
{viewerjumpto "Returned Values" "fitit##retvals"}{...}
{viewerjumpto "Additional Information" "fitit##additional"}{...}
{viewerjumpto "Contact" "fitit##contact"}{...}
{title:Model Fitting for Cross-Validation in Stata}

{marker syntax}{...}
{title:Syntax}

{p 8 32 2}
{cmd:fitit} {it:"estimation command"} {cmd:,} {cmdab:spl:it(}{it:passthru}{cmd:)}
{cmdab:res:ults(}{it:string asis}{cmd:)} [ {cmdab:kf:old(}{it:integer}{cmd:)}
{cmd:noall} {cmdab:dis:play} {cmdab:na:me(}{it:string asis}{cmd:)}]{p_end}

{synoptset 25 tabbed}{...}
{synoptline}
{synopthdr}
{synoptline}
{syntab:Required}
{synopt :{opt spl:it}}name of the variable that identifies the training split(s){p_end}
{synopt :{opt res:ults}}a stub for storing estimation results{p_end}
{syntab:Optional}
{synopt :{opt kf:old}}specifies the number of folds in the training set; default is {cmd:kfold(1)}.{p_end}
{synopt :{opt noall}}suppresses prediction on entire training set for K-Fold cases{p_end}
{synopt :{opt dis:play}}display results in window; default is {cmd:off}{p_end}
{synopt :{opt na:me}}is used to name the collection storing the results; default is {cmd:name(xvfit)}.{p_end}
{synoptline}


{marker description}{...}
{title:Description}

INCLUDE help xvphase-fit

{marker options}{...}
{title:Options}

{dlgtab:Required}

{phang}
{opt spl:it} must contain the name of the variable that stores the test,
validation, and test splits.  There will only be a single variable if the splits
were created using {help splitit}.

{phang}
{opt res:ults} is used to {help estimates_store:estimates store} the estimation
results from each of the {opt k:fold} folds in the dataset.  When used with
K-Fold cross-validation, the estimation results returned by {help ereturn} will
be based on fitting the model to the entire training set.  Results from each of
the training folds can be easily recovered using the appropriate reference
passed to the {opt res:ults} option.  In this case, you will need to add the
fold number as a suffix to the name you pass to the {opt res:ults} option to
recover the estimation results for that fold.  The fold number identifies the
held-out fold.  So, the number 1 will recover the model that was fitted to all
of the training folds except number 1.

{dlgtab:Optional}

{phang}
{opt kf:old} defines the number of K-Folds used for the training set.  In other
places, we reference using K-Fold cross-validation in the more common form,
where the training set consists of multiple subsets of data.  However, standard
train/test and train/validation/test splits are simply a special case of K-Fold
cross-validation where there is only a single fold.

{phang}
{opt no:all} is an option to prevent predicting the outcome for a model fitted
to the entire training set when using K-Fold cross-validation.  If this option
is used, {opt kfi:fin} will have no effect since the relevant predictions will
not be generated.

{phang}
{opt dis:play} an option to display the model fitting results in the result
window.  If using a large number of K-Folds, it may be useful to not print all
of the model fitting results to the result window.

{phang}
{opt na:me} is an option to pass a name to the collection created to store the
results.  When {cmd fitit} is executed, it will initialize a new collection
or replace the existing collection with the same name.  If you want to retain
the validation results from multiple executions, pass an argument to this
option.  {it:Note:} this only affects users using Stata 17 or later.

{marker examples}{...}
{title:Examples}

{p 4 4 2}Load example data{p_end}
{p 8 4 2}{stata sysuse auto.dta, clear}{p_end}
{p 4 4 2}Expand the data to create identical K-Folds{p_end}
{p 8 4 2}{stata expand 6}{p_end}
{p 4 4 2}Create a "split" identifier{p_end}
{p 8 4 2}{stata "bys make: g byte spvar = _n"}{p_end}
{p 4 4 2}Fit a model to each of the K-Folds and all of the training set{p_end}
{p 8 4 2}{stata fitit "reg price mpg headroom", spl(spvar) res(tst) kf(5)}{p_end}
{p 4 4 2}Fit the model only the the individual K-Folds{p_end}
{p 8 4 2}{stata fitit "reg price mpg headroom", spl(spvar) res(tst) kf(5) noall}{p_end}


{marker retvals}{...}
{title:Returned Values}
{p 4 4 8}The following lists the names of the e-macros and their contents.{p_end}

{synoptset 25 tabbed}{...}
{synoptline}
{synopthdr:Name}
{synoptline}
{synopt :{cmd:e(estres#)}}the name to store the estimation results on the #th fold.{p_end}
{synopt :{cmd:e(estresnames)}}the names of all the estimation results{p_end}
{synopt :{cmd:e(estresall)}}the name used to store the estimation results for the entire training set when K-Fold cross-validation is used.{p_end}
{synopt :{cmd:e(predifin)}}the if expression to use when predicting on validation/test split.{p_end}
{synopt :{cmd:e(kfpredifin)}}the if expression to use when predicting on the K-Fold hold out set.{p_end}
{synoptline}

{p 4 4 8}{cmd:fitit} also reposts the e-return values from the model fitted to the entire training set.  As a reminder, when used with {opt k:fold} > 1, the results returned by {help ereturn} will come from fitting the model to the entire training set (e.g., all of the K-Folds simultaneously).  The results from individual folds can be recovered by appending the held-out fold number to the value passed to {opt res:ults} and calling {help estimates_restore:estimates restore}.{p_end}

{marker additional}{...}
{title:Additional Information}
{pstd}
The {cmdab dis:play} option in validateit is enabled by
{help collect_preview:collect preview}.  In addition to providing a convient way
for us to structure the display in a useful way, it also makes it easy for you -
the user - to export the validation results into any of several formats.  The
results from {help fitit} are all stored in the collection named
{cmd:xvfit}.  For more information on how to export these results into the
format of your choosing, please see {help collect_export:collect export}.

{pstd}
If you have questions, comments, or find bugs, please submit an issue in the
{browse "https://github.com/wbuchanan/crossvalidate":crossvalidate GitHub repository}.


{marker contact}{...}
{title:Contact}
{p 4 4 8}William R. Buchanan, Ph.D.{p_end}
{p 4 4 8}Sr. Research Scientist, SAG Corporation{p_end}
{p 4 4 8}{browse "https://www.sagcorp.com":SAG Corporation}{p_end}
{p 4 4 8}wbuchanan at sagcorp [dot] com{p_end}

{p 4 4 8}Steven D. Brownell, Ph.D.{p_end}
{p 4 4 8}Economist, SAG Corporation{p_end}
{p 4 4 8}{browse "https://www.sagcorp.com":SAG Corporation}{p_end}
{p 4 4 8}sbrownell at sagcorp [dot] com{p_end}