
Standard statistical inferences are often carried out based on a model that is determined by a data-driven selection criterion. Such procedures, however, are both logically unsound and practically misleading.

What is the effect of model selection on coefficient estimates?

Simulate some data!

| Variable | True Coefficient |
|---|---|
| X1 | 2 |
| X2 | 1.5 |
| X3 | 0.5 |
| X4 | 0.1 |
| X5 - X15 | 0 |

What to expect?

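One way to form an expectation is to generate data from the coefficient table above. The following is a minimal sketch, not the original simulation: the sample size `n`, the noise level `noise_sd`, the independent standard-normal predictors, and the names `TRUE_BETA` and `simulate_data` are all assumptions made for illustration, so it is not expected to reproduce the numbers in the tables that follow.

```python
import numpy as np

# True coefficients from the table above: X1..X4 are nonzero, X5..X15 are zero.
TRUE_BETA = np.array([2.0, 1.5, 0.5, 0.1] + [0.0] * 11)


def simulate_data(n=100, noise_sd=1.0, seed=0):
    """Draw one data set y = X @ TRUE_BETA + noise.

    Assumptions (not stated in the source): the sample size n, the noise
    standard deviation, and independent standard-normal predictors.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, TRUE_BETA.size))
    y = X @ TRUE_BETA + noise_sd * rng.standard_normal(n)
    return X, y


X, y = simulate_data()
print(X.shape, y.shape)
```

Fixing the seed makes a single draw reproducible; a simulation study would call `simulate_data` many times with different seeds.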

| X | true | full | forward |
|---|---|---|---|
| X1 | 1 | 1 | 1 |
| X2 | 1 | 1 | 1 |
| X3 | 0.704 | 0.248 | 0.727 |
| X4 | 0.217 | 0.07 | 0.29 |

| X | full | forward |
|---|---|---|
| X5 | 0.049 | 0.162 |
| X6 | 0.062 | 0.187 |
| X7 | 0.054 | 0.16 |
| X8 | 0.047 | 0.171 |
| X9 | 0.043 | 0.167 |
| X10 | 0.059 | 0.16 |
| X11 | 0.05 | 0.168 |
| X12 | 0.06 | 0.17 |
| X13 | 0.071 | 0.201 |
| X14 | 0.06 | 0.17 |
| X15 | 0.05 | 0.185 |

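The contrast between the full model and forward selection can be explored with a small simulation. The sketch below is one possible setup, not the one behind the tables: the source does not say how forward selection was run or how the table entries were computed. Here it is assumed that selection is greedy and stops when AIC no longer improves, that predictors are independent standard normals with `n = 100`, and that a coefficient counts as significant when `|t| > 1.96` (a normal approximation to the 5% cutoff). The helper names `fit`, `forward_aic`, and `null_rejection_rates` are hypothetical. For the null predictors X5 to X15 it reports how often the full model rejects, and how often a naive test rejects among the null predictors that forward selection picked. Under these assumptions the first rate should sit near the nominal 5% and the second well above it.

```python
import numpy as np

CRIT = 1.96  # assumption: normal approximation to the two-sided 5% cutoff


def fit(X, y, cols):
    """OLS of y on an intercept plus X[:, cols]; return (slope t-stats, AIC)."""
    n = len(y)
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ coef) ** 2)
    k = Z.shape[1]
    se = np.sqrt(rss / (n - k) * np.diag(np.linalg.inv(Z.T @ Z)))
    aic = n * np.log(rss / n) + 2 * k  # Gaussian AIC up to a constant
    return (coef / se)[1:], aic


def forward_aic(X, y):
    """Greedy forward selection: add the predictor that lowers AIC the most."""
    selected, remaining = [], list(range(X.shape[1]))
    _, best_aic = fit(X, y, selected)  # intercept-only model
    while remaining:
        aics = [fit(X, y, selected + [j])[1] for j in remaining]
        i = int(np.argmin(aics))
        if aics[i] >= best_aic:  # no candidate improves AIC: stop
            break
        best_aic = aics[i]
        selected.append(remaining.pop(i))
    return selected


def null_rejection_rates(n_sim=200, n=100, seed=0):
    """Return (full-model rate, post-selection rate) for the null predictors."""
    rng = np.random.default_rng(seed)
    beta = np.array([2.0, 1.5, 0.5, 0.1] + [0.0] * 11)
    null = np.arange(4, 15)  # zero-based indices of X5..X15
    full_rej = picked = picked_rej = 0
    for _ in range(n_sim):
        X = rng.standard_normal((n, beta.size))
        y = X @ beta + rng.standard_normal(n)
        t_full, _ = fit(X, y, list(range(beta.size)))
        full_rej += int(np.sum(np.abs(t_full[null]) > CRIT))
        sel = forward_aic(X, y)
        t_sel, _ = fit(X, y, sel)  # naive refit, ignoring the selection step
        for j, t in zip(sel, t_sel):
            if j >= 4:  # a null predictor made it into the model
                picked += 1
                picked_rej += int(abs(t) > CRIT)
    return full_rej / (n_sim * len(null)), picked_rej / max(picked, 1)


if __name__ == "__main__":
    full_rate, post_rate = null_rejection_rates()
    print(f"full model: {full_rate:.3f}, after forward selection: {post_rate:.3f}")
```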
selectiveInference

| X | full | forward | forward adjusted |
|---|---|---|---|
| X5 | 0.049 | 0.162 | 0.017 |
| X6 | 0.062 | 0.187 | 0.026 |
| X7 | 0.054 | 0.16 | 0.016 |
| X8 | 0.047 | 0.171 | 0.018 |
| X9 | 0.043 | 0.167 | 0.023 |
| X10 | 0.059 | 0.16 | 0.031 |
| X11 | 0.05 | 0.168 | 0.016 |
| X12 | 0.06 | 0.17 | 0.026 |
| X13 | 0.071 | 0.201 | 0.018 |
| X14 | 0.06 | 0.17 | 0.018 |
| X15 | 0.05 | 0.185 | 0.016 |

| X | true | full | forward | forward adjusted |
|---|---|---|---|---|
| X1 | 1 | 1 | 1 | 0.868 |
| X2 | 1 | 1 | 1 | 0.922 |
| X3 | 0.704 | 0.248 | 0.727 | 0.232 |
| X4 | 0.217 | 0.07 | 0.29 | 0.048 |
