Minor update to docs

2020-02-03 15:59:10 -08:00 · 2020-02-03 15:59:10 -08:00 · 7f5b140bb2
--- a/README.md
+++ b/README.md
@@ -48,7 +48,7 @@ The Fully Separate solution is fast and often quite good so we recommend startin
 ### Model types
 There are two main model-types (corresponding to different cuts of the data) that can be used to estimate treatment effects.
 1. Retrospective: The goal is to minimize squared prediction error of the control units on `Y_post` and the full-pre history of the outcome is used as features in fitting. This is the default and was used in the descriptive elements above.
-2. Prospective: We make an artificial split in time before any treatment actually happens (`Y_pre=[Y_train,Y_test]$`). The goal is to minimize squared prediction error of all units on `Y_test` and `Y_train` for all units is used as features in fitting.
+2. Prospective: We make an artificial split in time before any treatment actually happens (`Y_pre=[Y_train,Y_test]`). The goal is to minimize squared prediction error of all units on `Y_test` and `Y_train` for all units is used as features in fitting.

 Given the same amount of features, the two will only differ when there are a non-trivial number of treated units. In this case the prospective model may provide lower prediction error for the treated units, though at the cost of less pre-history data used for fitting. When there are a trivial number of units, the retrospective design will be the most efficient.

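As an illustration of the two model types described in the README hunk above, the difference comes down to which slices of the outcome matrix serve as features and targets, and which units are included. The following is a minimal sketch, assuming outcomes are stored as units-by-periods NumPy arrays; the function name `model_data` and the parameter `n_test` are hypothetical and are not part of the SparseSC API.

```python
import numpy as np


def model_data(Y_pre, Y_post, treated, model_type="retrospective", n_test=1):
    """Return (features, targets) for the given model type.

    Hypothetical helper for illustration only; not the SparseSC API.
    Y_pre:   (n_units, T_pre) outcomes before treatment
    Y_post:  (n_units, T_post) outcomes after treatment
    treated: (n_units,) boolean flags
    """
    Y_pre = np.asarray(Y_pre, dtype=float)
    Y_post = np.asarray(Y_post, dtype=float)
    treated = np.asarray(treated, dtype=bool)

    if model_type == "retrospective":
        # Full pre-history as features, Y_post as target, control units only.
        controls = ~treated
        return Y_pre[controls], Y_post[controls]

    if model_type == "prospective":
        # Artificial split inside the pre-period: Y_pre = [Y_train, Y_test].
        if not 0 < n_test < Y_pre.shape[1]:
            raise ValueError("n_test must leave at least one training period")
        Y_train, Y_test = Y_pre[:, :-n_test], Y_pre[:, -n_test:]
        # No unit is treated yet in Y_pre, so all units are kept.
        return Y_train, Y_test

    raise ValueError(f"unknown model_type: {model_type!r}")
```

With one treated unit out of four and five pre-periods, the retrospective call keeps three units and five feature columns, while the prospective call with `n_test=2` keeps all four units but only three feature columns. This is the trade-off the README describes: more units, less pre-history.
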
--- a/SparseSC.sln
+++ b/SparseSC.sln
@@ -1,18 +1,23 @@

 Microsoft Visual Studio Solution File, Format Version 12.00
-# Visual Studio 15
-VisualStudioVersion = 15.0.28010.2016
+# Visual Studio Version 16
+VisualStudioVersion = 16.0.29613.14
 MinimumVisualStudioVersion = 10.0.40219.1
 Project("{888888A0-9F3D-457C-B088-3A5042F75D52}") = "SparseSC", "src\SparseSC\SparseSC.pyproj", "{3FDC664A-C1C8-47F0-8D77-8E0679E53C82}"
 EndProject
 Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Misc", "Misc", "{AA17F266-A75A-4FDE-A774-168EADE9A32E}"
 	ProjectSection(SolutionItems) = preProject
+		docs\azure_batch.md = docs\azure_batch.md
 		CHANGELOG.md = CHANGELOG.md
 		docs\dev_notes.md = docs\dev_notes.md
+		docs\estimate-effects.md = docs\estimate-effects.md
 		example-code.py = example-code.py
 		examples\example_graphs.py = examples\example_graphs.py
 		docs\fit.md = docs\fit.md
 		examples\fit_poc.py = examples\fit_poc.py
+		docs\model-types.md = docs\model-types.md
+		docs\overview.md = docs\overview.md
+		docs\performance-notes.md = docs\performance-notes.md
 		README.md = README.md
 	EndProjectSection
 EndProject
--- a/docs/overview.md
+++ b/docs/overview.md
@@ -14,7 +14,7 @@ One way to think of SC is as an improvement upon difference-in-difference (DiD)
 The authors show if endogeneity of treatment is driven by a factor model with vectors components `$f_t\cdot\lambda_i$` where `$\lambda_i$` might be correlated with treatment (a simple example would be that there are groups with different typical time trends) and the synthetic control is able to reproduce the treated unit's pre-treatment history, then as the pre-history grows the size of the expected bias of the estimated treatment effect approaches zero (NB: this is not quite consistency). Essentially, if there are endogenous factors that affect treatment and future outcomes then you should be able to control for them by matching on past outcomes. The matching that SC provides can therefore deal with some problems in estimation that DiD can not handle.

 ### SparseSC Solution Structure
-Given a specific set of variable weights, `$V$` and control variable matrix of data to match on `$M^C$`, unit-weights for unit `$i$` is `$W_i=\arg\min_{W}\sum_k(M_{ik}-W\cdotM^C_{\cdot,k})^2\cdotv_{kk}$`. Synthetic Controls typically also restricts the weight vector to be non-negative and sum to one. These restrictions may aid interpretability, though they are not econometrically necessary and may harm performance (e.g. they make it difficult to model units on the convex hull of the matching-space).
+Given a specific set of variable weights, `$V$` and control variable matrix of data to match on `$M^C$`, unit-weights for unit `$i$` is `$W_i=\arg\min_{W}\sum_k(M_{ik}-W\cdot M^C_{\cdot,k})^2\cdot v_{kk}$`. Synthetic Controls typically also restricts the weight vector to be non-negative and sum to one. These restrictions may aid interpretability, though they are not econometrically necessary and may harm performance (e.g. they make it difficult to model units on the convex hull of the matching-space).

 The determination of `$V$` can be done jointly or separately.
 * Jointly: We find `$V$` such that resulting SCs have outcomes that are 'optimal' in some sense. They originally minimized squared prediction error on `$Y_{pre}$`. In the standard Stata `synth` command this is the `nested` option.
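
The unit-weight formula corrected in the `docs/overview.md` hunk above is a weighted least-squares problem once `$V$` is fixed. The following is a minimal sketch, assuming `$V$` is diagonal with non-negative entries `$v_{kk}$` and dropping the non-negativity and sum-to-one restrictions that the text notes are optional. The function name `unit_weights` is hypothetical, and this is not necessarily how SparseSC solves for the weights.

```python
import numpy as np


def unit_weights(M_i, M_C, v):
    """Unconstrained weights W minimizing sum_k (M_ik - W . M_C[:, k])**2 * v[k].

    Hypothetical helper for illustration only; not the SparseSC API.
    M_i: (K,) matching variables of the unit being modeled
    M_C: (n_controls, K) matching variables of the control units
    v:   (K,) non-negative diagonal of V
    """
    M_i = np.asarray(M_i, dtype=float)
    M_C = np.asarray(M_C, dtype=float)
    root_v = np.sqrt(np.asarray(v, dtype=float))

    # Scaling each matching variable by sqrt(v_kk) turns the weighted
    # problem into ordinary least squares: min_W ||A @ W - b||^2.
    A = root_v[:, None] * M_C.T   # shape (K, n_controls)
    b = root_v * M_i              # shape (K,)

    # lstsq returns the minimum-norm solution when W is not unique.
    W, *_ = np.linalg.lstsq(A, b, rcond=None)
    return W
```

Adding the usual synthetic-control restrictions would require a constrained solver instead of `np.linalg.lstsq`. The sketch leaves them out to keep the correspondence with the formula direct.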