- in general in academia: often, it seems to people outside of an academic specialty as if the people in that specialty have something all figured out and are pretty sure about their solution. but often, if you ask someone who is actually working on that particular question, they'll say, 'yeah, that's our best guess, but the evidence isn't really as solid as it seems, because <list of 5 technical reasons>'
statistics
- base rates: let's pretend that 1 in 1,000,000 people have some disease (the base rate). You have a test for the disease, but it has a 1% false positive rate. That means that if you give the test to someone that you know for sure doesn't have the disease, there's a 1% chance that the test will come back positive anyway. Now, someone tells you that they tested positive for the disease. What's the probability that they have the disease? You may think it's 99% (since the test is only wrong 1% of the time), but it's not. If you gave the test to 1,000,000 people, you would expect that 1 of them would actually have the disease, and the other 999,999 would not. But 1% of those 999,999 people would be false positives, which is almost 10,000 people. So the chance of having the disease, given a positive test, is only about 1 in 10,000 -- about 0.01%. This comes up whenever you have to test for something rare -- another example is when you screen for terrorists (even if the screening process has a low false positive rate, if terrorists are rare, then most of the positives will likely turn out to be false positives).
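a minimal sketch of that calculation in Python, via Bayes' rule. the numbers are the ones from the example; the perfect sensitivity (the test never misses a true case) is my assumption for simplicity, not something stated above:

```python
# Bayes' rule check of the base-rate example above.
base_rate = 1 / 1_000_000    # P(disease)
false_positive_rate = 0.01   # P(positive | no disease)
true_positive_rate = 1.0     # P(positive | disease), assumed perfect here

# P(positive) by the law of total probability
p_positive = (true_positive_rate * base_rate
              + false_positive_rate * (1 - base_rate))

# P(disease | positive) by Bayes' rule
p_disease_given_positive = true_positive_rate * base_rate / p_positive
print(f"P(disease | positive) = {p_disease_given_positive:.6f}")
# prints ~0.000100, i.e. about 1 in 10,000
```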
automated prediction (machine learning)
- validation: when you induce (fit) a model from data, if you want to know how good the model's performance is, you have to split the data into a training set and a test set, fit the model on the training set, and then test it against the test set. if you just test it against the same data you trained it on, you may think that the model's performance is much better than it really is.
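a minimal sketch of that gap in Python, using scikit-learn; the dataset and the choice of an unconstrained decision tree are just illustrative assumptions on my part:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# an unconstrained decision tree can essentially memorize the training data
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print(f"accuracy on training data: {model.score(X_train, y_train):.3f}")  # typically 1.000
print(f"accuracy on held-out data: {model.score(X_test, y_test):.3f}")    # noticeably lower
```

scoring on the training data alone would report near-perfect accuracy; only the held-out score tells you how the model does on data it hasn't seen.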
- prior knowledge: usually, in an automated prediction task, adding more (correct) "prior knowledge" (i.e. assumptions built into your model) has a bigger impact on the model's performance than improving the learning algorithm
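a toy illustration of that point in Python. the setup is my assumption, not from the text: the true relationship is quadratic, and "prior knowledge" here means knowing to feed the model a squared feature rather than switching algorithms:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(500, 1))
y = x[:, 0] ** 2 + rng.normal(scale=0.1, size=500)  # quadratic + noise

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# no prior knowledge: a linear model on the raw feature fits poorly
naive = LinearRegression().fit(x_train, y_train)
print(f"raw feature R^2:     {naive.score(x_test, y_test):.3f}")  # near 0

# with prior knowledge (we suspect a quadratic relationship), the exact
# same algorithm on the transformed feature fits almost perfectly
informed = LinearRegression().fit(x_train ** 2, y_train)
print(f"squared feature R^2: {informed.score(x_test, y_test):.3f}")  # near 1
```

the model on the second line isn't any smarter -- it's the same learning algorithm -- it just had the right assumption baked into its inputs.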