We were visited by the census today. I've already filled in the online census, but this was a cross-check. The idea is to see whether the door-to-door count comes out with the same distribution as the online census.
Data collection isn't as easy as you might think. Anyone who has worked with published statistics knows that they are imperfect in various ways, some random, some systematic.
The standard way I used to teach this was to get people to extract population statistics for Germany from 1950 to 1980. When you do that, you might notice a sudden jump in one year (in the sixties), where the population is two million more than you'd expect.
This isn't because Germans suddenly got jiggy.
It's because, from that year on, they started to include the population of Berlin in the population of Germany. If you check the footnotes and appendices carefully, you'll probably find this explanation.
But if you just use that stream of numbers for time series analysis, you'll bump into problems.
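Here is a minimal sketch of how you might catch that kind of break before modelling. The numbers are made up for illustration (they are not real German population figures), and `find_level_shifts` and its `threshold` are hypothetical names of my own. It simply flags any year-on-year change that sits far outside the typical change, using the median absolute deviation as a rough yardstick.

```python
from statistics import median


def find_level_shifts(values, threshold=5.0):
    """Return the positions where the step from the previous value is
    unusually large compared with the typical step in the series."""
    # Year-on-year changes: diffs[i] is values[i + 1] - values[i].
    diffs = [later - earlier for earlier, later in zip(values, values[1:])]
    typical = median(diffs)
    # Median absolute deviation: a spread measure that one big jump
    # cannot distort much. If it is zero, any deviation gets flagged.
    spread = median([abs(d - typical) for d in diffs])
    return [
        i + 1
        for i, d in enumerate(diffs)
        if abs(d - typical) > threshold * spread
    ]


# Synthetic series in thousands of people (assumption: invented data).
# Steady growth of 300 to 500 per step, plus a one-off jump of 2000
# at position 15 to mimic a change in what is being counted.
series = [50_000]
for i in range(1, 31):
    step = [300, 400, 500][i % 3]
    if i == 15:
        step += 2_000
    series.append(series[-1] + step)

shifts = find_level_shifts(series)
print("Suspicious jumps at positions:", shifts)
```

A flagged position is not an answer, only a prompt to go and read the footnotes. Once you know the cause, you can decide whether to adjust the earlier figures, the later ones, or model the two periods separately.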