What is Data Science?
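- Science is the acquisition of new knowledge and the extension of existing knowledge
- 1st generation: experimental science - through experiments…
- 2nd generation: theoretical science - establishing theories that explain experimental results and, on that basis, …
- 3rd generation: computational science - setting up theory-based models and, through simulation and similar means, …
- Data science, as the 4th generation of science, extracts information from data to extend knowledge
- Information lies within data, but data and information are not the same thing, which is why data science exists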
- Data science is best applied in the context of expert knowledge about the domain from which the data come.
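Statistics and Data Science
- A discipline that plays the same role as data science already exists
- Statistics and data science share the same goal: to extract meaningful information from data
- The keyword that separates data science from statistics is big data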

- What is big data? (the 3Vs of big data: volume, velocity, variety)
- In the era of “big data”:
- We have a lot more data than we had before
- We are able to compute many more things than we could before
- Statistics and data science

Big picture of Statistics

Big picture of Data Science
- Many statistical methods were developed in an environment where data were scarce and difficult or expensive to collect, so statisticians focused on creating methods that would maximize the strength of inference given the least amount of data.
- Data science requires a thorough blending of computational thinking and statistical thinking
- Computational Thinking
- Abstraction (allows us, after constructing a module, to forget the details of its construction and remember a simpler description of how it will interact with other modules)
- Modularity (allows us to build components that can be re-used)
- Scalability (allows us to handle a growing amount of data or work)
- Robustness (allows us to maintain stable services even with small errors in the input stages)
- Statistical Thinking
- Considering the real-world phenomenon behind the data: decision making under uncertainty, combined with domain knowledge
- Considering the data-generating procedure: the distribution of the data (building block of generative models)
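
As a rough illustration of how computational and statistical thinking can combine, the Python sketch below cleans a handful of raw readings, summarizes them in one pass, and then treats the summary as the parameters of a simple generative model. Everything in it is hypothetical: the names (`parse_reading`, `clean`, `RunningStats`, `summarize`, `simulate`), the sample readings, and the assumption that the data follow a normal distribution are chosen for illustration only.

```python
import math
import random
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_reading(raw: str) -> Optional[float]:
    """Robustness: turn one raw record into a number, or None if it is malformed."""
    try:
        value = float(raw)
    except ValueError:
        return None
    # Reject "nan" and "inf", which float() accepts but which would corrupt the summary.
    return value if math.isfinite(value) else None


def clean(records: Iterable[str]) -> Iterator[float]:
    """Modularity: a reusable step that yields only the valid readings."""
    for raw in records:
        value = parse_reading(raw)
        if value is not None:
            yield value


class RunningStats:
    """Scalability: Welford's one-pass update, so the data need not fit in memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance (divides by n - 1); defined as 0.0 for fewer than two points.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def summarize(records: Iterable[str]) -> Tuple[int, float, float]:
    """Abstraction: callers see (n, mean, std) and none of the steps above."""
    stats = RunningStats()
    for x in clean(records):
        stats.update(x)
    return stats.n, stats.mean, math.sqrt(stats.variance)


def simulate(mean: float, std: float, size: int, seed: int = 0) -> List[float]:
    """Statistical thinking: ASSUME a normal data-generating process and sample from it."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(size)]


# Hypothetical raw readings, two of which are malformed.
readings = ["9.8", "10.1", "n/a", "10.4", "", "9.7"]
n, mean, std = summarize(readings)
synthetic = simulate(mean, std, size=5)
print(n, round(mean, 3), round(std, 3))
print([round(x, 2) for x in synthetic])
```

The normal distribution is only a stand-in here. In practice, the choice of distribution is exactly where domain knowledge about the real-world phenomenon should enter.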
Data Scientist