What really is data science?


Before continuing, a little about GritinAI.

GritinAI is a startup that brings the light of Artificial Intelligence to all fields through training, building products, and mentoring.


Some possible definitions of Data Science

Data science is the application of computational and statistical techniques to address or gain insight into some problem in the real world.

Data science = statistics + data processing + machine learning + scientific inquiry + visualisation + business analytics + big data + …

What is Data Science not (just)?

  1. Data science is not (just) machine learning

Machine learning involves computation and statistics, but has not (traditionally) been very concerned with answering scientific questions.

Machine learning has a heavy focus on fancy algorithms…

… but sometimes, for instance, the best way to solve a problem is just to visualise the data.

Data science competitions like Kaggle ask you to optimize a metric on a fixed data set. This may or may not ultimately solve the desired business or scientific problem.
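To make the gap between a metric and the real goal concrete, here is a minimal sketch. The labels are synthetic and made up purely for illustration (5 fraud cases out of 100 transactions), and the "model" is a hypothetical one that always predicts the majority class. It scores well on accuracy while being useless for the question you actually care about.

```python
# Synthetic labels, invented for illustration: 1 = fraud, 0 = legitimate.
labels = [1] * 5 + [0] * 95

# A hypothetical "model" that always predicts the majority class.
predictions = [0] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of the true positive cases that the model finds."""
    found = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(found) / len(found) if found else 0.0


print(accuracy(labels, predictions))  # 0.95
print(recall(labels, predictions))    # 0.0
```

If the metric being optimized were accuracy, this model would look strong, yet it never catches a single fraud case. Which metric matches the underlying problem is a judgment about the problem, not about the algorithm.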

Data science is the iterative cycle of designing a concrete problem, building an algorithm to solve it (or determining that this is not possible), and evaluating what insights this provides for the real underlying question.

2. Data science is not (just) statistics

“Analyzing data computationally, to understand some phenomenon in the real world, you say? … that sounds an awful lot like statistics”

Statistics (at least the academic type) has evolved a lot more along the mathematical/theoretical frontier.

Not many statistics courses have a lecture on, e.g., web scraping, or on data processing more generally.

Plus, statisticians use R, while data scientists use Python … clearly these are completely different fields.

3. Data science is not (just) big data

Sometimes, in order to truly understand and answer your question, you need massive amounts of data…

… but sometimes you don’t.

Don’t create more work for yourself than you need to.

(A few) data science examples

  1. Gendered Language in Teacher Reviews (http://benschmidt.org/profGender/)
  2. Forecasting the race for the House (https://projects.fivethirtyeight.com/2018-midterm-election-forecast/house/)
  3. Poverty mapping

Learning objectives of any data science course

After taking a data science course, you should…
… understand the full data science pipeline, and be familiar with programming tools to accomplish the different portions
… be able to collect data from unstructured sources and store it using an appropriate structure such as relational databases, graphs, matrices, etc.
… know how to explore and visualize your data
… be able to analyze your data rigorously using a variety of statistical and machine learning approaches
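As a rough illustration of what the pipeline in these objectives can look like end to end, here is a small sketch using only the Python standard library. Everything in it is an assumption made for the example: the review text is invented, and the names `collect`, `store`, `explore`, and `analyze` are hypothetical helpers, not part of any particular course or library.

```python
import re
import sqlite3
from collections import Counter
from statistics import mean

# Invented free-text input standing in for an unstructured source.
RAW_TEXT = """
Review by alice: rated 4 stars on 2018-03-01
Review by bob: rated 2 stars on 2018-03-02
some line with no rating at all
Review by carol: rated 5 stars on 2018-03-05
"""

PATTERN = re.compile(r"Review by (\w+): rated (\d) stars on (\d{4}-\d{2}-\d{2})")


def collect(text):
    """Collect: pull (reviewer, stars, day) records out of raw text."""
    return [(m.group(1), int(m.group(2)), m.group(3))
            for m in PATTERN.finditer(text)]


def store(rows):
    """Store: load the records into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reviews (reviewer TEXT, stars INTEGER, day TEXT)")
    conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def explore(conn):
    """Explore: print a crude text histogram of the star ratings."""
    counts = Counter(row[0] for row in conn.execute("SELECT stars FROM reviews"))
    for stars in sorted(counts):
        print(f"{stars} stars: {'#' * counts[stars]}")
    return counts


def analyze(conn):
    """Analyze: compute a simple summary statistic over the table."""
    stars = [row[0] for row in conn.execute("SELECT stars FROM reviews")]
    return {"n": len(stars), "mean_stars": mean(stars)}


rows = collect(RAW_TEXT)
conn = store(rows)
counts = explore(conn)
summary = analyze(conn)
print(summary)
```

A real project would replace each step with something heavier (a scraper, a persistent database, proper plots, a statistical model), but the shape of collect, store, explore, analyze stays the same.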

Recommended background before embarking on any data science course

The only formal prerequisite for a data science course is an intro to programming (if you have taken one at another university, this is fine).
It is strongly recommended that students have experience with Python and, ideally, some background in probability and statistics, and linear algebra.
If you don’t have a background in these areas, you may still sign up, but be aware that you will probably need to learn some of these items as the class goes on (you can reach out to us for pointers).
General rule of thumb: if the homework seems hard, but you have ideas about how to proceed, you probably have the right level of background; if the homework seems hard and you have no idea how to proceed, this may be the wrong course.