1. What is Data Science?

Data science is the practice of turning raw data into reliable insight to inform decisions. It blends statistics, computing, and domain expertise to collect, clean, explore, and model data, and then to communicate the results clearly through visualizations and narrative. Methods range from descriptive summaries to statistical inference and machine learning, with attention to uncertainty, reproducibility, and ethics. In short, data science connects questions to evidence.

1.1. What kinds of questions can it answer?

  • Descriptive: What happened? What patterns do we see?

  • Diagnostic: Why did it happen? Which factors are associated with the outcome?

  • Predictive: What is likely to happen next? (e.g., forecasting demand)

  • Prescriptive: What should we do? (e.g., recommend actions or policies)

  • Causal: What is the effect of X on Y? (e.g., A/B tests, quasi-experiments)

  • Classification: Which category does this belong to? (e.g., spam vs. not spam)

  • Regression: How much? (e.g., continuous outcomes like price or growth)

  • Clustering/Segmentation: How can we group similar cases without labels?

  • Anomaly Detection: What looks unusual or risky?

  • Recommendation & Ranking: Which items should be shown first to whom?

In this book we’ll show you how to answer data science questions using Python. Why Python? Because it’s readable, well suited to reproducible analysis, and has a rich ecosystem for data work (e.g., pandas, NumPy, Matplotlib, scikit-learn). In the next few chapters, we will learn just enough programming to load, clean, visualize, and model data with confidence.
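As a small taste of what this looks like in practice, here is a minimal sketch that answers one descriptive question and one regression question from the list above. It uses only Python's standard library so that it runs anywhere; real analyses would more likely rely on the data libraries just mentioned. The dataset, the column meanings, and the helper names `mean_sales_by_region` and `fit_line` are all hypothetical, invented purely for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (region, advertising spend, sales).
# The numbers are made up for illustration only.
records = [
    ("north", 1.0, 3.0),
    ("north", 2.0, 5.0),
    ("south", 3.0, 7.0),
    ("south", 4.0, 9.0),
]


def mean_sales_by_region(rows):
    """Descriptive question: what were the average sales in each region?"""
    groups = defaultdict(list)
    for region, _spend, sales in rows:
        groups[region].append(sales)
    return {region: mean(values) for region, values in groups.items()}


def fit_line(xs, ys):
    """Regression question: least-squares intercept and slope for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


averages = mean_sales_by_region(records)

spend = [row[1] for row in records]
sales = [row[2] for row in records]
intercept, slope = fit_line(spend, sales)

# Predictive use of the fitted line: expected sales at a spend of 5.0.
predicted = intercept + slope * 5.0

print("Average sales by region:", averages)
print("Fitted line: sales =", intercept, "+", slope, "* spend")
print("Predicted sales at spend 5.0:", predicted)
```

The same data supports different kinds of questions: the grouped average only summarizes what happened, while the fitted line lets us estimate an outcome we have not observed. Four data points are far too few for a trustworthy model, so treat this purely as a preview of the workflow.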