Learn Implementing Sampling in Python | Probability & Statistics
Mathematics for Data Science

Implementing Sampling in Python

Sampling is a core concept in statistics and data science, helping us analyze populations without observing every individual.
In this tutorial, you'll see how to implement four different sampling methods using Python: Simple Random Sampling, Stratified Sampling, Cluster Sampling, and Systematic Sampling.

Simple Random Sampling

    import random

    N = 30  # population size
    n = 5   # sample size

    sample_srs = random.sample(range(1, N + 1), n)
    print(f"Simple Random Sample: {sample_srs}")

  • random.sample(range(1, N+1), n) randomly selects n unique values from the population;
  • Works without replacement (no repeats);
  • Every member of the population has an equal chance of being chosen.
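
The equal-chance property can be checked empirically. The sketch below is an illustration rather than part of the lesson's own code: it repeats the draw many times with a seeded generator (the seed 42 and the trial count are arbitrary assumptions) and counts how often each member is selected. Every observed frequency should land close to n / N = 5 / 30.

```python
import random
from collections import Counter

N = 30           # population size
n = 5            # sample size
trials = 20_000  # number of repeated draws (arbitrary choice for illustration)

rng = random.Random(42)  # seeded generator so the run is reproducible
counts = Counter()

for _ in range(trials):
    # Each draw is a simple random sample without replacement
    counts.update(rng.sample(range(1, N + 1), n))

# Observed inclusion frequency for each member of the population
frequencies = {member: counts[member] / trials for member in range(1, N + 1)}

print(f"Expected inclusion probability: {n / N:.3f}")
print(f"Lowest observed frequency:  {min(frequencies.values()):.3f}")
print(f"Highest observed frequency: {max(frequencies.values()):.3f}")
```

The lowest and highest frequencies bracket the expected value tightly, which is what "equal chance" looks like in practice.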

Stratified Sampling

    N_males = 18
    N_females = 12
    N_total = N_males + N_females

    n_total = 10  # total sample size

    # Proportional allocation: each stratum's share of the sample
    # matches its share of the population
    n_males = round((N_males / N_total) * n_total)
    n_females = round((N_females / N_total) * n_total)

    print(f"Stratified Sample Size -> Males: {n_males}, Females: {n_females}")

  • Population is divided into subgroups (strata);
  • Sample is drawn proportionally from each subgroup;
  • Ensures representation of key groups.
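
The size calculation above stops short of drawing the sample itself. A minimal sketch of the full procedure follows, assuming a hypothetical population in which members are identified by made-up IDs such as M1 and F1; the `strata` dictionary and the seed are illustrative assumptions, not part of the lesson. Within each stratum it draws a simple random sample of the proportional size.

```python
import random

rng = random.Random(0)  # seed chosen arbitrarily for reproducibility

# Hypothetical population: member IDs grouped by stratum
strata = {
    "males": [f"M{i}" for i in range(1, 19)],    # 18 members
    "females": [f"F{i}" for i in range(1, 13)],  # 12 members
}
n_total = 10
N_total = sum(len(members) for members in strata.values())

stratified_sample = {}
for name, members in strata.items():
    # Proportional allocation, same formula as the size calculation
    n_stratum = round(len(members) / N_total * n_total)
    # Simple random sample without replacement inside the stratum
    stratified_sample[name] = rng.sample(members, n_stratum)

for name, chosen in stratified_sample.items():
    print(f"{name}: {len(chosen)} sampled -> {chosen}")
```

With these numbers the allocation is 6 males and 4 females. With more strata or less convenient proportions, the rounded sizes may not add up to `n_total`, so a fuller implementation would need to reconcile them.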

Cluster Sampling

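    import random

    clusters = 5               # number of clusters (classrooms)
    students_per_cluster = 25  # students in each classroom

    # Pick one classroom at random; randint includes both endpoints
    selected_cluster = random.randint(1, clusters)
    print(f"Selected cluster (classroom): {selected_cluster} containing {students_per_cluster} students")
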
  • Population divided into clusters (e.g., classrooms);
  • One or more clusters are selected randomly;
  • Everyone in chosen cluster(s) is surveyed;
  • Efficient when listing every individual is impractical.
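
The points above mention selecting more than one cluster and surveying everyone inside. A rough sketch of that case, assuming a hypothetical `roster` that maps each classroom number to made-up student IDs and an arbitrary choice of two classrooms:

```python
import random

rng = random.Random(1)  # seed chosen arbitrarily for reproducibility

clusters = 5
students_per_cluster = 25
clusters_to_select = 2  # assumption: survey two classrooms

# Hypothetical roster: classroom number -> list of student IDs
roster = {
    c: [f"C{c}-S{s}" for s in range(1, students_per_cluster + 1)]
    for c in range(1, clusters + 1)
}

# Randomly choose whole clusters without replacement
selected_clusters = rng.sample(sorted(roster), clusters_to_select)

# Everyone in the chosen clusters is surveyed
surveyed = [student for c in selected_clusters for student in roster[c]]

print(f"Selected clusters: {selected_clusters}")
print(f"Students surveyed: {len(surveyed)}")
```

Only the list of classrooms is needed to make the random choice; the individual students are looked up after the clusters are picked.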

Systematic Sampling

    import random

    N = 1000
    n = 100
    k = N // n  # Sampling interval

    start = random.randint(1, k)  # Random start
    sample_systematic = list(range(start, N + 1, k))

    print(f"Sampling interval k = {k}")
    print(f"Random start = {start}")
    print(f"First 10 samples: {sample_systematic[:10]}")

  • Interval $k = \frac{N}{n}$;
  • Start point chosen randomly between 1 and $k$;
  • Select every $k$-th element from ordered population.
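
The same steps can be wrapped in a function that works on an actual ordered list rather than on bare indices. This is a sketch under assumptions: the function name `systematic_sample`, the made-up IDs, and the seed are all hypothetical. It uses integer division for the interval, and when N is not a multiple of n it simply keeps the first n selections, which is one simple choice among several.

```python
import random

def systematic_sample(population, n, rng=random):
    """Return every k-th item of an ordered population after a random start."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    k = N // n                 # sampling interval
    start = rng.randint(1, k)  # random start in 1..k (1-based position)
    positions = range(start, N + 1, k)
    # Convert 1-based positions to 0-based indices; if N is not a
    # multiple of n the walk can yield extra items, so keep only n
    return [population[p - 1] for p in positions][:n]

population = [f"ID{i:04d}" for i in range(1, 1001)]  # hypothetical ordered list
sample = systematic_sample(population, 100, random.Random(7))
print(sample[:5])
```

Passing a seeded `random.Random` instance makes the start point reproducible; omitting it falls back to the module-level generator.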

Summary of Methods

  • Simple Random: equal chance for all, no repeats;
  • Stratified: ensures subgroup representation;
  • Cluster: randomly selects whole groups;
  • Systematic: selects at fixed intervals after random start.

1. Which sampling method requires subgroup proportions? A) Cluster sampling B) Systematic sampling C) Stratified sampling ✅ D) Simple random sampling

2. If N = 1000 and n = 100, what is k? A) 5 B) 10 ✅ C) 100 D) 1

3. What is a drawback of cluster sampling? A) Too random B) Time-consuming C) Lack of subgroup representation ✅ D) Expensive to implement

4. Which sampling method is best when you can't list the full population? A) Stratified sampling B) Systematic sampling C) Cluster sampling ✅ D) Simple random sampling

5. What is printed by sample_systematic[:10]? A) First 10 clusters B) Random numbers C) First 10 sample indices ✅ D) Entire population

1. What function is used for simple random sampling without replacement?

2. In random.sample(range(1, N+1), n), what does range(1, N+1) represent?

3. What does the variable k represent in systematic sampling?

4. Why is random.randint(1, k) used in systematic sampling?

5. In stratified sampling, what does round((N_group / N_total) * n_total) calculate?

6. What Python function selects one random cluster?



Section 5. Chapter 6

