Learn Scatter Plots for Sequence Properties | Biological Data Visualization
Python for Biologists

Scatter Plots for Sequence Properties

Scatter plots are powerful tools for visualizing the relationship between two numerical variables. In biology, scatter plots are often used to compare properties of biological sequences, such as the length of DNA sequences and their GC content. By plotting these properties against each other, you can quickly see if there is any correlation or trend, such as whether longer sequences tend to have higher or lower GC content. This kind of visualization helps you identify patterns, outliers, or clusters that may be biologically meaningful, such as differences between genes, chromosomes, or species.

    import matplotlib.pyplot as plt

    # Example DNA sequences from different species
    sequences = [
        {"id": "seq1", "species": "Human", "sequence": "ATGCGCGTACGTAGCTAGCGT"},
        {"id": "seq2", "species": "Mouse", "sequence": "ATGCGTACGTAGCTAGC"},
        {"id": "seq3", "species": "Yeast", "sequence": "ATGCGCGCGCGT"},
        {"id": "seq4", "species": "Human", "sequence": "ATGCGTAGCTAGCTAGCGCGT"},
        {"id": "seq5", "species": "Mouse", "sequence": "ATGCTAGCTAG"},
        {"id": "seq6", "species": "Yeast", "sequence": "ATGCGCGCGCGCGCGT"},
    ]

    # Calculate sequence length and GC content
    def gc_content(seq):
        g = seq.count("G")
        c = seq.count("C")
        return 100 * (g + c) / len(seq)

    lengths = []
    gc_contents = []
    for entry in sequences:
        seq = entry["sequence"]
        lengths.append(len(seq))
        gc_contents.append(gc_content(seq))

    plt.scatter(lengths, gc_contents)
    plt.xlabel("Sequence Length")
    plt.ylabel("GC Content (%)")
    plt.title("Scatter Plot of Sequence Length vs. GC Content")
    plt.show()

The code starts from a list of DNA sequences, each stored with an ID, a species label, and a sequence string. For each sequence it computes the length and the GC content, using a simple function that counts the G and C nucleotides, divides by the total length, and multiplies by 100 to give a percentage. These values are collected in two lists: one for sequence lengths and one for GC content percentages. The plt.scatter() function from matplotlib then draws the scatter plot, with sequence length on the x-axis and GC content on the y-axis, while plt.xlabel(), plt.ylabel(), and plt.title() label the axes and add a title. Each point in the plot represents one sequence. Clusters or trends can hint at biological patterns, such as whether certain species have sequences with higher GC content or whether longer sequences fall into particular GC content ranges. With only six short example sequences, any pattern here is illustrative rather than a real biological result.
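The gc_content function in the example works for the sequences shown, but it quietly assumes uppercase, non-empty input made up only of A, C, G, and T. Real sequence files often contain lowercase (soft-masked) bases or ambiguous symbols such as N. The following is a minimal sketch of a more defensive variant. The name gc_content_robust is hypothetical and not part of the lesson, and leaving ambiguous symbols out of the denominator is an assumption: it is one common convention, not the only one.

```python
def gc_content_robust(seq: str) -> float:
    """Return GC content as a percentage of the unambiguous bases (A, C, G, T).

    Hypothetical helper for illustration; not part of the original lesson.
    """
    seq = seq.upper()  # treat lowercase (soft-masked) bases like uppercase ones
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # assumption: symbols such as N are left out of the denominator
    if total == 0:
        # Avoid a ZeroDivisionError on empty or fully ambiguous input
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100 * gc / total


print(gc_content_robust("atgcNN"))  # 2 of the 4 unambiguous bases are G or C: prints 50.0
```

For the six example sequences, this variant returns the same values as the original gc_content, because they are already uppercase and contain no ambiguous symbols.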

Coloring each point by species makes differences between groups easier to see. This example reuses sequences and gc_content from the first example:

    import matplotlib.pyplot as plt

    # Assign a color to each species
    species_colors = {"Human": "blue", "Mouse": "green", "Yeast": "red"}

    # Requires `sequences` and `gc_content` defined in the first example
    lengths = []
    gc_contents = []
    colors = []
    for entry in sequences:
        seq = entry["sequence"]
        lengths.append(len(seq))
        gc_contents.append(gc_content(seq))
        colors.append(species_colors[entry["species"]])

    plt.scatter(lengths, gc_contents, c=colors)
    plt.xlabel("Sequence Length")
    plt.ylabel("GC Content (%)")
    plt.title("Sequence Length vs. GC Content (Colored by Species)")
    plt.show()
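The colored plot has no legend, so a reader cannot tell which color belongs to which species. One way to add one is to group the points by species and draw each group with its own scatter call and label. The sketch below takes that approach. The helper names group_by_species and plot_by_species are hypothetical and not part of the lesson. The example records are repeated so that the block runs on its own.

```python
from collections import defaultdict

# Same example records as in the lesson, repeated so this block is self-contained
sequences = [
    {"id": "seq1", "species": "Human", "sequence": "ATGCGCGTACGTAGCTAGCGT"},
    {"id": "seq2", "species": "Mouse", "sequence": "ATGCGTACGTAGCTAGC"},
    {"id": "seq3", "species": "Yeast", "sequence": "ATGCGCGCGCGT"},
    {"id": "seq4", "species": "Human", "sequence": "ATGCGTAGCTAGCTAGCGCGT"},
    {"id": "seq5", "species": "Mouse", "sequence": "ATGCTAGCTAG"},
    {"id": "seq6", "species": "Yeast", "sequence": "ATGCGCGCGCGCGCGT"},
]


def gc_content(seq):
    """GC content as a percentage, as defined in the lesson."""
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def group_by_species(records):
    """Return {species: (list_of_lengths, list_of_gc_percentages)}."""
    groups = defaultdict(lambda: ([], []))
    for entry in records:
        lengths, gcs = groups[entry["species"]]
        lengths.append(len(entry["sequence"]))
        gcs.append(gc_content(entry["sequence"]))
    return dict(groups)


def plot_by_species(groups, colors):
    """Draw one scatter series per species so each gets a legend entry."""
    # Imported here so the grouping helper can be used without matplotlib installed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for species, (lengths, gcs) in groups.items():
        ax.scatter(lengths, gcs, color=colors[species], label=species)
    ax.set_xlabel("Sequence Length")
    ax.set_ylabel("GC Content (%)")
    ax.set_title("Sequence Length vs. GC Content (Colored by Species)")
    ax.legend(title="Species")
    plt.show()
```

To draw the figure, call plot_by_species(group_by_species(sequences), {"Human": "blue", "Mouse": "green", "Yeast": "red"}). Plotting one series per species costs a few more lines than passing a list of colors to a single scatter call, but matplotlib can then build the legend from the labels automatically.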

1. What can a scatter plot of sequence length vs. GC content reveal?

2. Fill in the blank: In a scatter plot, each point represents a _____ with two properties.

3. How can color coding enhance the interpretability of scatter plots in biology?


Section 3. Chapter 6
