Learn Parsing and Storing Blockchain Data | Blockchain Data Analysis
Blockchain Foundations with Python

Parsing and Storing Blockchain Data


When working with blockchain data, you will often encounter it in formats such as JSON and CSV. JSON (JavaScript Object Notation) is a widely used format for storing and exchanging structured data. Blockchain nodes and APIs typically return data in JSON, which is both human-readable and easy to parse using Python. CSV (Comma-Separated Values) is another common format, especially for tabular data and for use in data analysis tools like Excel or pandas. To analyze blockchain transactions, you need to parse the raw JSON data into a structured format and, optionally, convert it to CSV for easier downstream analysis.

Parsing strategies depend on the data's structure and your analysis goals. For blockchain transaction data, you should identify relevant fields (such as transaction hash, sender, recipient, value, and timestamp) and extract them into a consistent schema. This makes it easier to filter, aggregate, and analyze the data later.

import json

# Sample JSON data representing blockchain transactions
raw_data = '''
[
    {
        "hash": "0xabc123",
        "from": "0xsender1",
        "to": "0xrecipient1",
        "value": "1000000000000000000",
        "timestamp": 1680000000
    },
    {
        "hash": "0xdef456",
        "from": "0xsender2",
        "to": "0xrecipient2",
        "value": "2500000000000000000",
        "timestamp": 1680000100
    }
]
'''

# Parse the JSON string into Python objects
transactions = json.loads(raw_data)

# Convert to a structured format (list of dicts)
structured_data = []
for tx in transactions:
    structured_data.append({
        "hash": tx["hash"],
        "from": tx["from"],
        "to": tx["to"],
        "value_eth": int(tx["value"]) / 10**18,  # Convert Wei to Ether
        "timestamp": tx["timestamp"]
    })

print(structured_data)
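The Wei-to-Ether conversion above uses floating-point division, which can silently drop the lowest digits of an amount. As a minimal sketch of an alternative, assuming exact amounts matter for your analysis, the standard-library `decimal` module keeps every digit. The helper name `wei_to_ether` and the constant `WEI_PER_ETHER` are illustrative and not part of the lesson's code.

```python
from decimal import Decimal

# 1 Ether = 10**18 Wei
WEI_PER_ETHER = Decimal(10) ** 18


def wei_to_ether(value_wei: str) -> Decimal:
    """Convert a Wei amount given as a decimal string to Ether."""
    # Decimal(str) parses the string exactly; the division is then rounded
    # to the context precision (28 significant digits by default).
    return Decimal(value_wei) / WEI_PER_ETHER


exact = wei_to_ether("1000000000000000001")
approx = int("1000000000000000001") / 10**18  # float division, as in the loop above

print(exact)   # keeps the trailing Wei
print(approx)  # the trailing Wei is lost to float rounding
```

The default `decimal` context keeps 28 significant digits, so amounts longer than that would still be rounded unless you raise the precision. If you store `Decimal` values in `structured_data`, convert them with `str()` before serializing to JSON.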

After parsing, the data often needs cleaning and transformation before analysis. This may include removing duplicates, handling missing values, converting data types, and normalizing units (for example, converting Wei to Ether).
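The cleaning steps listed above can be sketched as a single pass over the parsed records. This is one possible approach, not a prescribed one: the function name `clean_transactions`, the choice to drop incomplete records rather than fill in defaults, and the use of the transaction hash as the duplicate key are all assumptions made for illustration.

```python
def clean_transactions(transactions):
    """Return deduplicated, type-normalized transaction records."""
    required = ("hash", "from", "to", "value", "timestamp")
    seen_hashes = set()
    cleaned = []
    for tx in transactions:
        # Handle missing values: here, incomplete records are simply skipped
        if any(tx.get(field) in (None, "") for field in required):
            continue
        # Remove duplicates: keep only the first record for each hash
        if tx["hash"] in seen_hashes:
            continue
        seen_hashes.add(tx["hash"])
        cleaned.append({
            "hash": tx["hash"],
            "from": tx["from"],
            "to": tx["to"],
            # Convert types and normalize units (Wei string -> Ether float)
            "value_eth": int(tx["value"]) / 10**18,
            "timestamp": int(tx["timestamp"]),
        })
    return cleaned


# Hypothetical input containing one duplicate and one incomplete record
raw_transactions = [
    {"hash": "0xabc123", "from": "0xsender1", "to": "0xrecipient1",
     "value": "1000000000000000000", "timestamp": 1680000000},
    {"hash": "0xabc123", "from": "0xsender1", "to": "0xrecipient1",
     "value": "1000000000000000000", "timestamp": 1680000000},
    {"hash": "0xbad000", "from": "0xsender3", "to": "0xrecipient3",
     "value": None, "timestamp": 1680000050},
    {"hash": "0xdef456", "from": "0xsender2", "to": "0xrecipient2",
     "value": "2500000000000000000", "timestamp": "1680000100"},
]

cleaned = clean_transactions(raw_transactions)
print(len(cleaned))
```

Whether to drop or repair an incomplete record depends on the analysis. Skipping keeps the sketch short, but it does discard data.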

These steps ensure the dataset is consistent and ready for analysis. Once the data is clean, you can write it to a CSV file with Python's built-in csv module so that tools such as Excel or pandas can load it directly.

import csv

# Assume structured_data is the cleaned and transformed list of dicts from earlier
fieldnames = ["hash", "from", "to", "value_eth", "timestamp"]

with open("transactions.csv", "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for tx in structured_data:
        writer.writerow(tx)

print("Data saved to transactions.csv")
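When the file is read back, keep in mind that CSV stores every value as text, so numeric columns have to be converted again. The sketch below shows a round trip under stated assumptions: it writes two sample rows to a temporary directory (so it does not overwrite your own transactions.csv) and restores the types with `csv.DictReader`. The variable names `rows`, `path`, and `loaded` are illustrative.

```python
import csv
import os
import tempfile

fieldnames = ["hash", "from", "to", "value_eth", "timestamp"]
rows = [
    {"hash": "0xabc123", "from": "0xsender1", "to": "0xrecipient1",
     "value_eth": 1.0, "timestamp": 1680000000},
    {"hash": "0xdef456", "from": "0xsender2", "to": "0xrecipient2",
     "value_eth": 2.5, "timestamp": 1680000100},
]

# Write the sample rows to a throwaway location
path = os.path.join(tempfile.mkdtemp(), "transactions.csv")
with open(path, "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Read the rows back; DictReader yields every field as a string
loaded = []
with open(path, newline="") as csvfile:
    for row in csv.DictReader(csvfile):
        row["value_eth"] = float(row["value_eth"])
        row["timestamp"] = int(row["timestamp"])
        loaded.append(row)

print(loaded == rows)
```

JSON preserves the distinction between numbers and strings, while CSV does not, which is one trade-off to weigh when choosing a storage format.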

What are the benefits of cleaning blockchain data before analysis, and why might you choose to store parsed data in CSV format instead of JSON?
