Parsing and Storing Blockchain Data

Blockchain data usually arrives as JSON or CSV. JSON (JavaScript Object Notation) is a widely used format for storing and exchanging structured data; blockchain nodes and APIs typically return it, and it is both human-readable and easy to parse in Python. CSV (Comma-Separated Values) suits tabular data and tools such as Excel or pandas. To analyze blockchain transactions, you parse the raw JSON into a structured format and, optionally, convert it to CSV for easier downstream analysis.

Parsing strategies depend on the data's structure and your analysis goals. For blockchain transaction data, you should identify relevant fields (such as transaction hash, sender, recipient, value, and timestamp) and extract them into a consistent schema. This makes it easier to filter, aggregate, and analyze the data later.

import json

# Sample JSON data representing blockchain transactions
raw_data = '''
[
    {
        "hash": "0xabc123",
        "from": "0xsender1",
        "to": "0xrecipient1",
        "value": "1000000000000000000",
        "timestamp": 1680000000
    },
    {
        "hash": "0xdef456",
        "from": "0xsender2",
        "to": "0xrecipient2",
        "value": "2500000000000000000",
        "timestamp": 1680000100
    }
]
'''

# Parse the JSON string into Python objects
transactions = json.loads(raw_data)

# Extract the relevant fields into a consistent schema (list of dicts)
structured_data = []
for tx in transactions:
    structured_data.append({
        "hash": tx["hash"],
        "from": tx["from"],
        "to": tx["to"],
        "value_eth": int(tx["value"]) / 10**18,  # Convert Wei to Ether
        "timestamp": tx["timestamp"]
    })

print(structured_data)
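
Dividing by 10**18 as above yields a float, and a float cannot represent every Wei amount exactly. Where exact amounts matter, one option is Python's decimal module. The following is a minimal sketch under that assumption; the helper name wei_to_ether is hypothetical and not part of the lesson's code.

```python
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18  # 1 Ether = 10^18 Wei


def wei_to_ether(wei: str) -> Decimal:
    """Convert a Wei amount (decimal string) to Ether without float rounding."""
    return Decimal(wei) / WEI_PER_ETHER


print(wei_to_ether("2500000000000000000"))  # 2.5
print(wei_to_ether("1000000000000000001"))  # 1.000000000000000001
```

Decimal works with 28 significant digits by default, which is likely enough for typical transaction values; whether the extra precision is worth it depends on the analysis.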

After parsing, the data often needs cleaning and transformation before analysis. This may include removing duplicates, handling missing values, converting data types, and normalizing units, for example converting Wei to Ether.

These steps make the dataset consistent and ready for analysis, so that later filtering and aggregation operate on uniform records.
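
The cleaning steps described above could be sketched roughly as follows. The sample records, the function name clean_transactions, and the choice to mark a missing recipient with an empty string are all illustrative assumptions, not part of the original lesson.

```python
# Hypothetical raw records: one exact duplicate and one with a missing "to" field
raw_transactions = [
    {"hash": "0xabc123", "from": "0xsender1", "to": "0xrecipient1",
     "value": "1000000000000000000", "timestamp": 1680000000},
    {"hash": "0xabc123", "from": "0xsender1", "to": "0xrecipient1",
     "value": "1000000000000000000", "timestamp": 1680000000},
    {"hash": "0xdef456", "from": "0xsender2", "to": None,
     "value": "2500000000000000000", "timestamp": 1680000100},
]


def clean_transactions(transactions):
    """Deduplicate by hash, fill missing fields, and convert types."""
    seen_hashes = set()
    cleaned = []
    for tx in transactions:
        tx_hash = tx.get("hash")
        # Skip records without a hash and records already seen
        if not tx_hash or tx_hash in seen_hashes:
            continue
        seen_hashes.add(tx_hash)
        cleaned.append({
            "hash": tx_hash,
            "from": tx.get("from") or "",
            "to": tx.get("to") or "",  # empty string marks a missing recipient
            "value_eth": int(tx.get("value") or 0) / 10**18,  # Wei to Ether
            "timestamp": int(tx["timestamp"]),  # keep Unix seconds as an int
        })
    return cleaned


cleaned_data = clean_transactions(raw_transactions)
print(cleaned_data)
```

Deduplicating on the transaction hash is a reasonable default because a hash identifies a transaction, but other datasets may need a different key.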

import csv

# Assume structured_data is the cleaned and transformed list of dicts from earlier
fieldnames = ["hash", "from", "to", "value_eth", "timestamp"]

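# newline="" lets the csv module control line endings (avoids blank rows on Windows)
with open("transactions.csv", "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()  # First row: column names
    writer.writerows(structured_data)  # One row per transaction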

print("Data saved to transactions.csv")
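
To check that a saved file is usable downstream, it can be read back with csv.DictReader. The sketch below is self-contained: it writes hypothetical rows in the same schema to a temporary directory (an assumption made so that no real file is overwritten) and then reloads them. The csv module returns every value as a string, so numeric columns are converted back explicitly.

```python
import csv
import os
import tempfile

# Hypothetical rows matching the schema used above
rows = [
    {"hash": "0xabc123", "from": "0xsender1", "to": "0xrecipient1",
     "value_eth": 1.0, "timestamp": 1680000000},
    {"hash": "0xdef456", "from": "0xsender2", "to": "0xrecipient2",
     "value_eth": 2.5, "timestamp": 1680000100},
]
fieldnames = ["hash", "from", "to", "value_eth", "timestamp"]

# Write to a temporary directory so the sketch does not touch real files
path = os.path.join(tempfile.mkdtemp(), "transactions.csv")
with open(path, "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Read the file back and restore the numeric types
with open(path, newline="") as csvfile:
    loaded = [
        {**row,
         "value_eth": float(row["value_eth"]),
         "timestamp": int(row["timestamp"])}
        for row in csv.DictReader(csvfile)
    ]

total_eth = sum(tx["value_eth"] for tx in loaded)
print(f"{len(loaded)} transactions, {total_eth} ETH in total")
```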

Section 3. Chapter 2
