Aprenda LightGBM | Framework Deep Dive
Advanced Tree-Based Models with Python
Section 2. Chapter 2

LightGBM is a gradient boosting framework that stands out for its unique approach to tree construction and feature handling. Two of its core innovations—histogram binning and leaf-wise tree growth—are central to its reputation for high speed and efficiency, especially on large datasets.

Histogram binning

  • Discretizes continuous feature values into a fixed number of bins before training;
  • Groups feature values into these bins, reducing the number of split candidates during tree construction;
  • Speeds up computation and reduces memory usage, since raw feature data can be stored more compactly as bin indices.
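The idea behind histogram binning can be illustrated with plain NumPy. This is a simplified, quantile-based sketch of the concept (LightGBM builds its histograms internally with its own algorithm); the variable names and the use of `np.quantile`/`np.searchsorted` are illustrative assumptions, not LightGBM internals:

```python
import numpy as np

rng = np.random.default_rng(42)
feature = rng.normal(size=1000)  # one continuous feature

# Discretize into a fixed number of bins (LightGBM's default max_bin is 255)
n_bins = 255
edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1))
bin_indices = np.clip(
    np.searchsorted(edges, feature, side="right") - 1, 0, n_bins - 1
)

# Split candidates shrink from ~one per unique raw value to at most n_bins
print("Raw split candidates:", len(np.unique(feature)))
print("Binned split candidates:", len(np.unique(bin_indices)))

# Bin indices fit in a single byte, which is where the memory savings come from
compact = bin_indices.astype(np.uint8)
```

Instead of evaluating a split at every distinct feature value, the tree builder only needs to consider the bin boundaries, and each value is stored as a one-byte bin index rather than an eight-byte float.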

Leaf-wise tree growth

  • Always splits the leaf with the maximum loss reduction, regardless of its depth;
  • Differs from traditional level-wise algorithms that grow all leaves at the same depth in parallel;
  • Also known as "best-first" growth;
  • Can produce deeper, more complex trees that capture intricate patterns in the data;
  • Boosts accuracy, but may increase the risk of overfitting on smaller datasets—LightGBM provides parameters to control tree complexity.
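The selection rule above can be sketched as a priority queue over candidate leaves. The per-leaf gains here are made-up numbers purely for illustration; this is not LightGBM's actual data structure:

```python
import heapq

# Hypothetical best-split gains (loss reduction) for each current leaf
leaf_gains = {"leaf_0": 0.8, "leaf_1": 2.5, "leaf_2": 1.1}

# Leaf-wise growth: always expand the leaf whose best split reduces
# loss the most, regardless of its depth (max-heap via negated gains)
heap = [(-gain, leaf) for leaf, gain in leaf_gains.items()]
heapq.heapify(heap)

neg_gain, best_leaf = heapq.heappop(heap)
print("Next leaf to split:", best_leaf)  # leaf_1, the maximum-gain leaf
```

A level-wise algorithm would instead split all three leaves before moving deeper; leaf-wise growth spends its split budget where it reduces loss the most, which is why the resulting trees can be deeper and more unbalanced.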

Together, histogram binning and leaf-wise growth allow LightGBM to train much faster and with a lower memory footprint than many other gradient boosting frameworks, particularly when handling large, high-dimensional datasets.

import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier

# Generate a synthetic dataset
X, y = make_classification(
    n_samples=20000,
    n_features=50,
    n_informative=30,
    n_redundant=10,
    n_classes=2,
    random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Initialize LightGBM classifier
lgbm = LGBMClassifier(
    n_estimators=100,
    max_depth=8,
    learning_rate=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42
)

# Time the training process
start_time = time.time()
lgbm.fit(X_train, y_train)
end_time = time.time()
fit_time = end_time - start_time

print("LightGBM fit time (seconds):", fit_time)
Note

Compared to XGBoost, LightGBM's histogram-based binning and leaf-wise tree growth typically result in faster training times and lower memory consumption when using similar hyperparameters. While XGBoost uses a level-wise tree growth strategy and can be slower on large, high-dimensional datasets, LightGBM's optimizations allow it to process data more efficiently. However, the actual speed and memory advantage may depend on dataset characteristics and parameter settings.

Task


You are given a synthetic binary classification dataset. Your task is to:

  1. Load and split the data.
  2. Initialize a LightGBM classifier with parameters:
    • n_estimators=150.
    • learning_rate=0.05.
    • max_depth=6.
    • subsample=0.8.
    • colsample_bytree=0.8.
  3. Train the model and obtain predictions on the test set.
  4. Compute accuracy and store it in accuracy_value.
  5. Print the shapes of the datasets and the final accuracy.

Solution

