Server#

class fl_sim.nodes.Server(model: Module, dataset: FedDataset, config: ServerConfig, client_config: ClientConfig, lazy: bool = False)[source]#

Bases: Node, CitationMixin

The class to simulate the server node.

The server node is responsible for communicating with the clients, performing the aggregation of the local model parameters (and/or gradients), and updating the global model parameters.

Parameters:
  • model (torch.nn.Module) – The model to be trained (optimized).

  • dataset (FedDataset) – The dataset to be used for training.

  • config (ServerConfig) – The configs for the server.

  • client_config (ClientConfig) – The configs for the clients.

  • lazy (bool, default False) – Whether to use lazy initialization for the client nodes. This is useful when one wants to do centralized training for verification.

TODO

  1. Run clients training in parallel.

  2. Use the attribute _is_convergent to control the termination of the training. This could perhaps be achieved by comparing some of the items in self._cached_models.

add_parameters(params: Iterable[Parameter], ratio: float) None[source]#

Update the server’s parameters with the given parameters.

Parameters:
  • params (Iterable[torch.nn.Parameter]) – The parameters to update the server's parameters with.

  • ratio (float) – The ratio (weight) applied to the given parameters in the update.

Return type:

None
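One plausible reading of this update (an assumption, not confirmed by this page) is an in-place scaled accumulation, p ← p + ratio · q, applied elementwise over the paired parameters. A sketch with flat lists of floats standing in for torch.nn.Parameter tensors:

```python
def add_parameters_sketch(server_params, incoming_params, ratio):
    """Accumulate ratio-scaled incoming parameters into the server's
    parameters, in place. Lists of floats stand in for tensors.
    NOTE: the p += ratio * q rule is a guess at the semantics."""
    for p, q in zip(server_params, incoming_params):
        for i in range(len(p)):
            p[i] += ratio * q[i]
```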

aggregate_client_metrics(ignore: Sequence[str] | None = None) None[source]#

Aggregate the metrics transmitted from the clients.

Parameters:

ignore (Sequence[str], optional) – The metrics to ignore.

Return type:

None
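A simplified sketch of aggregating the per-client metric dicts: take the mean of each metric over clients, skipping any keys listed in ignore. Whether fl_sim weights clients by sample count here is not stated on this page, so this sketch uses the plain unweighted mean and assumes every client reports the same metric keys:

```python
def aggregate_metrics_sketch(client_metrics, ignore=None):
    """Average each metric over clients, skipping ignored keys.

    client_metrics: list of dicts (one per client) mapping metric
    name to a float; all clients are assumed to share the same keys.
    """
    ignore = set(ignore or [])
    keys = [k for k in client_metrics[0] if k not in ignore]
    n = len(client_metrics)
    return {k: sum(m[k] for m in client_metrics) / n for k in keys}
```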

avg_parameters(size_aware: bool = False, inertia: float = 0.0) None[source]#

Update the server’s parameters via averaging the parameters received from the clients.

Parameters:
  • size_aware (bool, default False) – Whether to use size-aware averaging, i.e. a weighted average of the parameters in which each client's weight is its number of training samples. From the viewpoint of optimization theory, setting this to False is recommended.

  • inertia (float, default 0.0) – The weight of the previous parameters, should be in the range [0, 1).

Return type:

None
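The averaging rule described above can be sketched as follows: client parameters are averaged (uniformly, or weighted by sample counts when size_aware is set), and the result is blended with the previous server parameters by inertia. Flat lists of floats stand in for the model's tensors; the exact blending formula is an assumption consistent with the parameter descriptions:

```python
def avg_parameters_sketch(prev, client_params, sizes=None,
                          size_aware=False, inertia=0.0):
    """Average client parameters, blended with the previous server
    parameters: new = inertia * prev + (1 - inertia) * avg.

    prev: previous server parameters (flat list of floats).
    client_params: one flat list of floats per client.
    sizes: per-client training-sample counts (used when size_aware).
    inertia: weight of the previous parameters, in [0, 1).
    """
    n = len(client_params)
    if size_aware and sizes is not None:
        total = sum(sizes)
        weights = [s / total for s in sizes]  # sample-count weights
    else:
        weights = [1.0 / n] * n  # uniform weights
    avg = [
        sum(w * cp[i] for w, cp in zip(weights, client_params))
        for i in range(len(prev))
    ]
    return [inertia * p + (1.0 - inertia) * a for p, a in zip(prev, avg)]
```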

abstract property client_cls: type#

Class of the client node.

abstract property config_cls: Dict[str, type]#

Class of the client node config and server node config.

Keys are “client” and “server”.

evaluate_centralized(dataloader: DataLoader) Dict[str, float][source]#

Evaluate the model on the given dataloader on the server node.

Parameters:

dataloader (DataLoader) – The dataloader for evaluation.

Returns:

metrics – The metrics of the model on the given dataloader.

Return type:

dict

extra_repr_keys() List[str][source]#

Extra keys for __repr__() and __str__().

get_cached_metrics(client_idx: int | None = None) List[Dict[str, float]][source]#

Get the cached metrics of the given client, or the cached aggregated metrics stored on the server.

Parameters:

client_idx (int, optional) – The index of the client. If None, returns the cached aggregated metrics stored on the server.

Returns:

The cached metrics of the given client, or the cached aggregated metrics stored on the server.

Return type:

List[Dict[str, float]]

get_client_data(client_idx: int) Tuple[Tensor, Tensor][source]#

Get all the data of the given client.

This method is a helper function for fast access to the data of the given client.

Parameters:

client_idx (int) – The index of the client.

Returns:

Input data and labels of the given client.

Return type:

Tuple[Tensor, Tensor]

get_client_model(client_idx: int) Module[source]#

Get the model of the given client.

This method is a helper function for fast access to the model of the given client.

Parameters:

client_idx (int) – The index of the client.

Returns:

The model of the given client.

Return type:

torch.nn.Module

property is_convergent: bool#

Whether the training process is convergent.

train(mode: str = 'federated', extra_configs: dict | None = None) None[source]#

The main training loop.

Parameters:
  • mode ({"federated", "centralized", "local"}, optional) – The training mode (case-insensitive), by default "federated".

  • extra_configs (dict, optional) – The extra configs for the training mode.

Return type:

None
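The mode argument is case-insensitive and selects one of the three training entry points documented below (train_federated, train_centralized, train_local). A sketch of that dispatch; the table-based body is an assumption, and the handlers here just return the name of the method they stand for:

```python
def train_dispatch_sketch(mode="federated", extra_configs=None):
    """Dispatch on the (case-insensitive) training mode.

    In the real Server, each handler would be the corresponding
    train_* method; here they return the method name for illustration.
    """
    mode = mode.lower()
    handlers = {
        "federated": lambda cfg: "train_federated",
        "centralized": lambda cfg: "train_centralized",
        "local": lambda cfg: "train_local",
    }
    if mode not in handlers:
        raise ValueError(f"unknown training mode: {mode!r}")
    return handlers[mode](extra_configs)
```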

train_centralized(extra_configs: dict | None = None) None[source]#

Centralized training, conducted only on the server node.

This is used as a baseline for comparison.

Parameters:

extra_configs (dict, optional) – The extra configs for centralized training.

Return type:

None

train_federated(extra_configs: dict | None = None) None[source]#

Federated (distributed) training, conducted on the clients and the server.

Parameters:

extra_configs (dict, optional) – The extra configs for federated training.

Return type:

None

TODO

Run clients training in parallel.

train_local(extra_configs: dict | None = None) None[source]#

Local training, conducted on the clients, without any communication with the server. Used for comparison.

Parameters:

extra_configs (dict, optional) – The extra configs for local training.

Return type:

None

update_gradients() None[source]#

Update the server’s gradients.