FedShakespeare

class fl_sim.data_processing.FedShakespeare(datadir: Path | str | None = None, seed: int = 0, **extra_config: Any)

Bases: FedNLPDataset

Federated Shakespeare dataset.

The Shakespeare dataset is built from the collected works of William Shakespeare and is used for next-character prediction. FedML [1] loaded the data through the TensorFlow Federated (TFF) shakespeare load_data API [2] and saved the unzipped data as HDF5 files.

The data partition is the same as in TFF, with the following statistics.

DATASET      TRAIN CLIENTS  TRAIN EXAMPLES  TEST CLIENTS  TEST EXAMPLES
SHAKESPEARE  715            16,068          715           2,356

Each client corresponds to a speaking role with at least two lines.
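
As a quick reading aid for the statistics above, the per-client averages can be computed directly from the quoted counts. This is only arithmetic on the published figures, not output from the library; the variable names are illustrative.

```python
# Counts copied from the statistics above.
train_clients, train_examples = 715, 16_068
test_clients, test_examples = 715, 2_356

# Average number of examples held by a single client.
avg_train = train_examples / train_clients
avg_test = test_examples / test_clients

print(f"avg train examples per client: {avg_train:.2f}")
print(f"avg test examples per client:  {avg_test:.2f}")
```

Each client therefore holds roughly 22 training examples and 3 test examples on average, which is small enough that per-client batch sizes are worth choosing with care.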

Parameters:
  • datadir (Union[str, pathlib.Path], optional) – The directory to store the dataset. If None, use default directory.

  • seed (int, default 0) – The random seed.

  • **extra_config (dict, optional) – Extra configurations.

References

property candidate_models: Dict[str, Module]

A set of candidate models.

char_to_id(char: str) -> int

Convert a character to an integer index.
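
A minimal sketch of how such a character-to-index mapping typically works. The vocabulary, the out-of-vocabulary index, and the helper names below are all assumptions made for illustration; the actual vocabulary is whatever the dataset object exposes, and its handling of unknown characters may differ.

```python
from typing import Dict, List

# Hypothetical vocabulary, for illustration only.
VOCAB: List[str] = list("abcdefghijklmnopqrstuvwxyz ,.!?'")
CHAR_TO_ID: Dict[str, int] = {ch: i for i, ch in enumerate(VOCAB)}
OOV_ID: int = len(VOCAB)  # assumed: one extra index for unknown characters


def char_to_id_sketch(char: str) -> int:
    """Map a character to its index, falling back to OOV_ID."""
    return CHAR_TO_ID.get(char, OOV_ID)


def id_to_char_sketch(idx: int) -> str:
    """Inverse lookup; indices outside the vocabulary map to a marker."""
    return VOCAB[idx] if 0 <= idx < len(VOCAB) else "<oov>"


encoded = [char_to_id_sketch(c) for c in "to be"]
decoded = "".join(id_to_char_sketch(i) for i in encoded)
```

Encoding followed by decoding returns the original string as long as every character is in the vocabulary.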

property doi: List[str]

DOI(s) related to the dataset.

evaluate(probs: Tensor, truths: Tensor) -> Dict[str, float]

Evaluation using predictions and ground truth.

Parameters:
  • probs (Tensor) – Predicted probabilities.

  • truths (Tensor) – Ground truth labels.

Returns:

Evaluation results.

Return type:

Dict[str, float]
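
The metric names returned by evaluate are not listed here. As a hedged illustration of the kind of computation involved, the sketch below derives top-1 accuracy from per-class probabilities. Plain Python sequences stand in for tensors, and the function name and dictionary keys ("acc", "num_samples") are assumptions, not the library's actual output.

```python
from typing import Dict, Sequence


def evaluate_sketch(
    probs: Sequence[Sequence[float]], truths: Sequence[int]
) -> Dict[str, float]:
    """Top-1 accuracy from per-class probabilities (illustrative only)."""
    if len(probs) != len(truths):
        raise ValueError("probs and truths must have the same length")
    if not truths:
        return {"acc": 0.0, "num_samples": 0.0}
    correct = 0
    for row, truth in zip(probs, truths):
        # Index of the highest-probability class for this sample.
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == truth)
    return {"acc": correct / len(truths), "num_samples": float(len(truths))}


metrics = evaluate_sketch([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]], [1, 0, 0])
```

In this toy input, two of the three predictions match the ground truth.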

get_dataloader(train_bs: int | None = None, test_bs: int | None = None, client_idx: int | None = None) -> Tuple[DataLoader, DataLoader]

Get the local dataloaders for client client_idx, or the global dataloaders if client_idx is None.

Parameters:
  • train_bs (int, optional) – Batch size for training dataloader. If None, use default batch size.

  • test_bs (int, optional) – Batch size for testing dataloader. If None, use default batch size.

  • client_idx (int, optional) – Index of the client whose dataloaders are returned. If None, the dataloaders cover all data, which is typically used for centralized training.

Returns:

The training and testing dataloaders.

Return type:

Tuple[DataLoader, DataLoader]
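
A small usage sketch built on the signature above. The helper client_loaders is hypothetical and takes any object with a compatible get_dataloader method, so it can be exercised without the dataset on disk. The (train, test) ordering of the returned pair is an assumption inferred from the parameter order, and the batch sizes are arbitrary.

```python
from typing import Any, Tuple


def client_loaders(
    dataset: Any, client_idx: int, train_bs: int = 8, test_bs: int = 8
) -> Tuple[Any, Any]:
    """Fetch the dataloader pair for a single client (hypothetical helper)."""
    # Assumed ordering: training loader first, testing loader second.
    train_loader, test_loader = dataset.get_dataloader(
        train_bs=train_bs, test_bs=test_bs, client_idx=client_idx
    )
    return train_loader, test_loader


# Intended use, assuming fl_sim is installed and the data is available:
#
#     from fl_sim.data_processing import FedShakespeare
#     ds = FedShakespeare()
#     train_loader, test_loader = client_loaders(ds, client_idx=0)
```

Passing client_idx=None straight to get_dataloader instead gives the loaders over all data, as described for centralized training.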

get_word_dict() -> Dict[str, int]

Get the word dictionary.

id_to_word(idx: int) -> str

Convert an integer index to a character.

preprocess(sentences: Sequence[str], max_seq_len: int | None = None) -> List[List[int]]

Preprocess a list of sentences.

Parameters:
  • sentences (Sequence[str]) – List of sentences to be preprocessed.

  • max_seq_len (int, optional) – Maximum sequence length. If None, use default sequence length.

Returns:

List of tokenized sentences.

Return type:

List[List[int]]
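
A minimal sketch of character-level preprocessing under stated assumptions: each sentence is mapped character by character to indices, then truncated or padded to a fixed length. The padding index, the out-of-vocabulary index, the default length of 80, and the function name are all assumptions for illustration. The actual method may also add begin- or end-of-sequence tokens or split long sentences, which this sketch does not do.

```python
from typing import Dict, List, Optional, Sequence

PAD_ID = 0            # assumed padding index
OOV_ID = 1            # assumed index for unknown characters
DEFAULT_SEQ_LEN = 80  # assumed default sequence length


def preprocess_sketch(
    sentences: Sequence[str],
    char_to_id: Dict[str, int],
    max_seq_len: Optional[int] = None,
) -> List[List[int]]:
    """Tokenize sentences into fixed-length lists of character indices."""
    seq_len = DEFAULT_SEQ_LEN if max_seq_len is None else max_seq_len
    tokenized: List[List[int]] = []
    for sentence in sentences:
        # Map characters to ids, then truncate to the target length.
        ids = [char_to_id.get(ch, OOV_ID) for ch in sentence][:seq_len]
        # Right-pad short sequences so every row has the same length.
        ids += [PAD_ID] * (seq_len - len(ids))
        tokenized.append(ids)
    return tokenized


toy_vocab = {"a": 2, "b": 3}
rows = preprocess_sketch(["ab", "abba!"], toy_vocab, max_seq_len=4)
```

With the toy vocabulary, the short sentence is padded and the long one is truncated, so both rows have length 4.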

property url: str

URL for downloading the dataset.

view_sample(client_idx: int, sample_idx: int | None = None) -> None

View a sample from the dataset.

Parameters:
  • client_idx (int) – Index of the client on which the sample is located.

  • sample_idx (int, optional) – Index of the sample in the client.

Return type:

None

property words: List[str]

Get the word list.