SplitOut: Out-of-the-Box Training-Hijacking Detection in Split Learning via Outlier Detection

1Technical University of Munich, Munich, Germany, 2Kadir Has University, Istanbul, Turkey, 3Bilkent University, Ankara, Turkey, 4Koç University, Istanbul, Turkey
*Indicates Equal Contribution

International Conference on Cryptology And Network Security 2024

Abstract

Split learning lets clients train deep neural networks with a central server without sharing raw data. A malicious server, however, can hijack training to reconstruct clients' private data or plant backdoors in their models. Earlier defenses can detect such training-hijacking attacks, but they rely on heuristics and numerous hyperparameters that require careful tuning.

SplitOut is a simple, passive detector that runs on the client side. Before split learning begins, the client briefly trains the full model locally on a small fraction (e.g., 1%) of its private data and collects the resulting honest gradients. An off-the-shelf Local Outlier Factor (LOF) model is then trained on these gradients. During real split learning, the LOF model classifies each incoming server gradient as an inlier (honest) or an outlier (hijacked).

We evaluate SplitOut on MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and EMNIST. It detects all existing split learning training-hijacking attacks we test, with almost-perfect true positive rates and near-zero false positives, often within the first epoch. The method needs only modest additional client compute and no attack-specific tuning.

SplitOut

SplitOut is a client-side, passive detector for training-hijacking attacks in split learning.

  • Before split learning, the client simulates one epoch of end-to-end training on a small subset (e.g., 1%) of its private data.
  • It records the gradients of the simulated server part; these are the honest gradients used to train LOF.
  • During real split learning, each gradient sent by the server is fed into the trained LOF model.
  • A sliding window of the last w LOF decisions is kept; if most are outliers, the client flags an attack and stops training.

We use LOF with the maximum feasible number of neighbors, and we rely only on honest gradients collected once before deployment. No attacker behavior is modeled and no attack-specific features are needed.
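The detection loop above can be sketched with scikit-learn's off-the-shelf LOF. This is a minimal illustration, not the paper's implementation: synthetic Gaussian vectors stand in for flattened server-part gradients, and the window size `w` and majority threshold are illustrative values.

```python
# Sketch of SplitOut's client-side detection loop. Assumptions: synthetic
# Gaussian vectors stand in for flattened server-part gradients; the window
# size w and the majority threshold are illustrative, not the paper's values.
import numpy as np
from collections import deque
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# 1) Honest gradients collected by simulating training on ~1% of the data.
honest_grads = rng.normal(0.0, 1.0, size=(200, 64))

# 2) Fit LOF in novelty mode (with a large neighbor count, as in the paper)
#    so it can score unseen gradients later.
lof = LocalOutlierFactor(n_neighbors=min(150, len(honest_grads) - 1),
                         novelty=True)
lof.fit(honest_grads)

# 3) During split learning, score each incoming server gradient and keep a
#    sliding window of the last w inlier/outlier decisions.
w = 20
window = deque(maxlen=w)

def check_gradient(grad):
    """Return True (attack flagged) once most recent decisions are outliers."""
    is_outlier = lof.predict(grad.reshape(1, -1))[0] == -1
    window.append(is_outlier)
    return len(window) == w and sum(window) > w // 2

# Gradients from the honest distribution should not trigger the alarm,
# while a shifted (hijacked-looking) distribution eventually should.
flags = [check_gradient(g) for g in rng.normal(0.0, 1.0, size=(30, 64))]
flags += [check_gradient(g) for g in rng.normal(5.0, 1.0, size=(30, 64))]
print(flags[-1])
```

`novelty=True` is what lets a fitted LOF model score points it was not trained on, which matches SplitOut's setting of fitting once on honest gradients and scoring server gradients as they arrive.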


On FSHA, SplitSpy, and backdoor attacks, SplitOut reaches true positive rate (TPR) ≈ 1 and very low false positive rate (FPR). On MNIST and Fashion-MNIST it detects all attacks with TPR = 1 and FPR = 0. On CIFAR-10 and CIFAR-100, FPR is small and drops to zero when we allocate more data to LOF training.

Detecting the Attack with SplitOut

SplitOut detects feature-space hijacking attacks very early in training. The detector fires long before the attacker can reconstruct readable private images or install a strong backdoor.


Reconstructions obtained while SplitOut is active have near-zero SSIM with the original inputs, compared to the high SSIM that FSHA achieves when it runs undetected.
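To make the SSIM comparison concrete, here is a minimal single-window (global) SSIM in plain NumPy. This is a simplified form of the metric; the paper presumably uses a standard windowed implementation, and the images below are synthetic stand-ins, not actual FSHA reconstructions.

```python
# Minimal global SSIM in plain NumPy. Assumption: the simplified
# single-window form of SSIM; images are synthetic stand-ins.
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Single-window SSIM: luminance/contrast/structure over the whole image."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
original = rng.random((32, 32))
noise = rng.random((32, 32))  # an unrelated "reconstruction"

print(ssim_global(original, original))  # identical images -> 1.0
print(ssim_global(original, noise))     # independent noise -> near 0
```

A faithful reconstruction scores near 1, while the garbage an interrupted attacker produces is uncorrelated with the input and scores near 0, which is the gap the figure illustrates.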

Red lines mark the point at which the attack (FSHA) is detected.

Avoiding the Curse of Dimensionality


Gradients live in a high-dimensional space, yet honest and malicious gradients occupy very different neighborhoods. t-SNE plots show the honest gradients forming a tight, dense cluster, while the malicious gradients spread over a larger region that largely surrounds it.

This structure is exactly what local density-based methods such as LOF exploit. Honest points sit in dense neighborhoods; hijacked gradients have sparser and more mixed neighborhoods and receive higher LOF scores.
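A toy 2-D example shows this local-density reasoning in action. The geometry mimics the t-SNE picture (a tight "honest" cluster with "malicious" points scattered around it); the points are synthetic, not the paper's actual gradients.

```python
# Toy illustration of LOF's local-density reasoning. Assumption: 2-D points
# mimic the t-SNE picture -- a tight "honest" cluster with "malicious"
# points scattered around it; not the paper's actual gradients.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
honest = rng.normal(0.0, 0.1, size=(100, 2))       # dense cluster
malicious = rng.uniform(-3.0, 3.0, size=(100, 2))  # broad, sparse spread

lof = LocalOutlierFactor(n_neighbors=50, novelty=True).fit(honest)

# score_samples returns the negative LOF: close to -1 for points in dense
# neighborhoods, much more negative for points in sparse ones.
print(lof.score_samples(honest).mean())
print(lof.score_samples(malicious).mean())
```

Points inside the dense cluster have a local density comparable to their neighbors' (LOF near 1), while the scattered points are far less dense than the cluster they border, which drives their LOF scores up and makes them easy to flag.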


Pair-wise average L2 distances between sets of honest and malicious gradients on CIFAR-10 reveal the same picture:

  • Honest gradients are closest to other honest gradients.
  • Honest gradients are closer to each other than malicious gradients are to each other.
  • Malicious gradients are closer to honest gradients than to other malicious ones, so their local neighborhoods are low-density and mixed.

These geometric properties explain why a neighborhood-based outlier detector like LOF can separate honest and hijacked gradients despite the curse of dimensionality.
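The pair-wise distance comparison can be reproduced on synthetic data. Here a tight zero-mean "honest" set and a broader "malicious" set spread around it stand in for the CIFAR-10 gradient sets reported in the paper.

```python
# Reproducing the pair-wise average L2 comparison on synthetic data.
# Assumption: a tight "honest" set and a broader "malicious" set spread
# around it stand in for the CIFAR-10 gradient sets.
import numpy as np

rng = np.random.default_rng(2)
honest = rng.normal(0.0, 0.1, size=(50, 16))     # dense cluster
malicious = rng.normal(0.0, 1.0, size=(50, 16))  # broad spread around it

def avg_pairwise_l2(a, b):
    """Mean Euclidean distance over all pairs drawn from the two sets."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean()

hh = avg_pairwise_l2(honest, honest)        # honest-honest: smallest
hm = avg_pairwise_l2(honest, malicious)     # honest-malicious: in between
mm = avg_pairwise_l2(malicious, malicious)  # malicious-malicious: largest
print(hh < hm < mm)
```

With this geometry the three bullet points fall out directly: the tight cluster keeps honest-honest distances small, and each malicious point is closer to the central honest cluster than to the other malicious points scattered on all sides of it.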

Additional Figures

F-MNIST t-SNE


EMNIST t-SNE


Similar t-SNE plots on F-MNIST and EMNIST show the same pattern: honest gradients cluster tightly, while malicious gradients cover a broader region around them.

Assumptions and Limitations

SplitOut assumes:

  • Clients know the full model architecture or a good approximation of it.
  • Clients can run one local epoch of end-to-end training on a small part of their data (e.g., 1–10%).
  • Attacks consistently change the distribution of server-side gradients over time.

We evaluate feature-space alignment attacks: FSHA, SplitSpy, and a backdoor attack, plus a multitask FSHA that mixes honest and adversarial losses. SplitOut also detects the adaptive SplitSpy attacker that is designed to bypass SplitGuard.

Limitations include potential new training-hijacking strategies that do not rely on feature-space alignment, as well as attacks that begin only after several epochs of honest training. Detecting such attacks may require more LOF training data or different detectors, and remains an open direction for future work.

BibTeX


@misc{erdogan2024splitoutoutoftheboxtraininghijackingdetection,
  title={SplitOut: Out-of-the-Box Training-Hijacking Detection in Split Learning via Outlier Detection},
  author={Ege Erdogan and Unat Teksen and Mehmet Salih Celiktenyildiz and Alptekin Kupcu and A. Ercument Cicek},
  year={2024},
  eprint={2302.08618},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2302.08618}
}