Angle-of-Arrival for Massive MIMO

Project Dataset

This dataset was collected by Joshua Miraglia from the University of Utah in the anechoic chamber setup at the POWDER platform. The original purpose of this dataset was to collect known signals in a massive MIMO system in order to train a deep learning model for signal classification. In the experiment, two UEs/clients transmitted a known signal to a 48-antenna base station. We varied both the signal parameters and the client location across an 80-point grid (8 azimuth angles by 10 elevation angles). Before you use the datasets, please read the Data Copyright and License Agreement below.

Dataset Description

Signals are sent from two client Iris SDRs and collected at all 48 radios in the mMIMO base station. One of the UEs is treated as the reference device and is aligned with the boresight of the antenna array. The other UE is a roaming client positioned on an 8x10 grid representing 8 azimuth positions and 10 elevation positions (angles). We aimed for 12-degree spacing between adjacent ground points. The image below shows the experimental setup inside the anechoic chamber.
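One way to check the intended 12-degree spacing is to convert recorded positions into azimuth and elevation angles as seen from the array. The sketch below is illustrative only: in practice the positions would come from the REFERENCE_COORDINATES, CLIENT_COORDINATES, and MIMO_COORDINATES attributes described in the next paragraph, but their exact layout and units are not specified here, so the functions take plain (x, y, z) tuples. The axis convention (x along the array boresight, y horizontal across the array face, z up) is an assumption, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def azimuth_elevation(point: Point, origin: Point = (0.0, 0.0, 0.0)) -> Tuple[float, float]:
    """Return (azimuth, elevation) in degrees of `point` as seen from `origin`.

    ASSUMED axis convention (not confirmed by the dataset documentation):
    x = array boresight, y = horizontal across the array face, z = up.
    The default origin is the base-station center, matching the dataset's
    stated coordinate origin.
    """
    dx, dy, dz = (p - o for p, o in zip(point, origin))
    azimuth = math.degrees(math.atan2(dy, dx))
    # Elevation is measured from the horizontal (x-y) plane.
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, elevation


def adjacent_spacings(angles_deg: Sequence[float]) -> List[float]:
    """Differences between consecutive sorted angles, to compare against ~12 degrees."""
    ordered = sorted(angles_deg)
    return [b - a for a, b in zip(ordered, ordered[1:])]
```

For instance, a point at (1, 1, 0) maps to 45 degrees azimuth and 0 degrees elevation under this convention, and a UE on the boresight maps to (0, 0).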

The actual coordinates of the UE devices and base-station radios are recorded in the HDF5 files provided below, under the attributes REFERENCE_COORDINATES, CLIENT_COORDINATES, and MIMO_COORDINATES. The center of the base station is the origin of the coordinate system. At each location point, we transmit the same 10,000 pre-generated data frames, sweeping over the following parameters:

  • Modulation Order: QPSK, QAM16, QAM64
  • Signal Bandwidth: Fractional bandwidth in the range [0.15, 0.85]
  • Center Frequency: Randomly chosen within a particular range
  • TX Gain (SNR proxy): Three randomly generated values in the range [45, 81] dB, at least 12 dB apart from one another. Noise samples are also collected for each frame to aid SNR calculation. Each HDF5 file within a runMxN folder corresponds to a different gain value.
  • Sample Start and End: The sample start and end times are generated such that the signal is at least 248 samples long.
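The two randomized constraints above (three gains in [45, 81] dB with at least 12 dB between any pair, and a signal window of at least 248 samples) can be met with simple rejection sampling. This is a hedged sketch, not the generator actually used to build the dataset: the function names are hypothetical, the gains are assumed to be continuous values, and the 2048-sample frame length is an assumption borrowed from the slot size mentioned for the Sounder configuration.

```python
import random
from typing import List, Optional, Tuple


def draw_tx_gains(
    low: float = 45.0,
    high: float = 81.0,
    count: int = 3,
    min_gap: float = 12.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Rejection-sample `count` gains (dB) in [low, high], each pair >= min_gap apart."""
    rng = rng or random.Random()
    if (count - 1) * min_gap >= high - low:
        raise ValueError("gap constraint leaves no room for random draws")
    while True:
        gains = sorted(rng.uniform(low, high) for _ in range(count))
        # Checking sorted neighbours is enough: they are the closest pairs.
        if all(b - a >= min_gap for a, b in zip(gains, gains[1:])):
            return gains


def draw_sample_window(
    frame_len: int = 2048,  # ASSUMED frame length, taken from the 2048-sample slot size
    min_len: int = 248,
    rng: Optional[random.Random] = None,
) -> Tuple[int, int]:
    """Pick (start, end), end exclusive, inside the frame with end - start >= min_len."""
    rng = rng or random.Random()
    if min_len > frame_len:
        raise ValueError("minimum length exceeds frame length")
    start = rng.randint(0, frame_len - min_len)    # randint is inclusive on both ends
    end = rng.randint(start + min_len, frame_len)
    return start, end
```

With the default settings, roughly one draw in 27 satisfies the gain-gap constraint, so the loop finishes quickly. Note that the window sampler is not uniform over all valid windows; it only guarantees the minimum length.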

At every transmission, the uplink samples use different prefix and postfix padding sizes. An example of a single collected frame can be seen in the figure below. The top row is the pre-generated signal. The second row shows the full frame sent between the client radios and the base station, including uplink transmissions from the reference and roaming clients, as well as a noise reference at the end of the frame. The last two rows show the captured signals and their power spectral densities (PSDs), measured at the base station, for both the reference and roaming clients.
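Because each frame ends with a noise reference, a common way to estimate SNR is to subtract the noise power from the power of the captured signal segment. The sketch below assumes the signal-plus-noise segment and the noise-only segment have already been sliced out as complex baseband samples (how to locate them within a frame is not covered here), and that signal and noise are uncorrelated. It is one standard estimator, not necessarily the one used by the dataset authors; the function names are hypothetical.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[complex]) -> float:
    """Average power |x|^2 of complex baseband samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(abs(s) ** 2 for s in samples) / len(samples)


def estimate_snr_db(signal_plus_noise: Sequence[complex], noise: Sequence[complex]) -> float:
    """SNR in dB: (P_total - P_noise) / P_noise, using the noise reference for P_noise."""
    p_total = mean_power(signal_plus_noise)
    p_noise = mean_power(noise)
    if p_noise <= 0:
        raise ValueError("noise reference has zero power")
    p_signal = p_total - p_noise
    if p_signal <= 0:
        # Captured segment is no stronger than the noise floor.
        return float("-inf")
    return 10 * math.log10(p_signal / p_noise)
```

Per-antenna estimates computed this way could then be averaged across the 48 base-station radios if a single per-frame figure is wanted.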


The main repository can be found here. The following are some of the more relevant scripts to be used with the dataset we provide:

  • Data Collection Script: This script is run once for each datapoint (location).
  • Sounder Configuration Template: JSON configuration file used to run the Sounder tool for data collection. This configuration is not entirely accurate, as the slot parameters were chosen only to make each slot 2048 samples long. We do not use OFDM signals in this project, so the parameters 'ofdm_symbol_per_slot', 'cp_size', 'prefix', and 'postfix' are irrelevant.
  • Modified Sounder Application: Repo containing the modified Sounder code used to collect this dataset.

Each folder of the dataset is named runMxN, where M denotes the ground grid point (azimuth angle) and N denotes the vertical grid point (elevation angle) where the UE was located. Each folder contains the HDF5 files with the signals sent from that point. The coordinates of each radio and the gain settings used are stored in the HDF5 files as attributes; in general, these follow the layout in the first figure above. In addition, we are currently developing tools for integrating the dataset into Keras; these can be found here.
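When iterating over the dataset, it helps to recover the grid indices from the folder names and sort them numerically rather than lexically (so run1x10 comes after run1x2). A minimal sketch, assuming folder names follow the literal pattern run<M>x<N> with decimal indices; whether the indices start at 0 or 1 is not stated, so the code does not assume either. The helper names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# ASSUMED pattern: "run", azimuth index M, "x", elevation index N (case-insensitive,
# so the "RunMxN" spelling used in the download table also matches).
RUN_DIR = re.compile(r"^run(\d+)x(\d+)$", re.IGNORECASE)


def parse_run_folder(name: str) -> Tuple[int, int]:
    """Return (M, N) = (azimuth grid index, elevation grid index) from a runMxN name."""
    match = RUN_DIR.match(name)
    if match is None:
        raise ValueError(f"not a runMxN folder name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def index_runs(names: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Sort folder names numerically by (M, N), skipping names that don't match."""
    runs = []
    for name in names:
        try:
            m, n = parse_run_folder(name)
        except ValueError:
            continue  # ignore stray files such as notes or archives
        runs.append((m, n, name))
    return sorted(runs)
```

In practice the names would come from something like os.listdir on the downloaded dataset root; a complete download should yield 80 entries.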

Research Enabled

A team from Florida Atlantic University has recently published a paper at ICASSP 2023 using this dataset. The paper is titled "Single-Sample Direction-of-Arrival Estimation for Fast and Robust 3D Localization with Real Measurements from a Massive MIMO System," by S. Mazokha, S. Naderi, G. I. Orfanidis, G. Sklivanitis, D. A. Pados, and J. O. Hallstrom. The authors have also released their code on GitHub. They have documented thoroughly how they used and processed the dataset, so we strongly suggest users take a look at their work.

Data Copyright and License

Rice University hereby grants you a non-exclusive, non-transferable license to use the data for commercial, educational, and/or research purposes only. You agree to not redistribute the data without written permission from Rice University.

You agree to acknowledge the source of the data in any publication or product reporting on your use of it.

We provide no warranty whatsoever on any aspect of the data, including but not limited to its correctness, completeness, and fitness. Use at your own risk.

You agree to acknowledge Joshua Miraglia, "Signal Discovery with Convolutional Neural Nets," The University of Utah, 2022, in any publication or product reporting on your use of the data. If the data is not part of the IEEE Transactions on Wireless Communications reference data, you also agree to acknowledge the additional source of the data, if applicable.

NOTE: Downloading, obtaining, and/or using the data in any means constitutes your agreement with these terms.


#    File Name    Description                                                            Link    Size
1    RunMxN       Folder containing all datasets (all azimuth/elevation data points)            4.3 TB