This paper proposes a non-line-of-sight (NLOS) multiple-input multiple-output (MIMO) optical camera communications (OCC) system with space- and time-division multiple access for indoor environments. The system combines mask matching and equal-gain combining (EGC) with differential modulation and frame subtraction. We propose a unique packet structure to label the transmitters and a new detection method for extracting data from the captured video streams. We outline a comprehensive theoretical model and develop an experimental testbed to evaluate the performance of the proposed system. The results show that zooming and defocusing of the camera do not have a significant impact on system performance; therefore, the aperture can be set to its maximum value. The system performs well over a link span of 10 m with a low transmit power of 12 mW, even in the presence of ambient light, owing to the non-linear conversion of RAW to JPEG. Mask matching and EGC improve the system's tolerance to noise.
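
To illustrate how frame subtraction, mask matching, and EGC can fit together at the receiver, the following minimal Python sketch processes one pair of captured frames. It is not the detection method of this paper. It assumes, purely for illustration, that each bit is sent as a pair of complementary frames, that each transmitter's footprint is described by a binary pixel mask, and that the bit decision is the sign of the equal-weight average of the difference frame over that mask. All function names, frame sizes, and pixel values are hypothetical.

```python
def subtract_frames(frame_a, frame_b):
    """Pixel-wise difference of two frames; a constant ambient level cancels out."""
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def egc_over_mask(diff, mask):
    """Equal-gain combining: average the masked pixels with identical weights."""
    selected = [d
                for row_d, row_m in zip(diff, mask)
                for d, m in zip(row_d, row_m)
                if m]
    if not selected:
        raise ValueError("mask selects no pixels")
    return sum(selected) / len(selected)


def detect_bit(frame_a, frame_b, mask):
    """Assumed decision rule: bit 1 if the masked region is brighter in frame_a."""
    combined = egc_over_mask(subtract_frames(frame_a, frame_b), mask)
    return 1 if combined > 0 else 0


# Hypothetical 2x4 frames: ambient level 50, signal swing 10.
# Transmitter 1 illuminates the left half, transmitter 2 the right half.
frame_a = [[60, 60, 50, 50],
           [60, 60, 50, 50]]
frame_b = [[50, 50, 60, 60],
           [50, 50, 60, 60]]
mask_tx1 = [[1, 1, 0, 0],
            [1, 1, 0, 0]]
mask_tx2 = [[0, 0, 1, 1],
            [0, 0, 1, 1]]

bit_tx1 = detect_bit(frame_a, frame_b, mask_tx1)
bit_tx2 = detect_bit(frame_a, frame_b, mask_tx2)
```

In this toy case the ambient level of 50 vanishes in the difference frame, and the two masks separate the transmitters spatially, so each bit is recovered from its own region. Averaging over many masked pixels is what gives equal-gain combining its noise tolerance in this sketch.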