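The essential difference is that Semaphore implements its blocking wait on top of a synchronization primitive provided by the operating system, whereas SemaphoreSlim implements its wait by spinning (based on SpinWait).
In most cases the Slim version avoids the transition from user mode to kernel mode, whereas Semaphore is very likely to incur it. Because of how SpinWait behaves, however, the Slim version is best suited to short waits: if a wait runs long (longer than roughly one context switch), SpinWait stops busy-looping and yields the processor to other threads, which brings the context switch back.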
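The spin-then-yield behaviour can be observed directly on SpinWait itself. The following is a minimal sketch, not SemaphoreSlim's actual wait loop; the class and method names (SpinWaitDemo, SpinsBeforeYield) are made up for illustration. It counts how many times SpinOnce can be called before SpinWait reports that the next call will yield the processor.

```csharp
using System;
using System.Threading;

public static class SpinWaitDemo
{
    // Counts the SpinOnce calls made while SpinWait is still busy-spinning,
    // i.e. before NextSpinWillYield reports that the next call gives up the processor.
    public static int SpinsBeforeYield()
    {
        var spinner = new SpinWait();
        int busySpins = 0;
        while (!spinner.NextSpinWillYield)
        {
            spinner.SpinOnce();
            busySpins++;
        }
        return busySpins;
    }

    public static void Main()
    {
        Console.WriteLine($"Busy spins before the first yield: {SpinsBeforeYield()}");
    }
}
```

The count depends on the runtime and the machine. On a single-processor machine SpinWait yields from the first call, so the result there should be 0.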
The System.Threading.Semaphore class is a wrapper around the Win32 semaphore object (a counting semaphore). When created with a name it is a system-wide semaphore, so it can be used across multiple processes.
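As a rough illustration of the cross-process case, the sketch below opens a named Semaphore and takes one slot. The semaphore name and the NamedSemaphoreDemo class are hypothetical. It assumes that named semaphores are only available on Windows, so the code returns early on other platforms.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

public static class NamedSemaphoreDemo
{
    // Hypothetical name: every process that opens this name shares the same count.
    private const string Name = "Demo.SharedSemaphore";

    // Returns true if a slot was acquired (and then released) within the timeout.
    public static bool TryUseSharedSlot(TimeSpan timeout)
    {
        // Assumption: named semaphores are Windows-only in .NET,
        // so skip the attempt on other platforms instead of throwing.
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            return false;
        }

        // Creates the semaphore, or opens the existing one if the name is already in use.
        using var semaphore = new Semaphore(initialCount: 2, maximumCount: 2, name: Name);
        if (!semaphore.WaitOne(timeout))
        {
            return false; // all slots are held, possibly by other processes
        }
        try
        {
            return true; // the work shared across processes would go here
        }
        finally
        {
            semaphore.Release();
        }
    }

    public static void Main()
    {
        Console.WriteLine($"Acquired shared slot: {TryUseSharedSlot(TimeSpan.FromSeconds(1))}");
    }
}
```

Starting this program in two processes at once would have both compete for the same two slots, which SemaphoreSlim cannot offer.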
On the other hand, the System.Threading.SemaphoreSlim is a lightweight, fast semaphore that is provided by the CLR and used for waiting within a single process when wait times are expected to be very short.
SemaphoreSlim and Semaphore are functionally similar. SemaphoreSlim is reported to be about 4 times faster than Semaphore, but it cannot be used for interprocess signalling.
The performance gain comes from the fact that SemaphoreSlim is a lightweight alternative to Semaphore that doesn’t use Windows kernel semaphores. In essence, if you do not need a named Semaphore, use the SemaphoreSlim class.
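For the common in-process case, a typical use of SemaphoreSlim is capping how many operations run at once. The sketch below is one way to do that, assuming an async workload simulated with Task.Delay; ThrottleDemo and RunAsync are illustrative names.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    // Runs `jobs` simulated operations with at most `limit` inside the guarded region at once.
    // Returns the highest concurrency that was actually observed.
    public static async Task<int> RunAsync(int jobs, int limit)
    {
        using var gate = new SemaphoreSlim(limit, limit);
        object sync = new object();
        int current = 0, peak = 0;

        async Task Worker()
        {
            await gate.WaitAsync(); // waits without blocking a thread
            try
            {
                lock (sync) { current++; if (current > peak) peak = current; }
                await Task.Delay(20); // stand-in for real work
                lock (sync) { current--; }
            }
            finally
            {
                gate.Release();
            }
        }

        await Task.WhenAll(Enumerable.Range(0, jobs).Select(_ => Worker()));
        return peak;
    }

    public static async Task Main()
    {
        int peak = await RunAsync(jobs: 10, limit: 3);
        Console.WriteLine($"Peak concurrency: {peak}");
    }
}
```

WaitAsync is available only on SemaphoreSlim, which is another practical reason to prefer it when no named semaphore is needed.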
SemaphoreSlim is based on SpinWait and Monitor, so a thread waiting to acquire the semaphore burns CPU cycles for a short time in the hope of acquiring it before yielding to another thread. If that does not happen, the thread lets the system switch context and tries again (burning some more CPU cycles) once the OS schedules it again. With long waits, the cycles spent spinning are wasted on top of the context switch. The best case for this implementation is therefore when there is almost never any waiting and the semaphore can be acquired nearly instantly.
Semaphore relies on the implementation in the OS kernel, so every acquisition costs quite a lot of CPU cycles for the transition into the kernel, but after that the thread simply sleeps for as long as necessary until the semaphore becomes available.
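The cost difference described above can be checked with a quick timing of uncontended acquire/release pairs. This is a rough sketch rather than a proper benchmark (no warm-up, single run), and UncontendedTiming and Measure are illustrative names. The numbers will vary by machine and runtime, so none are quoted here.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class UncontendedTiming
{
    // Times `iterations` uncontended acquire/release pairs on each semaphore type.
    public static (TimeSpan Kernel, TimeSpan Slim) Measure(int iterations)
    {
        using var kernel = new Semaphore(1, 1);   // unnamed, so local to this process
        using var slim = new SemaphoreSlim(1, 1);

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            kernel.WaitOne();
            kernel.Release();
        }
        TimeSpan kernelTime = stopwatch.Elapsed;

        stopwatch.Restart();
        for (int i = 0; i < iterations; i++)
        {
            slim.Wait();
            slim.Release();
        }
        return (kernelTime, stopwatch.Elapsed);
    }

    public static void Main()
    {
        var (kernel, slim) = Measure(100_000);
        Console.WriteLine($"Semaphore: {kernel.TotalMilliseconds:F1} ms, SemaphoreSlim: {slim.TotalMilliseconds:F1} ms");
    }
}
```

This measures only the no-wait path, which is where SemaphoreSlim is expected to do best. A fair comparison under contention would need a benchmarking harness and several threads.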