Neural Network Size Calculator
How many parameters does my neural network have?
Find out whether your neural network will fit in memory and how complex it is. Enter the number of layers, neurons per layer, and input dimensions to see total parameters, memory requirements in MB, and training complexity. Assumes dense, fully connected layers.
How It Works
The formula, explained simply
Parameter count grows faster than you might expect. A network with 1000 inputs and 1000 hidden neurons needs 1 million weights just for the first layer, before adding biases or additional layers. The number of weights between two layers is the product of their sizes, so widening adjacent layers together produces quadratic (not exponential) growth that can quickly overwhelm available memory.
The calculator counts weights (connections between neurons) and biases (one per neuron) separately. Between any two layers, the number of weights equals the product of their sizes. A layer with 784 inputs connecting to 128 hidden neurons creates 784 × 128 = 100,352 weights, plus 128 biases, totaling 100,480 parameters for just that connection.
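The counting rule above can be sketched in a few lines of Python. This is a minimal illustration that assumes dense layers with one bias per neuron; the function name `count_dense_parameters` is hypothetical and is not the calculator's actual code.

```python
def count_dense_parameters(layer_sizes):
    """Return (weights, biases) for a fully connected network.

    layer_sizes lists every layer in order: input first, output last.
    """
    weights = 0
    biases = 0
    # Pair each layer with the one that follows it.
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights += n_in * n_out  # one weight per connection
        biases += n_out          # one bias per receiving neuron
    return weights, biases


# The 784-input, 128-neuron example from the text.
weights, biases = count_dense_parameters([784, 128])
print(weights, biases, weights + biases)  # 100352 128 100480
```

Passing a longer list, such as `[784, 512, 256, 10]`, sums the same rule over every pair of consecutive layers.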
Memory requirements depend on your chosen precision. Each parameter is stored as a floating-point number: float32 uses 4 bytes, float16 uses 2 bytes, and float64 uses 8 bytes. Modern deep learning frameworks default to float32 for stability. Storing weights in float16 halves the model size, and mixed precision training can reduce training memory without significant accuracy loss, although the savings are usually less than half because a float32 master copy of the weights is kept. The parameter memory footprint equals parameter count times bytes per parameter.
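As a rough sketch of that multiplication, the snippet below converts a parameter count into megabytes for each precision. The helper name `model_size_mb` and the choice of 1 MB = 1,048,576 bytes are assumptions made for illustration.

```python
# Bytes used by one parameter at each precision, as listed in the text.
BYTES_PER_PARAM = {"float16": 2, "float32": 4, "float64": 8}


def model_size_mb(num_params, dtype="float32"):
    """Storage for the parameters alone, in MB (1 MB = 1024 * 1024 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 * 1024)


# Compare precisions for a 100,480-parameter layer.
for dtype in BYTES_PER_PARAM:
    print(dtype, round(model_size_mb(100_480, dtype), 3), "MB")
```

This covers stored weights only; memory needed during training is a separate estimate.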
When To Use This
Right tool, right situation
Use this calculator when planning neural network architectures before implementation. It helps estimate whether your model will fit in available memory and guides layer size decisions. It is also useful for comparing architectures: a deeper, narrower network sometimes outperforms a shallower, wider one while using fewer parameters.
Calculate parameters early in research projects to set realistic expectations for training time and hardware requirements. Large parameter counts signal longer training, more data requirements, and higher overfitting risk. The parameter count helps justify architecture choices in papers and proposals.
Parameter count is also crucial for deployment planning, where memory constraints matter. Mobile applications often target models under roughly 50MB, while server deployments can handle gigabyte-scale models. Knowing the parameter count upfront prevents late-stage architecture changes when deployment constraints surface.
Common Mistakes
Why results sometimes look wrong
The biggest mistake is misjudging how layer width affects parameter count. Doubling a single layer while its neighbors stay fixed roughly doubles the weights into and out of that layer. The quadratic effect appears when adjacent layers grow together: going from 128 to 256 neurons in two consecutive hidden layers raises the weights between them from 16,384 to 65,536, a 4x increase rather than 2x.
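A quick way to check how width changes affect a connection is to compute the ratio directly. The sketch below compares doubling only the receiving layer with doubling both layers of a connection; `layer_params` is a hypothetical helper, and the layer sizes are illustrative.

```python
def layer_params(n_in, n_out):
    """Weights plus biases for one dense connection."""
    return n_in * n_out + n_out


# Only the receiving layer doubles; the 784 inputs stay fixed.
one_side = layer_params(784, 256) / layer_params(784, 128)

# Both layers of a hidden-to-hidden connection double.
both_sides = layer_params(256, 256) / layer_params(128, 128)

print(one_side)              # 2.0
print(round(both_sides, 2))  # 3.98
```

The roughly 4x jump shows up only when both sides of the connection widen; the bias term keeps the ratio slightly under 4.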
Many underestimate memory requirements during training. The model size shown here represents storage only — training needs 4-6x more memory for gradients, optimizer states, and activation caching. A 100MB model can easily require 500MB GPU memory during training, especially with large batch sizes.
Another common error is choosing precision based on memory alone. While float16 halves memory usage, it can cause training instability in some architectures. Gradient underflow becomes problematic when gradients become too small for 16-bit representation. Mixed precision training addresses this by keeping master weights in float32 while computing forward and backward passes in float16.
The Math
Worked examples and deeper derivation
The parameter calculation follows a layer-by-layer pattern. For each connection from layer i to layer j, parameters = (size_i × size_j) + size_j. The multiplication term represents weights, the addition term represents biases. A network with input size 784, hidden layers of 512 and 256 neurons, and output size 10 calculates as: (784 × 512) + 512 + (512 × 256) + 256 + (256 × 10) + 10 = 401,408 + 512 + 131,072 + 256 + 2,560 + 10 = 535,818 parameters.
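Written as a single expression, the same rule looks like the formula below. The symbols are notation introduced here for illustration: $n_0$ is the input size, $n_1, \dots, n_L$ are the sizes of the following layers, and $P$ is the total parameter count.

```latex
P = \sum_{i=1}^{L} \left( n_{i-1}\, n_i + n_i \right)

% Example with (n_0, n_1, n_2, n_3) = (784, 512, 256, 10):
P = (784 \cdot 512 + 512) + (512 \cdot 256 + 256) + (256 \cdot 10 + 10)
  = 401{,}920 + 131{,}328 + 2{,}570
  = 535{,}818
```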
Memory calculation multiplies total parameters by bytes per data type. For 535,818 parameters in float32 format: 535,818 × 4 bytes = 2,143,272 bytes ≈ 2.04 MB (taking 1 MB as 1,048,576 bytes). This represents model storage only. Training requires additional memory for gradients (equal to model size), optimizer states (about 2x model size for Adam, which keeps two moment estimates per parameter, or about 1x for SGD with momentum), and forward pass activations (which depend on batch size and network depth).
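The training-time breakdown can be turned into a back-of-the-envelope estimate. The sketch below is an approximation under stated assumptions: `training_memory_mb` is a hypothetical helper, `optimizer_copies=2` assumes an Adam-style optimizer, and activation memory must be supplied by the caller because it depends on batch size and depth.

```python
def training_memory_mb(num_params, bytes_per_param=4,
                       optimizer_copies=2, activation_mb=0.0):
    """Rough training footprint in MB (1 MB = 1024 * 1024 bytes).

    optimizer_copies is an assumption: 2 for Adam-style optimizers,
    1 for SGD with momentum, 0 for plain SGD.
    """
    model = num_params * bytes_per_param    # the weights themselves
    gradients = model                       # one gradient per parameter
    optimizer = optimizer_copies * model    # optimizer state
    total_bytes = model + gradients + optimizer
    return total_bytes / (1024 * 1024) + activation_mb


# The 535,818-parameter example in float32, ignoring activations.
print(round(training_memory_mb(535_818), 2), "MB")
```

With these defaults the weights, gradients, and optimizer state alone come to four times the stored model size, before any activations are counted.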
Parameter count grows as O(n²) when all layers share a width n, making wide networks expensive quickly. Doubling the width of two adjacent layers quadruples the weights between them. This is one reason architectures like ResNet use bottleneck blocks alongside skip connections: the bottlenecks control parameter growth through architectural constraints rather than brute-force width increases, while the network stays expressive.
Expert Unlock
The thing most explanations skip
Modern transformers break the standard dense layer assumptions this calculator uses. Self-attention memory scales as O(n²) with sequence length, not hidden size. A transformer with 512 hidden dimensions processing 1024 tokens builds a 1024×1024 attention matrix for each head in each layer, creating memory bottlenecks unrelated to parameter count. Practitioners use gradient checkpointing and sparse attention patterns to manage this.
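To get a feel for the numbers, the sketch below estimates the memory for the attention score matrices of a single layer. `attention_scores_mb` is a hypothetical helper; the head count and batch size are illustrative assumptions, and real implementations may store less or more depending on the attention kernel.

```python
def attention_scores_mb(seq_len, num_heads=1, batch_size=1, bytes_per_value=4):
    """MB for one layer's seq_len x seq_len score matrices (1 MB = 1024 * 1024 bytes)."""
    values = batch_size * num_heads * seq_len * seq_len
    return values * bytes_per_value / (1024 * 1024)


# One head, one sequence of 1024 tokens, float32 scores.
print(attention_scores_mb(1024))  # 4.0

# Doubling the sequence length quadruples the memory.
print(attention_scores_mb(2048))  # 16.0
```

The hidden size never appears in this calculation, which is why the cost is independent of parameter count.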
How do I know if my neural network is too big for my hardware?