What is a precise definition of shallow architecture in machine learning?
It means an artificial neural network with few hidden layers, in contrast to deep neural networks. The precise boundary is debatable.
Shallow and deep refer to two different (but related) ways to go about modelling a problem. Shallow architectures rely on the paper by Cybenko, Approximation by Superpositions of a Sigmoidal Function, where he shows that arbitrary decision regions can be approximated arbitrarily well by MLPs with a single hidden layer.
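For reference, the single-hidden-layer form that this result is about can be written roughly as follows. The symbol names here (N hidden units, weights w_j, biases θ_j, output coefficients α_j) are my own notation for illustration: for a continuous target f on the unit cube and any ε > 0, there is some finite N and some choice of parameters such that the sum stays within ε of f everywhere.

```latex
G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(w_j^{\top} x + \theta_j\right),
\qquad
\sup_{x \in [0,1]^n} \lvert G(x) - f(x) \rvert < \varepsilon
```

Note that the statement says nothing about how large N has to be, which is why "just make it wider" is only a rough guide.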
Basically it means that you would just add more neurons to your hidden layer, thus making your network wider, as the complexity of the problem at hand increases (in very rough terms). My point is thus that there is no precise definition of the term. It is more about how you approach the complexity of a problem.
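To make the "wider, not deeper" idea concrete, here is a minimal sketch in plain Python. It is not a trained network: the parameters are hand-picked so that each hidden sigmoid unit contributes one step of a staircase, and the helper names (`build_staircase`, `max_error`), the steepness value and the sine target are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def shallow_net(x, weights, biases, alphas, offset=0.0):
    # One hidden layer: offset + sum_j alpha_j * sigmoid(w_j * x + b_j).
    return offset + sum(a * sigmoid(w * x + b)
                        for w, b, a in zip(weights, biases, alphas))


def build_staircase(f, width):
    # Hand-picked (not learned) parameters: hidden unit j switches on near
    # the midpoint of cell j and adds the increment of f across that cell.
    steepness = 20.0 * width
    knots = [j / width for j in range(width + 1)]
    weights = [steepness] * width
    biases = [-steepness * (knots[j] + knots[j + 1]) / 2 for j in range(width)]
    alphas = [f(knots[j + 1]) - f(knots[j]) for j in range(width)]
    return weights, biases, alphas, f(knots[0])


def max_error(f, width, samples=1000):
    # Largest absolute gap between f and the network on a grid over [0, 1].
    w, b, a, c = build_staircase(f, width)
    return max(abs(f(i / samples) - shallow_net(i / samples, w, b, a, c))
               for i in range(samples + 1))


def target(x):
    return math.sin(2 * math.pi * x)


# Same depth (one hidden layer), increasing width.
errors = {width: max_error(target, width) for width in (4, 16, 64)}
for width, err in errors.items():
    print(f"hidden units: {width:3d}  max error: {err:.3f}")
```

The depth never changes here; only the number of hidden units does, and the worst-case error shrinks as the layer gets wider. A deep architecture would instead try to reach the same accuracy by composing several smaller layers.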
I believe usually "shallow" means only one hidden layer. For example:
1. MLP with one hidden layer: data --> hidden --> softmax (class label)
2. SVM: data --> feature (can be considered as hidden) --> class label
Anything with two or more hidden layers can be called deep.
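The two pipelines listed above can be sketched side by side to show that they share the same shape: one non-linear transformation followed by a linear readout. This is only an illustration in plain Python; the function names, the tanh activation, the RBF feature map and all numeric values are assumptions I picked, and nothing here is trained.

```python
import math


def linear(v, matrix, bias):
    # Affine map: one output per row of `matrix`.
    return [sum(m * x for m, x in zip(row, v)) + b
            for row, b in zip(matrix, bias)]


def softmax(scores):
    # Subtract the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shallow_mlp(x, W, b, V, c):
    # data --> hidden --> softmax (class probabilities)
    hidden = [math.tanh(z) for z in linear(x, W, b)]
    return softmax(linear(hidden, V, c))


def rbf_features(x, centers, gamma=1.0):
    # The "feature" stage: one RBF kernel value per stored center.
    return [math.exp(-gamma * sum((xi - ci) ** 2 for xi, ci in zip(x, center)))
            for center in centers]


def svm_like(x, centers, coeffs, bias):
    # data --> feature --> class label (sign of a linear score)
    score = sum(a * k for a, k in zip(coeffs, rbf_features(x, centers))) + bias
    return 1 if score >= 0 else -1


# Arbitrary illustrative parameters, not learned from data.
W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]   # 2 inputs -> 3 hidden units
b = [0.0, 0.1, 0.0]
V = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]     # 3 hidden units -> 2 classes
c = [0.0, 0.0]

probs = shallow_mlp([1.0, 0.0], W, b, V, c)
label_a = svm_like([0.0, 0.0], centers=[[0.0, 0.0], [1.0, 1.0]],
                   coeffs=[1.0, -1.0], bias=0.0)
label_b = svm_like([1.0, 1.0], centers=[[0.0, 0.0], [1.0, 1.0]],
                   coeffs=[1.0, -1.0], bias=0.0)
```

In both functions there is exactly one non-linear stage between the input and the output, which is the sense in which both count as shallow.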
According to one book, shallow-structured architectures typically contain at most one or two layers of non-linear feature transformations. Examples of shallow architectures are Gaussian mixture models (GMMs), linear or non-linear dynamical systems, conditional random fields (CRFs), maximum entropy (MaxEnt) models, support vector machines (SVMs), logistic regression, kernel regression, and multilayer perceptrons (MLPs) with a single hidden layer, including extreme learning machines (ELMs). For instance, SVMs use a shallow linear pattern separation model with one feature transformation layer when the kernel trick is used, and zero otherwise.