Surprises in Proper Positive-Only Learning

Summary§

This paper resolves the long-open question of proper learning from positive-only samples in the PAC model. The authors show that a concept class is properly learnable from positive-only examples if and only if it has finite VC dimension and satisfies a new condition called uniform exterior separability (UES). This leads to surprising separations: proper vs. improper learning, randomized vs. deterministic proper learning, classes where no ERM succeeds, and finite VC dimension insufficient even for non-uniform learning.

Core Technical Contribution§

Model (Proper Positive-Only Learning)§

The learner receives i.i.d. samples from the positive region $\mathcal{X}^+$ of an unknown concept $c^ \in \mathcal{C}$ (where domain $\mathcal{X} = \mathcal{X}^+ \cup \mathcal{X}^-$). The learner must output a hypothesis $h \in \mathcal{C}$ (proper). Evaluation is under the original distribution $\mathcal{D}$ over $\mathcal{X}$, with generalization error $\Pr_{x \sim \mathcal{D}}[h(x) \neq c^(x)]$.

Key Result§

Theorem. A concept class $\mathcal{C}$ is properly learnable from positive-only samples iff: 1. $\text{VC}(\mathcal{C}) < \infty$, and 2. $\mathcal{C}$ is uniformly exterior separable.

Uniform exterior separability (UES) means: there exists a function $f: \mathbb{N} \to \mathbb{N}$ such that for every $n$, every finite subset $S \subseteq \mathcal{X}^+$ of size $n$ can be separated from its complement by some $h \in \mathcal{C}$ with margin $1/f(n)$ (in the sense of a distance-based separator). Formal definition involves a notion of "exterior" points.

Implications§

Proper learning is strictly harder than improper learning (which only requires finite VC dimension).
Randomized proper learners can outperform deterministic ones.
There exist classes with finite VC dimension that are not properly learnable even non-uniformly.
ERM can fail: some classes are properly learnable but no ERM (empirical risk minimization) succeeds.

New Combinatorial Dimensions§

The paper introduces the positive VC dimension and positive exterior dimension to quantify the hardness of proper positive-only learning.

Code Illustration§

Below is a conceptual PyTorch snippet illustrating the evaluation setup for a proper positive-only learner (not the UES condition itself):

import torch
import torch.nn as nn

# Hypothetical concept class: linear classifiers in R^2
class LinearClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Linear(2, 1, bias=True)
    
    def forward(self, x):
        return torch.sign(self.w(x)).squeeze()

# Positive-only training data (all labels +1)
X_pos = torch.randn(100, 2)  # all positive examples

# Learner: must output hypothesis from same class (proper)
model = LinearClassifier()
# Train with some proper learning algorithm (e.g., using only positive data)
# ... (not trivial; paper shows ERM may fail)

# Evaluation distribution D (both positive and negative)
X_test = torch.randn(200, 2)
y_test = torch.randint(0, 2, (200,))  # true labels
with torch.no_grad():
    preds = model(X_test)
    error = (preds != y_test).float().mean()

Key Equations§

Let $\mathcal{C} \subseteq \{0,1\}^{\mathcal{X}}$. For a distribution $\mathcal{D}$ over $\mathcal{X}$, the positive-only learning error of hypothesis $h$ is: $$ \text{err}(h) = \Pr_{x \sim \mathcal{D}}[h(x) \neq c^(x)]. $$ The learner only sees positive examples $x \sim \mathcal{D}^+$, where $\mathcal{D}^+$ is $\mathcal{D}$ conditioned on $c^(x)=1$.

The uniform exterior separability condition: there exists $m: \mathbb{N} \to \mathbb{N}$ such that for any $n$ and any set $S \subseteq \mathcal{X}^+$ of size $n$, there is $h \in \mathcal{C}$ with $h(S) = 1$ and $\forall x \notin S$, $h(x) = 0$ if $x$ is "exterior" to $S$ (i.e., outside a certain neighborhood).

Conclusion§

This paper provides a complete characterization of proper positive-only learning, revealing a rich structure beyond standard PAC learning. The new combinatorial condition UES may find applications in other learning models with restricted data access.

Abstract

Technical Analysis & Implementation

Summary§

Core Technical Contribution§

Model (Proper Positive-Only Learning)§

Key Result§

Implications§

New Combinatorial Dimensions§

Code Illustration§

Key Equations§

Conclusion§

Related Research

Second-Order KKT Guarantees for Bregman ADMM in Nonconvex and Non-Lipschitz Optimization

Which Nash Equilibrium? Solver-Dependent Selection on Zero-Sum Nash Polytopes

DexCompose: Reusing Dexterous Policies for Multi-Task Manipulation with a Single Hand