Classification Pipeline and Settings
The classification pipeline is a three-step process that determines which category best fits your product. Understanding how this pipeline works and how settings control its behavior helps you optimize classification accuracy for your specific needs.
How the Pipeline Works
Every classification request—whether from the Playground, API, or MCP server—follows the same three-step process.
Step 1: Generate Candidates
The classification engine analyzes your product description and returns 15 candidate categories with relevance scores. If “Return only leaves” is enabled, only leaf categories are included in these candidates—parent categories are filtered out at this stage. These candidates represent the categories most likely to match your product based on semantic similarity, keyword matching, and contextual understanding.
Each candidate receives a relevance score between 0 and 1, with higher scores indicating stronger similarity to your product description.
Example:
For the description “Apple iPhone 16 Pro 512GB smartphone with titanium design,” the system might generate candidates like:
- Mobile Phones (0.95)
- Smartphones (0.93)
- Contract Mobile Phones (0.72)
- Unlocked Mobile Phones (0.71)
- Electronics Accessories (0.45)
The top candidates cluster around mobile phones, but the engine can’t yet determine if the phone is contract, unlocked, or pre-paid based on the description alone.
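Candidate generation can be pictured as a filter-then-rank step: drop parent categories when leaf-only is enabled, then keep the highest-scoring 15. The sketch below is only an illustration of that ordering, not the engine's implementation. The `Candidate` type, the `is_leaf` flag, and which of the example categories count as leaves are assumptions. The scores are copied from the example above, whereas the real engine computes them from semantic similarity, keyword matching, and context.

```python
from dataclasses import dataclass

MAX_CANDIDATES = 15  # step 1 returns 15 candidates per request


@dataclass
class Candidate:
    name: str
    score: float   # relevance score between 0 and 1
    is_leaf: bool  # hypothetical flag: True if the category has no children


def generate_candidates(scored: list[Candidate], leaves_only: bool) -> list[Candidate]:
    """Illustrative filter-then-rank: drop parents if leaves_only, keep the top 15."""
    pool = [c for c in scored if c.is_leaf] if leaves_only else list(scored)
    pool.sort(key=lambda c: c.score, reverse=True)
    return pool[:MAX_CANDIDATES]


# Leaf/parent status below is assumed for illustration only.
scored = [
    Candidate("Mobile Phones", 0.95, is_leaf=False),
    Candidate("Smartphones", 0.93, is_leaf=True),
    Candidate("Contract Mobile Phones", 0.72, is_leaf=True),
    Candidate("Unlocked Mobile Phones", 0.71, is_leaf=True),
    Candidate("Electronics Accessories", 0.45, is_leaf=False),
]

print([c.name for c in generate_candidates(scored, leaves_only=True)])
# ['Smartphones', 'Contract Mobile Phones', 'Unlocked Mobile Phones']
```

Filtering before ranking matters. Because parents are removed first, the 15 slots are filled entirely by leaves rather than being partly consumed by parent categories.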
Step 2: AI Selection
The AI evaluates the 15 candidates and attempts to determine the single best category. This isn’t simply choosing the highest-scored candidate—the AI considers:
- Semantic context: What the description actually describes
- Taxonomy hierarchy: How categories relate to each other
- Ambiguity detection: Whether the description provides enough information to distinguish between sibling categories
- Custom instructions: Any guidance you’ve provided for edge cases
If the AI confidently identifies a category, that category is returned with a green status indicator.
If the AI cannot confidently choose between categories—for example, when siblings have similar scores and the description doesn’t provide distinguishing information—it proceeds to step 3.
Step 3: Fallback Behavior
What happens when the AI cannot confidently select a category depends on your settings:
If “Use top-ranked category” is enabled:
The system returns the candidate with the highest relevance score, regardless of AI confidence. This produces a yellow status indicator.
If “Use top-ranked category” is disabled:
The system returns no category, shown with a gray status indicator.
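The following minimal sketch shows how the outcome of steps 2 and 3 maps to a result and a status indicator. The function name `resolve`, the `(name, score)` tuple representation, and the status strings are assumptions made for illustration. The AI's own decision is not modeled. It is passed in as `ai_choice`, which is `None` when the AI could not decide.

```python
from typing import Optional


def resolve(
    ai_choice: Optional[str],
    candidates: list[tuple[str, float]],
    use_top_ranked: bool,
) -> tuple[Optional[str], str]:
    """Return (category, status) following the three-step description.

    ai_choice: category the AI confidently selected in step 2, or None.
    candidates: (name, relevance_score) pairs produced in step 1.
    """
    if ai_choice is not None:
        return ai_choice, "green"          # step 2: confident AI selection
    if use_top_ranked and candidates:
        best = max(candidates, key=lambda c: c[1])
        return best[0], "yellow"           # step 3: highest relevance score wins
    return None, "gray"                    # step 3: no category found


candidates = [("Contract Mobile Phones", 0.72), ("Unlocked Mobile Phones", 0.71)]
print(resolve(None, candidates, use_top_ranked=True))   # ('Contract Mobile Phones', 'yellow')
print(resolve(None, candidates, use_top_ranked=False))  # (None, 'gray')
```

The empty-candidates check reflects the condition that top-ranked fallback can only return something if step 1 produced candidates.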
Settings Reference
The Playground and API offer four settings that control pipeline behavior. Each setting has a specific purpose and interacts with the others in important ways.
Taxonomy
What it does: Selects which product taxonomy to use for classification.
Options:
- Shopify Standard Product Taxonomy (~10,000 categories)
- Google Product Taxonomy (~6,000 categories)
When to use which:
Choose the taxonomy that matches your target platform or sales channel. If you’re selling on Shopify, use the Shopify taxonomy. If you’re advertising on Google Shopping, use the Google taxonomy.
For businesses selling across multiple channels, you can classify the same product into both taxonomies using separate requests.
API equivalent: taxonomy parameter with values "Shopify" or "Google"
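For multi-channel sellers, one request per taxonomy can be built from the same product description. This is a minimal sketch. It assumes the `product` and `taxonomy` body fields shown in the API Integration example, and it uses the capitalized values listed above. That example uses lowercase `"google"`, so confirm the accepted casing in the API Reference. The helper name `build_requests` is hypothetical.

```python
import json


def build_requests(product: str) -> list[dict]:
    """Hypothetical helper: one request body per taxonomy, since each request targets one taxonomy."""
    return [{"product": product, "taxonomy": taxonomy} for taxonomy in ("Shopify", "Google")]


for body in build_requests("Apple iPhone 16 Pro 512GB smartphone"):
    print(json.dumps(body))
```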
Custom Instructions for AI
What it does: Provides natural language guidance that influences how the AI selects categories during step 2.
Format: Free-form text written as if coaching a colleague
When to use: Handle recurring edge cases, apply business logic, or provide disambiguation rules for products that legitimately fit multiple categories.
Examples:
- “For phones, default to unlocked unless the description explicitly states contract or pre-paid”
- “When products could be clothing or accessories, prefer clothing if they’re worn on the body”
- “Gaming peripherals should go under gaming, not general computer accessories”
How they work:
Custom instructions are evaluated during AI selection (step 2). The AI considers your instructions alongside the candidates and product description when determining the best match. Well-written instructions can resolve ambiguity that would otherwise require fallback behavior.
Best practices:
- Be specific about which situations the instruction applies to
- Use clear, everyday language
- Focus on disambiguation rather than general guidance
- Test instructions with real products to validate they work as intended
For comprehensive guidance on writing effective instructions, see Writing Custom Instructions.
API equivalent: customInstructionsForAi parameter with string value
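Because `customInstructionsForAi` takes a single string, it can help to keep each disambiguation rule separate in your own code and join them only when building the request. The sketch below reuses the example instructions above. The bullet-per-line layout and the helper name `build_instructions` are an organizational choice assumed here, not a documented format requirement.

```python
RULES = [
    "For phones, default to unlocked unless the description explicitly states contract or pre-paid.",
    "When products could be clothing or accessories, prefer clothing if they're worn on the body.",
    "Gaming peripherals should go under gaming, not general computer accessories.",
]


def build_instructions(rules: list[str]) -> str:
    """Hypothetical helper: join individual rules into one string, one rule per line."""
    return "\n".join(f"- {rule}" for rule in rules)


payload = {
    "product": "Wireless gaming mouse with RGB lighting",
    "customInstructionsForAi": build_instructions(RULES),
}
print(payload["customInstructionsForAi"])
```

Keeping rules as a list makes it easier to test them one at a time, in line with the best practice of validating each instruction against real products.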
Return Only Leaves of the Category Hierarchy
What it does: Filters candidate generation (Step 1) to only include leaf categories—categories with no children.
Options: Enabled or disabled
When enabled:
Only leaf categories are generated as candidates in Step 1. The AI never sees parent categories as options—they’re filtered out before the AI selection process begins.
This means:
- The 15 candidates returned in Step 1 are all leaf nodes
- The AI can only choose between specific, detailed categories
- Parent category fallback is impossible because parent categories aren’t in the candidate set
- If the AI can’t decide, the only options are choosing a top-ranked leaf (if enabled) or returning no category
When disabled (default):
Both leaf and parent categories can appear in the candidate set. When the AI encounters ambiguity between sibling leaves, their parent category may be available as a fallback option if it scored highly enough to be among the 15 candidates. This reduces the risk of an overly specific, incorrect classification when the description is ambiguous.
Example:
Consider these categories from the Google taxonomy:
267 - Electronics > Communications > Telephony > Mobile Phones
543513 - Electronics > Communications > Telephony > Mobile Phones > Contract Mobile Phones
543512 - Electronics > Communications > Telephony > Mobile Phones > Pre-paid Mobile Phones
543514 - Electronics > Communications > Telephony > Mobile Phones > Unlocked Mobile Phones
For “Apple iPhone 16 Pro” without contract information:
- Leaf-only disabled: Candidate generation includes both category 267 and the three leaf siblings. The AI can choose 267 (Mobile Phones) as a safe fallback when it cannot distinguish between the leaves.
- Leaf-only enabled: Only the three leaf siblings (543513, 543512, 543514) are generated as candidates, and category 267 never appears. If the AI cannot decide, the system returns the top-ranked leaf (for example, 543514 Unlocked, if it scored highest) when top-ranked fallback is enabled, or no category when it is disabled.
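Leaf status follows from the category paths themselves: a category is a leaf if no other category's path extends it. The sketch below applies that rule to the four Google taxonomy entries listed above. It is computed over this excerpt only. In the full taxonomy, leaf status depends on every category, and the dictionary layout and helper name `leaf_ids` are assumptions for illustration.

```python
TAXONOMY = {
    267: "Electronics > Communications > Telephony > Mobile Phones",
    543513: "Electronics > Communications > Telephony > Mobile Phones > Contract Mobile Phones",
    543512: "Electronics > Communications > Telephony > Mobile Phones > Pre-paid Mobile Phones",
    543514: "Electronics > Communications > Telephony > Mobile Phones > Unlocked Mobile Phones",
}


def leaf_ids(taxonomy: dict[int, str]) -> set[int]:
    """A category is a leaf if no other category's path starts with its path plus ' > '."""
    paths = list(taxonomy.values())
    return {
        cid
        for cid, path in taxonomy.items()
        if not any(other.startswith(path + " > ") for other in paths)
    }


print(sorted(leaf_ids(TAXONOMY)))  # [543512, 543513, 543514]
```

With leaf-only enabled, only these three IDs are eligible as candidates, which is why 267 can never be returned in that mode.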
When to use:
Enable leaf-only when your platform or business rules require all products to be classified at the most specific level. This is common for certain sales channels or product feeds that don’t accept parent categories.
Trade-offs:
Leaf-only mode eliminates the AI’s ability to fall back to safe, accurate parent categories. When product descriptions lack distinguishing information, the system must either guess at a specific leaf category or return nothing. You may need robust custom instructions and detailed product descriptions to handle ambiguous cases.
API equivalent: leafsOnly parameter with boolean value
Use Top-Ranked Category if AI Cannot Decide
What it does: Returns the highest-scored candidate when the AI cannot confidently select a category during step 2.
Options: Enabled or disabled
When enabled:
If the AI proceeds to step 3, the system returns the candidate with the highest relevance score. This ensures you always get a result (assuming candidates were generated), though with a yellow status indicator showing lower confidence.
When disabled:
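If the AI proceeds to step 3, the system returns no category (gray status indicator). If leaf-only is disabled, the AI may already have selected a parent category in step 2, so ambiguous products can still receive a more general category. This prioritizes accuracy over coverage.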
When to use:
Enable top-ranked fallback when getting some result is more valuable than ensuring perfect accuracy. This is useful for:
- Initial catalog imports where manual review will follow
- Situations where any reasonable category helps downstream processes
- High-volume classifications where human review can catch errors
Trade-offs:
Top-ranked categories (yellow indicators) are less reliable than AI-selected categories (green indicators). The highest-scored candidate isn’t necessarily correct—relevance scoring focuses on similarity, not semantic correctness.
API equivalent: fallbackToBestGuess parameter with boolean value
How Settings Interact
Settings don’t operate independently—they create different pipeline behaviors when combined.
Leaf-Only + Top-Ranked Enabled
Behavior: Always returns a leaf category or nothing
Only leaves are generated as candidates. Either the AI makes a confident selection from these leaf options (green), or the system falls back to the top-ranked leaf (yellow). No parent category escape hatch exists.
Use case: Platforms requiring leaf categories with maximum coverage
Leaf-Only + Top-Ranked Disabled
Behavior: Returns a confidently selected leaf or nothing
Only leaves are generated as candidates. The AI must make a confident selection from these leaves or the system returns no category. This is the most restrictive configuration—no score-based fallback and no parent category safety net.
Use case: Quality-over-quantity approaches where incorrect specificity is worse than no result
Leaf-Only Disabled + Top-Ranked Enabled
Behavior: Always returns some category (leaf or parent)
Both leaves and parents appear in candidates. The AI makes its best selection from all available options. If uncertain, it may choose a parent category. If completely unable to decide, the system returns the top-ranked candidate regardless of specificity.
Use case: Maximum coverage with reasonable accuracy—the most common configuration
Leaf-Only Disabled + Top-Ranked Disabled (default)
Behavior: Returns only AI-selected categories (leaf or parent) or nothing
Both leaves and parents appear in candidates. The AI chooses the most appropriate option from this full set, which may be a parent category when sibling leaves are ambiguous. No score-based fallback—only returns categories the AI actively selects.
Use case: Balanced accuracy prioritizing semantic correctness over coverage
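The four combinations above can be summarized by listing which kinds of result each one can produce. This sketch encodes only what the descriptions state. The outcome labels and the function name `possible_outcomes` are assumptions, and it ignores the edge case where step 1 generates no candidates at all.

```python
from itertools import product


def possible_outcomes(leaf_only: bool, top_ranked: bool) -> set[str]:
    """Kinds of result a settings combination can produce, per the four descriptions."""
    outcomes = {"AI-selected leaf"}
    if not leaf_only:
        # Parents are in the candidate set, so the AI may pick one when siblings are ambiguous.
        outcomes.add("AI-selected parent")
    if top_ranked:
        # Step 3 returns the highest-scored candidate: always a leaf in leaf-only mode,
        # otherwise whichever leaf or parent scored highest.
        outcomes.add("top-ranked leaf")
        if not leaf_only:
            outcomes.add("top-ranked parent")
    else:
        outcomes.add("no category")
    return outcomes


for leaf_only, top_ranked in product([True, False], repeat=2):
    print(f"leafsOnly={leaf_only}, fallbackToBestGuess={top_ranked}: "
          f"{sorted(possible_outcomes(leaf_only, top_ranked))}")
```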
Configuration Examples
Here are common configuration patterns for different classification needs:
Maximum Coverage Configuration
Taxonomy: [your choice]
Custom instructions: [optional]
Leaf only: OFF
Use top-ranked: ON
This configuration ensures you get results for nearly every product, accepting both confident selections (green) and fallbacks (yellow). Use this for initial catalog imports or when downstream processes can handle some inaccuracy.
Precision-First Configuration
Taxonomy: [your choice]
Custom instructions: [required for edge cases]
Leaf only: ON
Use top-ranked: OFF
This configuration prioritizes accuracy over coverage but eliminates the parent category safety net. You’ll get fewer results because the AI can only choose a leaf or nothing, with no fallback to more general categories. Products that return nothing need better descriptions or custom instructions to resolve ambiguity at the leaf level.
Specific Categories Required
Taxonomy: [your choice]
Custom instructions: [recommended]
Leaf only: ON
Use top-ranked: ON
This configuration guarantees leaf-level classifications. Only leaf categories are generated as candidates, and the top-ranked fallback ensures you get a result. Use this when your platform requires specific categories and you’re willing to accept some imprecision (from score-based selection) for complete coverage.
Balanced Default
Taxonomy: [your choice]
Custom instructions: [optional]
Leaf only: OFF
Use top-ranked: OFF
This is the default configuration. It returns AI-selected categories when confident and parent categories when uncertain about siblings. This balances accuracy and coverage for most use cases.
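When moving from the Playground to code, the four patterns above can be kept as named presets that map onto the `leafsOnly` and `fallbackToBestGuess` API parameters. The preset names and the `build_body` helper are hypothetical. The sketch also assumes `customInstructionsForAi` can be omitted when you have no instructions, so check the API Reference before relying on that.

```python
PRESETS = {
    "maximum_coverage":  {"leafsOnly": False, "fallbackToBestGuess": True},
    "precision_first":   {"leafsOnly": True,  "fallbackToBestGuess": False},
    "specific_required": {"leafsOnly": True,  "fallbackToBestGuess": True},
    "balanced_default":  {"leafsOnly": False, "fallbackToBestGuess": False},
}


def build_body(product: str, taxonomy: str, preset: str, instructions: str = "") -> dict:
    """Hypothetical helper: merge a named preset into a request body."""
    body = {"product": product, "taxonomy": taxonomy, **PRESETS[preset]}
    if instructions:  # assumption: the field may be omitted when empty
        body["customInstructionsForAi"] = instructions
    return body


print(build_body("Apple iPhone 16 Pro 512GB smartphone", "google", "precision_first",
                 "Default to unlocked for phones"))
```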
Testing Your Configuration
After selecting a configuration, validate it with real products from your catalog:
- Test diverse products: Include straightforward and ambiguous items
- Check status indicators: Note the ratio of green/yellow/gray results
- Verify category accuracy: Ensure returned categories match expectations
- Iterate on settings: Adjust based on results and business needs
For comprehensive testing strategies, see the “Testing Best Practices” section in Using the Playground.
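To track the green/yellow/gray ratio across a test batch, a small tally is enough. This is a minimal sketch that assumes you have already recorded each result's status as a string. The helper name `status_ratios` is hypothetical, and the sample list is placeholder input, not real classification results.

```python
from collections import Counter


def status_ratios(statuses: list[str]) -> dict[str, float]:
    """Fraction of results per status indicator; empty dict for an empty batch."""
    total = len(statuses)
    if total == 0:
        return {}
    counts = Counter(statuses)
    return {status: counts[status] / total for status in ("green", "yellow", "gray")}


# Placeholder statuses for illustration only.
print(status_ratios(["green", "green", "yellow", "gray"]))
```

A rising yellow or gray share after a settings change is a cue to revisit descriptions or custom instructions before rolling the configuration out.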
API Integration
Once you’ve validated your configuration in the Playground, replicate it in your API requests:
{
  "product": "Apple iPhone 16 Pro 512GB smartphone",
  "taxonomy": "google",
  "customInstructionsForAi": "Default to unlocked for phones",
  "leafsOnly": false,
  "fallbackToBestGuess": true
}
The API behaves identically to the Playground with the same settings, ensuring consistent results in production.
For complete API documentation, see API Reference.
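For reference, the JSON body above can be sent from Python with only the standard library. This is a sketch, not a client for this API. The endpoint URL is a hypothetical placeholder, and authentication is omitted because it is not described here. Take both from the API Reference. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the real endpoint from the API Reference.
ENDPOINT = "https://api.example.com/classify"

body = {
    "product": "Apple iPhone 16 Pro 512GB smartphone",
    "taxonomy": "google",
    "customInstructionsForAi": "Default to unlocked for phones",
    "leafsOnly": False,
    "fallbackToBestGuess": True,
}


def build_request(payload: dict) -> urllib.request.Request:
    """Serialize the settings as JSON and prepare a POST request (no auth headers added)."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request(body)
# Sending is left commented out because ENDPOINT is a placeholder:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

`json.dumps` converts Python's `False`/`True` to the JSON `false`/`true` shown in the example body.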
Choosing the Right Strategy
Different business needs require different configuration approaches. For detailed guidance on recommended configurations for specific use cases, see Classification Strategies.