
File Upload Integration

ACP’s Product Feed file upload uses a push-based SFTP model. Merchants proactively push product data files to an SFTP server designated by OpenAI, rather than having OpenAI pull from the merchant.

3.1 SFTP Push Model

Merchant Data System
  | SFTP upload (merchant-initiated push)
OpenAI SFTP Server
  | automatic parsing and indexing
ChatGPT Product Discovery Engine
Key point: This is a one-way push. Merchants control the upload timing and frequency. OpenAI does not actively pull data from merchant systems. SFTP credentials are provided by OpenAI after the merchant receives partner approval.
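As a rough illustration of the merchant-initiated push, the sketch below generates a batch file for the OpenSSH `sftp` client rather than opening a connection itself. The remote directory `/incoming` and the local `out/` paths are placeholders chosen for this example; the real host, user, and directory come with the credentials OpenAI issues.

```python
from pathlib import PurePosixPath


def build_sftp_batch(local_files, remote_dir):
    """Return the text of an `sftp -b` batch file that uploads each local file.

    Paths containing spaces would need quoting; this sketch assumes none.
    """
    lines = []
    for local in local_files:
        # Keep the remote name identical to the local file name.
        remote = PurePosixPath(remote_dir) / PurePosixPath(local).name
        lines.append(f"put {local} {remote}")
    lines.append("bye")
    return "\n".join(lines) + "\n"


# "/incoming" is a hypothetical remote directory, not a documented path.
batch = build_sftp_batch(
    ["out/products_shard_001.parquet", "out/products_shard_002.parquet"],
    "/incoming",
)
print(batch)
```

Saved to a file, the output could be run with `sftp -b upload.batch user@host`, where `user@host` stands in for the issued credentials.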

3.2 Supported File Formats

Format   | Compression | Recommendation | Description
Parquet  | zstd        | Recommended    | Columnar storage, highest compression efficiency
jsonl.gz | gzip        | Optional       | JSON Lines format, one record per line
csv.gz   | gzip        | Optional       | Comma-separated, requires UTF-8 encoding
tsv.gz   | gzip        | Optional       | Tab-separated, requires UTF-8 encoding
Encoding requirement: All text files must use UTF-8 encoding.
Parquet with zstd compression is the preferred option because:
  • Columnar storage natively supports efficient field-level reads
  • zstd achieves higher compression ratios than gzip with faster decompression
  • Built-in schema information reduces type ambiguity
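For merchants who start with one of the optional text formats, the following minimal sketch writes and reads a `jsonl.gz` file using only the Python standard library, with UTF-8 set explicitly. The record fields and file location are illustrative only. Writing Parquet with zstd needs a third-party library and is not shown here.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz(records, path):
    """Write one JSON object per line, gzip-compressed and UTF-8 encoded."""
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as fh:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text as real UTF-8 characters.
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl_gz(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Illustrative record; the field values are made up for this example.
records = [{"id": "prod_001", "title": "Café Table"}]
path = os.path.join(tempfile.mkdtemp(), "products_shard_001.jsonl.gz")
write_jsonl_gz(records, path)
round_trip = read_jsonl_gz(path)
print(round_trip)
```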

3.3 Snapshot Type: Full Catalog Override

ACP file uploads use a full catalog snapshot model, not an incremental (delta) model. Each uploaded file represents the complete source of truth for the product catalog. This means:
  • The uploaded file contains complete information for all active products
  • Each upload fully replaces the previous data
  • There is no need to mark operations as “add”, “modify”, or “delete”
  • If a product is absent from the latest snapshot, it is treated as delisted
Day 1 snapshot: [Product A, Product B, Product C]  -> catalog = A, B, C
Day 2 snapshot: [Product A, Product C, Product D]  -> catalog = A, C, D (B removed, D added)
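The override semantics can be expressed as a set comparison. The sketch below reproduces the Day 1 and Day 2 example. The function name is made up for illustration, and the comparison only models the effect described above, not how ACP processes snapshots internally.

```python
def diff_snapshots(previous_ids, latest_ids):
    """Model a full-override sync: the latest snapshot is the whole catalog."""
    previous, latest = set(previous_ids), set(latest_ids)
    return {
        "catalog": sorted(latest),             # what the catalog now contains
        "removed": sorted(previous - latest),  # absent from latest: delisted
        "added": sorted(latest - previous),    # new in latest
    }


result = diff_snapshots(["A", "B", "C"], ["A", "C", "D"])
print(result)
```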

3.4 Sharding Strategy

Large product catalogs need to be sharded for upload. Sharding guidelines:
Parameter              | Recommended Value
Max products per shard | 500,000 items
Target file size       | Under 500 MB
Sharding example:
# Sharding plan for a 1 million product catalog
products_shard_001.parquet  -> Products 1 - 500,000
products_shard_002.parquet  -> Products 500,001 - 1,000,000
When sharding, ensure each product (including all its Variants) is contained entirely within a single shard file. Do not split different Variants of the same Product across different files.
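One way to respect both the per-shard limit and the variant rule is to split only at product boundaries, assuming each product record carries its variants nested inside it, as in the JSON samples in this chapter. The helper names below are hypothetical, and the sketch does not check the 500 MB file-size target, which depends on the chosen format and compression.

```python
MAX_PRODUCTS_PER_SHARD = 500_000


def shard_products(products, max_per_shard=MAX_PRODUCTS_PER_SHARD):
    """Split a list of product records into consecutive shards.

    Each product is one list element with its variants nested inside,
    so a product and its variants always land in the same shard.
    """
    if max_per_shard < 1:
        raise ValueError("max_per_shard must be at least 1")
    return [
        products[start:start + max_per_shard]
        for start in range(0, len(products), max_per_shard)
    ]


def shard_file_name(index, extension="parquet"):
    """Build a stable, 1-based shard file name such as products_shard_001.parquet."""
    return f"products_shard_{index:03d}.{extension}"


# Small demonstration with a limit of 2 products per shard.
demo = [{"id": f"prod_{n}", "variants": [{"id": f"var_{n}"}]} for n in range(5)]
shards = shard_products(demo, max_per_shard=2)
names = [shard_file_name(i) for i in range(1, len(shards) + 1)]
print([len(s) for s in shards], names)
```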

3.5 Upload Frequency

Strategy             | Frequency                    | Purpose
SFTP full snapshot   | At least once daily          | Product catalog baseline sync
REST API incremental | Real-time throughout the day | Price, inventory, promotion changes
Recommended approach: Upload a full snapshot over SFTP at least once daily (for example, early in the morning), and push real-time changes via the REST API during the day (price adjustments, inventory updates, new product launches). This dual-channel strategy ensures:
  • Full snapshots provide a data consistency baseline
  • Incremental API updates keep data fresh between snapshots
  • Even if the API encounters brief issues, the full snapshot corrects data the next day

3.6 File Naming Conventions

Use stable, consistent file names. Uploading a file with the same name overwrites the previous content.
# Correct: fixed file names, overwrite each time
products_shard_001.parquet
products_shard_002.parquet

# Incorrect: do not append timestamps to file names
products_20260411_001.parquet
products_20260412_001.parquet
Do not append timestamps to file names. ACP expects stable file names whose content is overwritten on each upload. With timestamped names, old files are not cleaned up automatically, which can cause data inconsistency.
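A pre-upload check can catch timestamped names before they reach the server. The sketch below uses a simple heuristic, an eight-digit run starting with `20`, to flag date stamps. Both the pattern and the function name are assumptions for illustration, not an ACP rule.

```python
import re

# Heuristic: a standalone 8-digit run starting with "20" looks like YYYYMMDD.
DATE_STAMP = re.compile(r"(?<!\d)20\d{6}(?!\d)")


def has_date_stamp(file_name):
    """Return True if the file name appears to embed a YYYYMMDD date."""
    return DATE_STAMP.search(file_name) is not None


candidates = [
    "products_shard_001.parquet",
    "products_20260411_001.parquet",
]
flagged = [name for name in candidates if has_date_stamp(name)]
print(flagged)
```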

3.7 Product Delisting

In the full snapshot model, there are two ways to delist a product:

Method 1: Omit from the snapshot
The simplest approach. If the product is not included in the next SFTP full snapshot, it naturally disappears from the catalog.

Method 2: Set is_eligible_search to false
Keep the product record in the snapshot but set the is_eligible_search field to false. The product data still exists but will not appear in ChatGPT’s product discovery.
{
  "id": "prod_discontinued_001",
  "is_eligible_search": false,
  "variants": [
    {
      "id": "var_001",
      "title": "Discontinued Product",
      "price": { "amount": 0, "currency": "USD" }
    }
  ]
}
Method 2 is suitable for scenarios where you need to retain the product record but temporarily hide it (e.g., seasonal products, temporarily out of stock).
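Both methods can be applied while assembling the next snapshot. The helpers below are a minimal sketch with hypothetical names. They operate on product records shaped like the JSON sample above and return new data rather than modifying the input.

```python
import copy


def delist_by_omission(products, product_id):
    """Method 1: leave the product out of the next snapshot."""
    return [p for p in products if p["id"] != product_id]


def hide_from_search(product):
    """Method 2: keep the record but set is_eligible_search to false."""
    hidden = copy.deepcopy(product)  # avoid mutating the caller's record
    hidden["is_eligible_search"] = False
    return hidden


catalog = [
    {"id": "prod_a", "is_eligible_search": True, "variants": []},
    {"id": "prod_b", "is_eligible_search": True, "variants": []},
]
without_b = delist_by_omission(catalog, "prod_b")
hidden_a = hide_from_search(catalog[0])
print([p["id"] for p in without_b], hidden_a["is_eligible_search"])
```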

3.8 Feed Header

Every uploaded file must include Feed Header information identifying the data source and target:
Field           | Type   | Description
feed_id         | string | Unique identifier for the data feed
account_id      | string | Merchant account ID
target_merchant | string | Target merchant identifier
target_country  | string | Target country (ISO 3166-1 alpha-2)
{
  "feed_id": "feed_electronics_us",
  "account_id": "acct_merchant_123",
  "target_merchant": "merchant_xyz",
  "target_country": "US"
}
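A local sanity check on the header can catch missing fields before upload. The validator below is a sketch built from the four fields in the table above. It assumes all four are required non-empty strings and that the country code is written as two uppercase letters. It checks only the shape of the code, not whether the code is actually assigned.

```python
import re

REQUIRED_FIELDS = ("feed_id", "account_id", "target_merchant", "target_country")


def validate_feed_header(header):
    """Return a list of problems found in a feed header dict (empty if none)."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = header.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"{field}: must be a non-empty string")
    country = header.get("target_country")
    # Shape check only: two uppercase letters (assumed convention).
    if isinstance(country, str) and country and not re.fullmatch(r"[A-Z]{2}", country):
        errors.append("target_country: expected a two-letter ISO 3166-1 alpha-2 code")
    return errors


good = {
    "feed_id": "feed_electronics_us",
    "account_id": "acct_merchant_123",
    "target_merchant": "merchant_xyz",
    "target_country": "US",
}
bad = dict(good, target_country="USA")
print(validate_feed_header(good), validate_feed_header(bad))
```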

3.9 Best Practices Checklist

Practice                        | Description
Use Parquet + zstd              | Highest compression efficiency, fastest parsing
Maintain UTF-8 encoding         | Avoid character set issues
Daily full + API incremental    | Dual channel ensures data freshness
Fixed file names with overwrite | Do not append timestamped files
Max 500,000 items per shard     | Maintain processing efficiency
Max 500 MB per file             | Avoid upload timeouts
Complete product in one shard   | Product and Variants must not span files
Monitor upload status           | Confirm SFTP transfers complete successfully

Next chapter: Chapter 4: REST API Integration — Product Feed REST API complete reference