Tokenization

11 items

ARTICLE ↑ trending · Hacker News (AI) · 18h ago

Ask HN: What works for cutting AI token costs?

The poster faces high LLM token costs and asks for practical, real-world strategies to reduce them beyond switching to cheaper models, seeking advice from people who have successfully implemented cost-saving measures in their own AI applications.

Cost Optimization AI Tokenization Real Applications

RESEARCH ↑ trending · Reddit r/MachineLearning · 19d ago

Do VLMs in production still use fixed-patch ViTs for their vision capabilities? [D]

This discussion questions whether production Vision-Language Models (VLMs) still rely on fixed-patch Vision Transformers (ViTs) for their vision capabilities, despite the existence of more efficient tokenization methods. It explores potential reasons for this, such as marginal gains, pipeline limitations, or unclear scaling laws for adaptive patching.
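A fixed-patch ViT splits every image into a uniform grid, so the number of vision tokens depends only on resolution and patch size, not on image content; adaptive patching would make that count content-dependent. A minimal sketch of the fixed-patch count, assuming square patches and image sides divisible by the patch size (patch sizes of 14 and 16 pixels are common ViT choices, used here only as examples):

```python
def num_patch_tokens(height: int, width: int, patch: int = 14) -> int:
    """Count the tokens a fixed-patch ViT produces for one image.

    Assumes the image was already resized/cropped so both sides are
    multiples of `patch`; otherwise we raise instead of silently truncating.
    """
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    # One token per non-overlapping patch in the grid.
    return (height // patch) * (width // patch)


if __name__ == "__main__":
    # Same resolution always yields the same token count, whatever the content.
    print(num_patch_tokens(224, 224, patch=16))
    print(num_patch_tokens(336, 336, patch=14))
```

Doubling each side quadruples the token count, which is why high-resolution images are expensive for fixed-patch encoders.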

VLMs deep learning Vision Transformers Tokenization

ARTICLE · DEV.to AI · 4/18/2026

Claude 4.7 Tokenizer: 1.47x More Tokens Measured vs. Claude 4.6

In empirical measurements, Claude 4.7's tokenizer produces 1.47x as many tokens as 4.6's for the same input, more than Anthropic's official estimates. Because the per-token price is unchanged, prompts cost more in practice, which raises questions about the value exchange.
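Because billing is linear in token count, a tokenizer that emits more tokens for the same text raises the bill by the same proportion when the per-token price stays fixed. A minimal sketch of that arithmetic, using a hypothetical price of $5 per million input tokens (an assumption for illustration, not a quoted Anthropic rate):

```python
def request_cost(tokens: int, usd_per_million: float) -> float:
    """Cost of a request billed per token, with the price quoted per million tokens."""
    return tokens * usd_per_million / 1_000_000


def cost_increase(token_ratio: float) -> float:
    """Relative cost increase when the per-token price is unchanged.

    Cost is linear in token count, so emitting `token_ratio` times as many
    tokens raises the bill by (token_ratio - 1).
    """
    return token_ratio - 1.0


if __name__ == "__main__":
    price = 5.0  # hypothetical USD per million input tokens, same for both versions
    old = request_cost(10_000, price)                 # prompt under the older tokenizer
    new = request_cost(round(10_000 * 1.47), price)   # same prompt, 1.47x the tokens
    print(f"{old:.4f} -> {new:.4f} ({cost_increase(1.47):.0%} more)")
```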

Tokenization Cost analysis LLM

RESEARCH · arXiv CS.LG · 11d ago

Continuity and Ordinality Matter: Constraining Time Series Tokens for Effective Time Series Analysis with Large Language Models

This paper introduces COM (Continuity and Ordinality Matter), a strategy that integrates geometric constraints into both the initialization and training stages of token-based time series large language models (TS-LLMs). The research demonstrates that preserving continuity and ordinality in time series token embeddings significantly improves model performance and generalizability.

machine learning Tokenization large language models Time Series Analysis

RESEARCH · arXiv CS.AI · 13d ago

BrickAnything: Geometry-Conditioned Buildable Brick Generation with Structure-Aware Tokenization

This work introduces BrickAnything, a geometry-conditioned autoregressive framework for generating physically buildable brick structures from diverse 3D shapes. It uses point clouds as a unified geometric interface, predicts brick sequences that reconstruct the target shape under assembly constraints, and introduces a structure-aware tree tokenization.

brick generation 3D reconstruction geometry-conditioned AI

DOC · DEV.to AI · 14d ago

How LLMs Actually Work — From Tokens to Text (with Python)

This content explains the fundamental mechanism of Large Language Models (LLMs) like ChatGPT, detailing how they predict the next token to generate text. It describes the pipeline from tokenization and vector representation to attention mechanisms and the iterative process of text generation.
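The iterative generation step reduces to a loop: predict one token from the current context, append it, and predict again until an end marker or a length limit. A toy sketch of that control flow, assuming a hypothetical lookup table (TOY_SCORES) in place of a real model and whitespace splitting in place of a real subword tokenizer; it does not model embeddings or attention.

```python
# Hypothetical toy "model": maps the last token to scores for the next token.
# A real LLM computes these scores from embeddings and attention over the whole
# context; this lookup table only stands in for that step.
TOY_SCORES: dict[str, dict[str, float]] = {
    "<bos>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"dog": 1.0},
    "cat": {"sat": 0.7, "<eos>": 0.3},
    "dog": {"ran": 0.8, "<eos>": 0.2},
    "sat": {"<eos>": 1.0},
    "ran": {"<eos>": 1.0},
}


def tokenize(text: str) -> list[str]:
    # Whitespace splitting stands in for a real subword tokenizer such as BPE.
    return text.split()


def next_token(context: list[str]) -> str:
    # Unknown tokens fall back to ending the sequence.
    scores = TOY_SCORES.get(context[-1], {"<eos>": 1.0})
    # Greedy decoding: pick the highest-scoring candidate.
    return max(scores, key=lambda tok: scores[tok])


def generate(prompt: str, max_new_tokens: int = 10) -> list[str]:
    tokens = ["<bos>"] + tokenize(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)  # predict one token from the current context
        if tok == "<eos>":
            break
        tokens.append(tok)  # feed it back in and repeat
    return tokens[1:]  # drop the <bos> marker


if __name__ == "__main__":
    print(" ".join(generate("a")))
```

Real systems usually sample from the predicted distribution rather than always taking the maximum; greedy selection is used here only to keep the output deterministic.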

learning text generation Python Tokenization

ARTICLE · DEV.to AI · 4/18/2026

Opus 4.7 Uses 35% More Tokens Than 4.6. Here's What I'm Doing About It.

Claude Opus 4.7's new tokenizer is causing an effective 35% price increase for the same work due to higher token consumption compared to version 4.6. While reasoning improvements are real for complex tasks, the author plans to use 4.7 selectively and stick with 4.6 for tasks where token efficiency is key.

AI cost Claude Tokenization LLM

ARTICLE · DEV.to AI · 4/21/2026

Opus 4.7's Tokenizer Change: How to Measure Your Real Claude Code Costs

Claude Opus 4.7's updated tokenizer can increase costs by 40% or more for the same input, especially for system prompts and high-resolution images, due to higher token counts. It is crucial to use a token counter to measure real costs before upgrading.
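One way to measure the change before upgrading is to run identical inputs through both token counters and compare totals per category (system prompts, code, and so on). A minimal sketch, assuming you supply the two counting functions yourself (for example wrappers around a provider's token-counting endpoint or a local tokenizer); the stand-in counters in the demo are hypothetical approximations, not real tokenizers, and this text-only sketch does not cover images.

```python
from typing import Callable, Mapping

# A token counter maps text to a token count. The real counters are assumed
# to be provided by the caller; nothing here calls a specific API.
TokenCounter = Callable[[str], int]


def token_ratios(
    samples: Mapping[str, list[str]],
    count_old: TokenCounter,
    count_new: TokenCounter,
) -> dict[str, float]:
    """Ratio of new-to-old token counts per category, over identical inputs."""
    ratios: dict[str, float] = {}
    for category, texts in samples.items():
        old = sum(count_old(t) for t in texts)
        new = sum(count_new(t) for t in texts)
        # A ratio of 1.40 means a 40% higher bill at an unchanged per-token price.
        ratios[category] = new / old if old else float("nan")
    return ratios


if __name__ == "__main__":
    # Hypothetical stand-in counters for illustration only.
    def approx_old(text: str) -> int:
        return max(1, len(text) // 4)

    def approx_new(text: str) -> int:
        return max(1, len(text) // 3)

    samples = {
        "system_prompt": ["You are a careful assistant. Follow the style guide."],
        "code": ["def add(a, b):\n    return a + b"],
    }
    for category, ratio in token_ratios(samples, approx_old, approx_new).items():
        print(f"{category}: {ratio:.2f}x")
```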

AI models Anthropic Cost Optimization Tokenization

DOC · fast.ai Blog · 10/15/2025

Let’s Build the GPT Tokenizer: A Complete Guide to Tokenization in LLMs

This content transforms Andrej Karpathy's video on GPT tokenization into a detailed book chapter. It includes inlined code and images, serving as a comprehensive guide to understanding a key piece of how LLMs work.
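The core algorithm the guide builds up is byte-pair encoding (BPE): start from raw UTF-8 bytes, count adjacent pairs, merge the most frequent pair into a new id, and repeat. A minimal sketch of that training loop; the function names are our own rather than the guide's code, ties go to the first pair encountered, and it omits the regex pre-splitting and special tokens that GPT tokenizers add.

```python
def get_pair_counts(ids: list[int]) -> dict[tuple[int, int], int]:
    """Count how often each adjacent pair of token ids occurs."""
    counts: dict[tuple[int, int], int] = {}
    for pair in zip(ids, ids[1:]):
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every non-overlapping occurrence of `pair` with `new_id`, left to right."""
    out: list[int] = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int) -> tuple[list[int], dict[tuple[int, int], int]]:
    """Learn up to `num_merges` merges, starting from raw UTF-8 bytes (ids 0-255)."""
    ids = list(text.encode("utf-8"))
    merges: dict[tuple[int, int], int] = {}
    for k in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left; nothing to merge
        top = max(counts, key=lambda p: counts[p])  # most frequent pair
        new_id = 256 + k  # new ids start after the 256 byte values
        ids = merge(ids, top, new_id)
        merges[top] = new_id
    return ids, merges


if __name__ == "__main__":
    ids, merges = train_bpe("aaabdaaabac", num_merges=2)
    print(ids, merges)
```

Each merge shortens the sequence, which is the trade-off behind vocabulary size: more merges mean fewer tokens per text but a larger embedding table.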

LLMs GPT learning NLP

ARTICLE · DEV.to AI · 4/20/2026

How do large organizations benefit from tokenizing physical assets?

Large organizations benefit from tokenizing physical assets by transforming illiquid holdings into digital, tradable units on a blockchain. This process significantly improves liquidity, transparency, and operational efficiency in asset management.

Blockchain Finance digital assets asset management

ARTICLE · DEV.to AI · 4/10/2026

U.S. Blockchain Development Accelerates With Asset Tokenization and Layer 2 Growth

U.S. blockchain development moved from experimentation to real infrastructure in 2026, driven by asset tokenization and Layer 2 solutions. This made the technology more practical, scalable, and cost-effective for companies in sectors such as finance and logistics.

Blockchain enterprise blockchain Tokenization Layer 2