Day 3: Nearly 25 Million Points in a Single File – PDAL Yosemite

Day 3: Nearly 25 million points in a single file. Where do you even start?

FOSS4G Bucharest, 2019. Connor Manning and Adam Steer showed what PDAL/Entwine could do with point clouds, and I was hooked. This was the tool I needed.

Point cloud data arrives as millions of unorganized XYZ coordinates. Before slope analysis or terrain rendering, you have to classify ground points, filter noise, build meshes, and interpolate elevation. And the sequence matters.

Filter too early and you lose detail in complex terrain. Filter too late and noise corrupts the analysis.

The tutorial gave me the order. The scale taught me patience.

I documented my PDAL learning journey using Yosemite Valley data: 570 million points across all files, about 19 hours of processing, fully open-source workflows built with JSON pipelines. Every command and output shared for the next person navigating the same terrain.

The visualization loads in seconds. Processing took 19 hours.

Read more (2020 tutorial): LiDAR PDAL experiments – Yosemite Valley

The project

After Day 2’s slope analysis on a DTM, this day steps back to the raw data layer: millions of LAS points from OpenTopography, colorized with NAIP imagery, indexed for web delivery. The question is not only visualization. It is reproducible pipeline order at scale.

Yosemite Valley again, but now as point clouds: three LAS tiles, 570 million points total, one file alone near 25 million.
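
The colorization step can be sketched as a small PDAL pipeline generated from Python. This is only an illustration: the file names are hypothetical, and the band mapping and the 256.0 scale factor (stretching 8-bit imagery to the 16-bit LAS color range) are assumptions, not the settings from the original tutorial.

```python
import json


def colorization_pipeline(las_in, naip_raster, las_out):
    """Return a PDAL pipeline (as a dict) that copies raster RGB onto points."""
    return {
        "pipeline": [
            {"type": "readers.las", "filename": las_in},
            {
                "type": "filters.colorization",
                "raster": naip_raster,
                # Format is dimension:band:scale. The band order and the
                # 256.0 scale are assumptions for 8-bit RGB imagery.
                "dimensions": "Red:1:256.0, Green:2:256.0, Blue:3:256.0",
            },
            {"type": "writers.las", "filename": las_out},
        ]
    }


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    pipeline = colorization_pipeline("tile.las", "naip.tif", "tile_rgb.las")
    print(json.dumps(pipeline, indent=2))
```

Saved to a file, the printed JSON is the kind of document that `pdal pipeline` takes as input.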

Pipeline order (why sequence matters)

Stage | Purpose | Risk if rushed
Ground classification | Separate terrain from vegetation and structures | Wrong surface for slope or mesh
Noise filtering | Remove outliers and acquisition artifacts | False steep faces or holes
Mesh / raster steps | Build surfaces for analysis or export | Lost detail or corrupted stats
Colorization | Bind aerial imagery to points for readable views | Correct geometry, unreadable output
Entwine + Potree | Index for browser streaming | Fast viewer, slow or broken prep

PDAL expresses this as JSON pipelines: readable, versionable, shareable. That was the FOSS4G lesson: open tools plus documented commands lower the barrier for the next person.
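
To make the idea of a JSON pipeline concrete, here is a minimal sketch that builds one in Python. The stage order simply mirrors the table above. The specific filters (`filters.smrf`, `filters.outlier`, `filters.range`, `writers.gdal`), their option values, and the file names are assumptions chosen for illustration, not the exact pipeline from the 2020 tutorial.

```python
import json


def terrain_pipeline(las_in, dtm_out, resolution=1.0):
    """Return a PDAL pipeline dict: classify ground, flag noise, rasterize."""
    stages = [
        {"type": "readers.las", "filename": las_in},
        # Ground classification: SMRF labels ground returns as class 2.
        {"type": "filters.smrf"},
        # Noise filtering: statistical outliers are relabeled as noise.
        # mean_k and multiplier are illustrative values, not tuned ones.
        {
            "type": "filters.outlier",
            "method": "statistical",
            "mean_k": 8,
            "multiplier": 3.0,
        },
        # Keep only points that are still classified as ground.
        {"type": "filters.range", "limits": "Classification[2:2]"},
        # Raster step: interpolate the remaining points into a GeoTIFF.
        {
            "type": "writers.gdal",
            "filename": dtm_out,
            "resolution": resolution,
            "output_type": "idw",
            "gdaldriver": "GTiff",
        },
    ]
    return {"pipeline": stages}


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(json.dumps(terrain_pipeline("tile.las", "dtm.tif"), indent=2))
```

Because the pipeline is plain data, reordering stages is a one-line change, which makes it easy to test how sequence affects the result and to keep each variant under version control.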

Tools and outputs

  • PDAL: filtering, classification, colorization, writers
  • Entwine: spatial indexing for large clouds
  • Potree: browser visualization (loads in seconds after indexing)
  • QGIS / bash: supporting checks and batch steps
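
The batch side of this toolchain can be sketched as a short script that only assembles the commands, without running them. It assumes a pipeline JSON with a `readers.las` and a `writers.las` stage, uses PDAL's command-line option overrides to swap file names per tile, and then passes the results to `entwine build`. The tile names, output paths, and the helper `batch_commands` are hypothetical; check the flags against your installed versions before running anything.

```python
import shlex
from pathlib import PurePosixPath


def batch_commands(tiles, pipeline_json, out_dir, index_dir):
    """Build (but do not run) one PDAL command per tile plus one Entwine build."""
    out_dir = PurePosixPath(out_dir)
    commands = []
    processed = []
    for tile in tiles:
        out_file = out_dir / (PurePosixPath(tile).stem + "_processed.las")
        commands.append([
            "pdal", "pipeline", str(pipeline_json),
            # Override the file names stored in the pipeline JSON per tile.
            f"--readers.las.filename={tile}",
            f"--writers.las.filename={out_file}",
        ])
        processed.append(str(out_file))
    # Index all processed tiles into one dataset for Potree to stream.
    commands.append(["entwine", "build", "-i", *processed, "-o", str(index_dir)])
    return commands


if __name__ == "__main__":
    # Hypothetical tile names, for illustration only.
    tiles = ["tile_1.las", "tile_2.las", "tile_3.las"]
    for cmd in batch_commands(tiles, "pipeline.json", "processed", "ept"):
        print(shlex.join(cmd))
```

Printing the commands first gives a dry run to review before committing to hours of processing.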

The 2020 blog series walks through commands and outputs. The video walkthrough shows the viewer behavior on real Yosemite geometry.

Scale vs speed

Streaming an indexed cloud feels instant in the browser. Preparation does not. About 19 hours of processing for the full Yosemite set is the honest counterweight to a Potree view that opens in seconds. That gap is normal in LiDAR work at this scale.
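
A quick back-of-the-envelope calculation puts those numbers in perspective. It assumes throughput was uniform across all stages and files, which the figures above do not actually establish, so treat the result as an order-of-magnitude estimate only.

```python
TOTAL_POINTS = 570_000_000   # all files, from the figures above
PROCESSING_HOURS = 19        # approximate total processing time
LARGE_TILE_POINTS = 25_000_000

# Average throughput, assuming a uniform rate (a simplifying assumption).
points_per_hour = TOTAL_POINTS / PROCESSING_HOURS
points_per_second = points_per_hour / 3600

# Implied share of the time for the single large file at that average rate.
large_tile_minutes = LARGE_TILE_POINTS / points_per_second / 60

if __name__ == "__main__":
    print(f"{points_per_hour:,.0f} points per hour")
    print(f"{points_per_second:,.0f} points per second")
    print(f"{large_tile_minutes:.0f} minutes for a 25-million-point file")
```

At that average rate, the pipeline handles about 30 million points per hour, so a single file of nearly 25 million points would account for just under an hour of the total.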

How to follow


#100DayMapChallenge · Day 3/100 · PDAL Yosemite · LinkedIn
