The Gap Inside the Gap
Since the beginning of giveGroundUp, I've been talking about a commitment to ethical AI. It's in how I describe the platform. It's in our grant applications. It's in the first blog post I published, the one about the discovery gap and why I started building this in the first place.
What I've been wrestling with lately is the distance between saying that and what it means to actually do it. More specifically, what it looks like when you're deep in the build and you start finding the places where your commitment and your code aren't aligned yet.
That’s why I’m sharing this post. I want to make sure I’m writing honestly about the work, and what we find as we do it, including the parts we haven’t quite figured out yet.
A little context first, because I think it matters for understanding why this particular tension is important.
The philanthropic discovery gap — the core problem I’m trying to solve with giveGroundUp — is essentially a structural visibility problem. Very small nonprofits, the ones doing some of the most important community-rooted work in the country, are largely unknown to donors. Not because their work isn't good. Because they don't have the staff, the budget, or the bandwidth to compete for attention in a sector that rewards organizations that look the part.
And here's the thing about that: the solution can’t just be about increased visibility. It has to be about understanding the entire data ecosystem we're building on. Every data source we pull from (the IRS, ProPublica, Every.org, web enrichment tools) reflects the same structural reality. The organizations with resources have more data. The organizations without resources have less. An automated pipeline built on top of that data doesn't fix the visibility problem. It codifies it.
This is not a new insight. It’s a well-documented challenge in algorithmic systems. You get out what you put in, and what you put in reflects the world as it is, not as it should be. But knowing that in theory and finding it threaded through every layer of your own pipeline are two different things.
Even in our small beta pilot, here’s where it showed up.
In the data sources themselves. Organizations under $50,000 in annual revenue file a 990-N, which is essentially a postcard that says "we still exist." No program descriptions, no detailed financials, minimal contact information. These are disproportionately the newest organizations, the most community-rooted ones, the ones led by a group of volunteers, the ones operating in places and serving populations that are already underrepresented in the philanthropic data landscape. They exist in our database as a name, an address, and a mission code. When our enrichment pipeline goes looking for more, there's often almost nothing to find.
In our quality scoring. We run an AI-powered quality review on every organization in the database. When I looked at our results, more than half of the organizations in our database scored as "poor." Some of those had genuine data issues. But when I started auditing what was actually being flagged, I kept finding the same pattern: criteria that on the surface seemed reasonable (does it have a website, is the description specific, is the budget data current) were functioning as proxies for organizational resources rather than organizational quality. A neighborhood mutual aid group that takes donations via Venmo and does its organizing on Facebook is not a poor organization. Our pipeline was treating it like one.
In how we generate profiles. When our AI generates a profile for an organization with thin source data, it doesn't have much to work with. What comes out tends toward a kind of generic, professionalized nonprofit framing: the kind of language you'd find in a polished grant application, not the way a community organization actually describes itself. That matters because it strips out the specificity and the community-rootedness that make a grassroots org distinctive in the first place. And then that generic profile scores lower in quality review. And then that organization is less likely to surface in matching results. Each layer compounds the one before it.
In matching itself. By the time a donor runs a search, the organizations that should appear may have already been quietly filtered out or deprioritized at multiple upstream points — not because they weren't a good match, but because they started the process with structural disadvantages the algorithm had no way to account for.
I want to be careful here, because I think there's a version of writing about this that sounds like self-congratulation in disguise. Look at us, finding the bias and naming it, aren't we thoughtful! That's not what I'm going for.
What I'm actually sitting with is that we have almost 400 organizations in our database right now, and I genuinely don't know how many of them are there because they're the right organizations for this platform versus how many of them are there because they had enough digital presence to survive our pipeline. Those are not the same thing.
So, that's the urgency I sit with. Not an abstract ethical concern, but a practical one: we could build a platform that claims to fix the discovery gap while quietly reproducing it. And if that happens, the organizations most harmed are the same ones we said we were building this for.
So here's what we're doing, or trying to do, instead.
We're separating data confidence from data quality in how we evaluate organizations. A thin data footprint is now a flag for outreach and enrichment, not a signal that an organization doesn't belong in the database.
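One way to picture that separation is to store the two signals as separate fields, so a thin footprint can never be read as a quality verdict. This is a minimal sketch, not the platform's actual schema: the names `DataConfidence`, `OrgAssessment`, and `next_step`, and the routing labels, are all illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class DataConfidence(Enum):
    """How much source data we have (hypothetical tiers)."""
    THIN = "thin"        # e.g. only a name, an address, and a mission code
    PARTIAL = "partial"
    RICH = "rich"


@dataclass
class OrgAssessment:
    name: str
    confidence: DataConfidence  # how much we know about the organization
    quality_issue: bool         # a confirmed problem in the data we do have


def next_step(assessment: OrgAssessment) -> str:
    """Route an organization without treating thin data as a defect."""
    if assessment.quality_issue:
        return "review"
    if assessment.confidence is DataConfidence.THIN:
        # Thin data is a prompt for outreach, not a reason for exclusion.
        return "outreach_and_enrichment"
    return "active"
```

Under these assumptions, a volunteer-run group with almost no data and no confirmed problems is routed to outreach rather than being scored as "poor."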
We're auditing our quality criteria to figure out which ones are actually measuring something meaningful versus which ones are measuring whether an organization looks like it has a communications budget. The latter are being reconsidered.
We changed the profile generation prompts. There are now explicit instructions not to infer activities that aren't confirmed in the source data, not to default to polished nonprofit framing when the organization's own language is plainer, and to flag inconsistencies rather than smooth over them.
We're adding a diversity check to matching results. One that asks, before anything is returned to a user, whether the results are systematically skewed toward well-resourced organizations for reasons that have nothing to do with match quality. If yes, we correct for it.
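To make the idea concrete, here is one possible shape for such a check, offered as a sketch rather than a description of the real matching code. It assumes each candidate carries a match score and a hypothetical `low_resource` flag, and it compares the share of low-resource organizations in the top results against their share among all candidates that clear a minimum score. The `Match` type, the `min_score` threshold, and the swap-based correction are all assumptions.

```python
from typing import NamedTuple


class Match(NamedTuple):
    org: str
    score: float        # match quality, higher is better
    low_resource: bool  # hypothetical flag, e.g. a 990-N filer


def low_resource_share(matches: list[Match]) -> float:
    """Fraction of matches flagged low-resource (0.0 for an empty list)."""
    if not matches:
        return 0.0
    return sum(m.low_resource for m in matches) / len(matches)


def diversity_checked_top_k(candidates: list[Match], k: int,
                            min_score: float) -> list[Match]:
    """Return the top-k matches, corrected when low-resource organizations
    are underrepresented relative to their share of eligible candidates."""
    eligible = sorted((m for m in candidates if m.score >= min_score),
                      key=lambda m: m.score, reverse=True)
    top = eligible[:k]
    target = round(low_resource_share(eligible) * len(top))
    have = sum(m.low_resource for m in top)
    if have >= target:
        return top  # no skew detected, leave the ranking alone

    # Swap the lowest-scoring well-resourced results for the best
    # low-resource candidates that were left out of the top k.
    waiting = [m for m in eligible[k:] if m.low_resource]
    keep_low = [m for m in top if m.low_resource]
    keep_high = [m for m in top if not m.low_resource]
    needed = min(target - have, len(waiting))
    result = keep_low + keep_high[:len(keep_high) - needed] + waiting[:needed]
    return sorted(result, key=lambda m: m.score, reverse=True)
```

The `min_score` floor is the design choice that matters here: the correction only ever promotes organizations that already qualify as good matches, so it adjusts for skew without returning weak results.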
And we're committing to a quarterly audit practice, with specific questions we'll ask of the data every quarter and documented findings that we'll publish. What percentage of organizations with minimal IRS data are active in the database versus excluded. How quality scores are distributed across budget tiers. Whether low-data organizations are actually surfacing in results. The numbers, whatever they turn out to be.
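As a rough illustration of what those quarterly questions could look like in code, the sketch below computes the three numbers from a list of organization records. The record fields (`minimal_irs_data`, `active`, `times_surfaced`, `budget_tier`, `quality_label`) are hypothetical stand-ins, not the real database columns.

```python
from collections import Counter, defaultdict


def quarterly_audit(orgs: list[dict]) -> dict:
    """Summarize how low-data organizations fare in the pipeline.

    Each record is assumed to have these hypothetical keys:
    minimal_irs_data (bool), active (bool), times_surfaced (int),
    budget_tier (str), quality_label (str).
    """
    minimal = [o for o in orgs if o["minimal_irs_data"]]
    active_minimal = [o for o in minimal if o["active"]]
    surfaced_minimal = [o for o in minimal if o["times_surfaced"] > 0]

    # Count quality labels separately within each budget tier.
    labels_by_tier = defaultdict(Counter)
    for o in orgs:
        labels_by_tier[o["budget_tier"]][o["quality_label"]] += 1

    def pct(part: list, whole: list):
        # None signals "no organizations in this group" rather than 0%.
        return round(100 * len(part) / len(whole), 1) if whole else None

    return {
        "minimal_data_active_pct": pct(active_minimal, minimal),
        "minimal_data_surfaced_pct": pct(surfaced_minimal, minimal),
        "quality_by_budget_tier": {t: dict(c) for t, c in labels_by_tier.items()},
    }
```

Running the same function every quarter and publishing its output unchanged is what would keep the audit honest under this approach: the questions are fixed in advance, so the numbers can't be reframed after the fact.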
None of that is the same as having solved it. Some of these are changes we've made. Some are still in progress. Some we won't know worked until we have more data from real users.
What I do know is that this is the kind of problem that gets harder to fix the longer you wait. The infrastructure you build early becomes the infrastructure you scale. The assumptions baked into version one become the debt you're carrying in version three. And in a sector that has spent decades building systems that serve organizations closest to existing power and wealth, the bar for "we tried" is pretty low and the bar for "we actually changed something" is pretty high.
We’re a long way from the second, but we are committed to clearing it.