Extract repo metadata using the GitHub API
When you inherit a large, unfamiliar codebase, the hardest part is not reading the code.
It’s knowing where to start.
What Do We Actually Want to Know?
Knowledge graphs (KGs) are built in stages, so starting with basic questions about the repo lays the foundation for the next level of granularity, much like the steps in chain-of-thought reasoning:
- How big is the repo (number of files, size in kB)?
- What's the directory structure?
- What types of files are in this repository?
- Which language(s) likely contain the application logic?
- Can I detect a framework?
Answers to these questions come from two sources:
- Direct queries against repository metadata via the GitHub API, and
- Traversing the repo directory
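Both sources can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation. It assumes the public GitHub REST API endpoints `/repos/{owner}/{repo}` and `/repos/{owner}/{repo}/languages`, and a local clone of the repo for the traversal. The names `github_get`, `repo_metadata`, `summarize_tree`, and `FRAMEWORK_MARKERS` are hypothetical, and the marker-file table is a deliberately tiny heuristic chosen for the example.

```python
import json
import os
import urllib.request
from collections import Counter

API_ROOT = "https://api.github.com"

def github_get(path, token=None):
    """GET a GitHub REST API path and return the decoded JSON body."""
    req = urllib.request.Request(
        API_ROOT + path,
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:  # optional; unauthenticated requests are rate limited
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def repo_metadata(owner, repo, token=None):
    """Source 1: ask the API for size and language information."""
    info = github_get(f"/repos/{owner}/{repo}", token)
    languages = github_get(f"/repos/{owner}/{repo}/languages", token)
    return {
        "size_kb": info["size"],              # reported by GitHub in kB
        "default_branch": info["default_branch"],
        "primary_language": info["language"],
        "language_bytes": languages,          # e.g. {"Python": 12345}
    }

# Hypothetical marker files: an assumption for illustration, not a full list.
FRAMEWORK_MARKERS = {
    "manage.py": "Django",
    "package.json": "Node.js",
    "pom.xml": "Maven",
    "Cargo.toml": "Cargo",
}

def summarize_tree(root):
    """Source 2: walk a local clone and count files, bytes, and extensions."""
    file_count = 0
    total_bytes = 0
    extensions = Counter()
    hints = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune Git's own bookkeeping so it does not skew the counts.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # skip symlinks, which may be broken
            file_count += 1
            total_bytes += os.path.getsize(full)
            ext = os.path.splitext(name)[1].lower() or "(none)"
            extensions[ext] += 1
            if name in FRAMEWORK_MARKERS:
                hints.add(FRAMEWORK_MARKERS[name])
    return {
        "file_count": file_count,
        "total_bytes": total_bytes,
        "extensions": dict(extensions),
        "framework_hints": sorted(hints),
    }
```

Calling `repo_metadata("octocat", "Hello-World")` would query a public example repo over the network, while `summarize_tree(".")` summarizes whatever is checked out in the current directory. Keeping the two functions separate mirrors the two sources above: the API answers size and language questions without a clone, and the traversal fills in directory structure and file types.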