Google Summer of Code - Git
Info
Patches
Title | URL | Status |
---|---|---|
t7603: replace test -f by test_path_is_file | PATCH | |
merge-strategies.adoc: detail submodule merge | PATCH | |
Add --subject-extra-prefix flag to format-patch | PATCH | |
Microproject Info: replace [Mentoring][PATCH] by [Mentoring PATCH] | PATCH | |
userdiff: add builtin driver for INI files | PATCH | |
revision: remove log_reencode field from rev_info | PATCH | |
json-writer: add docstrings to jw_* functions | PATCH | |
Update MyFirstObjectWalk with struct repository and meson | PATCH | |
repo-info: add new command for retrieving repository info | PATCH | |
Weeks
Application (Jan 20th to May 8th)
During the application period, I sent a total of six patches to the Git codebase and one patch to git.github.io. Five of them were accepted, one was rejected, and one is currently under review.
Also during that period, given my intention to apply to GSoC, a professor from my university asked me to give two talks about Git in his Free Software Development classes:
- One about git send-email, available here, in Portuguese
- Another about contributing to Git, available here, in Portuguese
Community Bonding Period (May 8th to Jun 1st)
After being accepted into GSoC, I had two calls: one with all the other GSoC mentees and another with the GSoC mentors and mentees from Git.
I also inspected the Git codebase to understand better how I would serialize data to JSON. I found the json-writer module, which does exactly what I need. Even though its source code is clear and easy to use, it lacked an overview for people who are not yet familiar with it. I sent a PATCH documenting how to use this module, and it has already been accepted and merged to next.
In the last few days, I followed the My First Object Walk tutorial to learn how to declare a new command in Git. It was a little outdated, which gave me the opportunity to send another patch fixing it.
Week 1 (Jun 2nd to Jun 8th)
First draft of git repo-info
In this first week, I sent a first RFC to my mentors (Karthik and Patrick). Since I was only asking for internal feedback, this patch wasn't sent to the Git mailing list.
This first version of git repo-info worked like this:
$ git repo-info
{
"object-format": "sha1",
"ref-format": "files"
}
The field object-format corresponds to the output of git rev-parse --show-object-format, while the field ref-format corresponds to the output of git rev-parse --show-ref-format.
Some of the suggestions from my mentors were:
- Change the JSON schema, grouping the fields
- Allow the user to choose which fields they want
- Also support a format simpler than JSON, like LF-terminated or NUL-terminated text
First version of the RFC on git repo-info
After the first review by my mentors, I developed a new version, which was sent to the Git mailing list and can be seen here.
This second version addresses the requests raised in the review of the first version. As a result, repo-info was redesigned with a focus on good support for both JSON and line-wise plaintext formats, flexibility for new features, and control over the returned fields.
In this second version, I've chosen to return whether the repository is bare and whether it is shallow instead of the object format, in order to focus on the purpose of the command rather than on implementation details.
This version works as follows, using JSON as the default output format:
$ git repo-info
{
"references": {
"format": "files"
},
"layout": {
"bare": false,
"shallow": false
}
}
Using the plaintext format, this is the output (one field per line):
$ git repo-info --format=plaintext
files
false
false
It also allows the user to request only the desired fields, like this (note that the output respects the order in which the user requested the fields):
$ git repo-info --format=plaintext layout.bare references.format layout.shallow
false
files
false
Or, using the JSON format (note that the requested order is ignored here, since it can't be preserved once the fields are grouped by category):
$ git repo-info --format=json layout.bare references.format layout.shallow
{
"references": {
"format": "files"
},
"layout": {
"bare": false,
"shallow": false
}
}
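To make the two ordering rules concrete, here is a minimal C sketch, not the actual repo-info code: a hypothetical three-entry field table with hard-coded values, a plaintext writer that follows the user's order, and a compact JSON writer that walks the table in its own order and groups fields by category. The names struct field, find_field, format_plaintext and format_json are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical field table, listed in the schema's canonical order.
 * Fields of the same category must be contiguous for the grouping below. */
struct field {
	const char *category; /* e.g. "layout" */
	const char *name;     /* e.g. "bare" */
	const char *value;    /* hard-coded here; the real command queries Git */
	int is_string;        /* quote the value in JSON? */
};

static const struct field fields[] = {
	{ "references", "format", "files", 1 },
	{ "layout", "bare", "false", 0 },
	{ "layout", "shallow", "false", 0 },
};
#define NR_FIELDS (sizeof(fields) / sizeof(fields[0]))

/* Append s to the NUL-terminated string in buf, truncating if full. */
static void append(char *buf, size_t size, const char *s)
{
	size_t used = strlen(buf);
	snprintf(buf + used, size - used, "%s", s);
}

/* Map "category.name" to an index in fields[], or -1 if unknown. */
static int find_field(const char *key)
{
	for (size_t i = 0; i < NR_FIELDS; i++) {
		size_t len = strlen(fields[i].category);
		if (!strncmp(key, fields[i].category, len) && key[len] == '.' &&
		    !strcmp(key + len + 1, fields[i].name))
			return (int)i;
	}
	return -1;
}

/* Plaintext: one value per line, in the order the user asked for. */
static int format_plaintext(char *buf, size_t size, const char **keys, int n)
{
	buf[0] = '\0';
	for (int i = 0; i < n; i++) {
		int idx = find_field(keys[i]);
		if (idx < 0)
			return -1;
		append(buf, size, fields[idx].value);
		append(buf, size, "\n");
	}
	return 0;
}

/* JSON: requested fields in table order, grouped by category. */
static int format_json(char *buf, size_t size, const char **keys, int n)
{
	int wanted[NR_FIELDS] = { 0 };
	const char *open = NULL; /* category whose object is currently open */

	for (int i = 0; i < n; i++) {
		int idx = find_field(keys[i]);
		if (idx < 0)
			return -1;
		wanted[idx] = 1;
	}
	buf[0] = '\0';
	append(buf, size, "{");
	for (size_t i = 0; i < NR_FIELDS; i++) {
		if (!wanted[i])
			continue;
		if (!open || strcmp(open, fields[i].category)) {
			if (open)
				append(buf, size, "},");
			append(buf, size, "\"");
			append(buf, size, fields[i].category);
			append(buf, size, "\":{");
			open = fields[i].category;
		} else {
			append(buf, size, ",");
		}
		append(buf, size, "\"");
		append(buf, size, fields[i].name);
		append(buf, size, fields[i].is_string ? "\":\"" : "\":");
		append(buf, size, fields[i].value);
		if (fields[i].is_string)
			append(buf, size, "\"");
	}
	if (open)
		append(buf, size, "}");
	append(buf, size, "}");
	return 0;
}
```

With the keys layout.bare references.format layout.shallow, format_plaintext yields false, files and false on separate lines, while format_json yields the same object as the JSON example above, just without pretty-printing.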
Week 2 (Jun 9th to Jun 15th)
After sending the first version in the previous week, this second week was mostly focused on receiving feedback from the mailing list.
Review about the plaintext format
Two questions have arisen about the plaintext format:
- Ben Knoble and Junio Hamano asked why not use a key=value format in the plaintext output. Karthik Nayak also raised this in our weekly meeting, so with three people asking for it, this looks like the right path :-). This will be implemented in the next version.
- Junio Hamano suggested a better output format, considering that fields may contain line breaks. Currently, rev-parse doesn't take this into account, which breaks the assumption that each field is returned on its own line. For example, this asks for two fields and returns three lines:
$ git init 'my
repo'
$ cd 'my
repo'
$ git rev-parse --show-toplevel --is-bare-repository
/private/tmp/my
repo
false
Junio also said:
As often said, an earlier mistake is not an excuse to pile more of them on top. Isn't the whole point of this new command to remove these kitchen-sink options out of rev-parse and give them better home? Let's learn from our earlier mistakes and do it right in the new incarnation.
So let's do it correctly now!
Review about the CLI
About the CLI, the only reviews were from Karthik:
- It wasn't clear why I added the --allow-empty flag. Karthik is not against it, but he thinks its purpose should be made clear. In fact, I added this flag for scripts, so that they can assume one output field per requested option.
- Karthik suggested in our weekly meeting that the option list could also accept a category (e.g. objects or path) as an option, so the user wouldn't need to select each of its fields.
Long running server mode
Junio asked whether I'm planning to add a long-running mode like cat-file's --batch-command. I hadn't planned that at first, but after his review, I'm considering adding this feature once the basic functionality is working.
Week 3 (Jun 16th to Jun 22nd)
Given the feedback on v1, it was time to send v2 (which can be seen here).
Major changes in v2
The major changes were:
- The plaintext format now returns its fields in a key=value format
- The tests were renumbered to t1900, since it's a new command (the previous number was t1518, following the numbering of rev-parse)
- The test function test_repo_info now has a docstring, and it is more flexible, allowing more complex repository initializations
- The --allow-empty flag is now introduced in its own commit
- The plaintext and JSON formats are now introduced in their own commits
- The JSON format tests, which depend on Perl's JSON module, are now marked with the PERLJSON lazy prereq, so they are skipped in environments that don't have that module installed
Tasks left for future versions
Some things pointed out in the last review weren't implemented, as I prefer to do them in a later iteration of repo-info, once its basic functionality is working:
- Remove the dependency on the_repository when calling is_bare_repository
- Add a --batch-command mode, based on the --batch-command flag of cat-file, which is a long-running mode where the data is requested from stdin instead of the CLI
- Add documentation for this new command
- Use the category as a key instead of only accepting category.key. With the fields of the current patchset, git repo-info layout would then be equivalent to git repo-info layout.bare layout.shallow
The task of removing this dependency is related to the project of Ayush, another GSoC '25 mentee. I asked him whether he intends to do it, and he'll consider it for future patches (discussion here).
About the --batch-command mode, I asked Karthik about the use cases of cat-file --batch-command and whether it would be useful to have a similar feature in repo-info. He told me that cat-file --batch-command is used for retrieving several pieces of data that cat-file already returns in its CLI mode, but keeping the same process running instead of calling git cat-file for each object. However, cat-file deals with every object stored in Git; in other words, the set of data it can return contains, at least, every version of every tracked file, every version of every tracked directory, and every commit in the history. git repo-info, on the other hand, will return a fixed and small set of data, which shouldn't be a problem to retrieve entirely if the user doesn't know beforehand which fields will be needed. So I decided to discard --batch-command.
The reviewers (Karthik and Junio) noted that documentation was missing in this v2, which I must admit was my bad :-(.
The last feature (using a category as a key in the CLI) is not discarded. It'll be implemented in a future version, once the basic functionality is working and ready to use.
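Accepting a bare category could be done by expanding each argument before the lookup. A minimal sketch, assuming a hypothetical flat list of fully-qualified keys (expand_key and all_keys are illustrative names): layout expands to layout.bare and layout.shallow, a full key selects only itself, and a partial prefix such as lay matches nothing because the character right after the prefix must be a dot.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of fully-qualified keys, grouped by category. */
static const char *const all_keys[] = {
	"layout.bare",
	"layout.shallow",
	"references.format",
};
#define NR_KEYS (sizeof(all_keys) / sizeof(all_keys[0]))

/*
 * Expand one command-line argument into the keys it selects:
 * "layout.bare" selects itself, "layout" selects every "layout.*" key.
 * Writes at most `max` keys into `out` and returns how many were written;
 * 0 means the argument names neither a key nor a category.
 */
static size_t expand_key(const char *arg, const char **out, size_t max)
{
	size_t n = 0, len = strlen(arg);

	for (size_t i = 0; i < NR_KEYS; i++) {
		const char *key = all_keys[i];
		int exact = !strcmp(key, arg);
		/* key[len] is safe: strncmp matched len chars of key */
		int in_category = !strncmp(key, arg, len) && key[len] == '.';

		if ((exact || in_category) && n < max)
			out[n++] = key;
	}
	return n;
}
```

Because expansion happens before lookup, the rest of the command (ordering, formatting) only ever sees full keys.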
The UTF-8 problem
For now, I'm not dealing with paths; however, Phillip Wood raised an important point about them: JSON is a UTF-8 encoded format, but this charset restriction can't be assumed across different filesystems.
So it's not safe to just dump a path as a value when serializing to JSON. Since I'm not dealing with paths yet, I won't address this issue for now, but charset issues are something I won't be able to avoid in the future.
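One building block for handling this is detecting whether a path's bytes are valid UTF-8 at all before they go into a JSON string. The sketch below is a generic validator following RFC 3629, not something taken from Git; is_valid_utf8 is an illustrative name, and what to do with a path that fails the check (escape it, reject it, and so on) is left open, as in the discussion above.

```c
#include <stddef.h>

/*
 * Return 1 if buf[0..len) is well-formed UTF-8 (RFC 3629), else 0.
 * Rejects overlong encodings, UTF-16 surrogates and code points
 * above U+10FFFF.
 */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = buf[i];
		size_t n;         /* number of continuation bytes */
		unsigned long cp; /* decoded code point */

		if (c < 0x80) {
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {
			n = 1; cp = c & 0x1F;
		} else if ((c & 0xF0) == 0xE0) {
			n = 2; cp = c & 0x0F;
		} else if ((c & 0xF8) == 0xF0) {
			n = 3; cp = c & 0x07;
		} else {
			return 0; /* stray continuation byte or invalid lead */
		}
		if (len - i <= n)
			return 0; /* sequence truncated at end of buffer */
		for (size_t k = 1; k <= n; k++) {
			if ((buf[i + k] & 0xC0) != 0x80)
				return 0;
			cp = (cp << 6) | (buf[i + k] & 0x3F);
		}
		if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
		    (n == 3 && cp < 0x10000) ||
		    (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
			return 0;
		i += n + 1;
	}
	return 1;
}
```

A Latin-1 encoded name such as the bytes caf\xe9 fails this check, which is exactly the kind of path that cannot be copied verbatim into JSON.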
Week 4 (Jun 23rd ~ Jun 29th)
The fourth week, unlike the previous ones, didn't result in another version of the patchset. However, during that week I worked on addressing the issues from the previous review:
- About the documentation, I finally wrote it for git repo-info. It basically describes the new command's syntax, its two output formats, and the data that it currently retrieves.
- Karthik also told me that the commits weren't descriptive enough. I fixed that as well.
- About the tests, Phillip Wood asked me to fix several issues in the tests for this command (t1900), which I did this week.
- About the --allow-empty flag, after the comments from Junio and Karthik, I decided to remove it and add an --all flag in the future.
- Phillip suggested another format instead of the key=value one I was using: a NUL-terminated format where keys and values are separated by a line feed (<key><LF><value><NUL>). Given that it solves the problem of parsing special characters, that it is already used by git config --list -z, and that it doesn't have any downsides, I'll use it.
- There were also some minor code changes that I addressed.
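Here is a minimal C sketch of writing and parsing the <key><LF><value><NUL> layout, with illustrative names (write_record, read_record) and a hypothetical key path.toplevel. Because keys are fixed identifiers without line feeds, the first LF in a record always separates key from value, so a value may itself contain LFs; only NUL is forbidden inside a value.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append one <key><LF><value><NUL> record to out at offset *pos.
 * Keys must not contain LF; values may contain LF but not NUL.
 * Returns 0 on success, -1 if the record does not fit.
 */
static int write_record(char *out, size_t size, size_t *pos,
			const char *key, const char *value)
{
	size_t klen = strlen(key), vlen = strlen(value);

	if (*pos + klen + 1 + vlen + 1 > size)
		return -1;
	memcpy(out + *pos, key, klen);
	out[*pos + klen] = '\n';
	memcpy(out + *pos + klen + 1, value, vlen);
	out[*pos + klen + 1 + vlen] = '\0';
	*pos += klen + 1 + vlen + 1;
	return 0;
}

/*
 * Parse the record starting at buf[*pos] (buf holds len bytes).
 * The separating LF is overwritten with NUL so that *key and *value
 * both point to C strings inside buf. Returns 1 on success (and
 * advances *pos), 0 at end of input, -1 on a malformed record.
 */
static int read_record(char *buf, size_t len, size_t *pos,
		       const char **key, const char **value)
{
	char *start = buf + *pos, *lf, *nul;

	if (*pos >= len)
		return 0;
	nul = memchr(start, '\0', len - *pos);
	if (!nul)
		return -1; /* unterminated record */
	lf = memchr(start, '\n', (size_t)(nul - start));
	if (!lf)
		return -1; /* no key/value separator */
	*lf = '\0';
	*key = start;
	*value = lf + 1;
	*pos = (size_t)(nul - buf) + 1;
	return 1;
}
```

A value such as the two-line path from the rev-parse example round-trips intact, since only the first LF of each record is treated as a separator.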
Phillip is concerned about the (yet to be implemented) field git-dir, which may induce users to build paths from this value instead of using git rev-parse --git-path. Given that the --git-path option handles several special cases, my first idea is to list all of them under the path category. But I'll leave that decision to a future iteration, as for now I'm focusing more on the documentation and the machinery of this command.
Week 5 (Jun 30th ~ Jul 7th)
This week was dedicated to finishing the work that remained from the previous week, so that I could send a v3 with the pending changes. This third version can be seen here.
There were two reviews from Patrick that I didn't include in my v3 and that should be highlighted:
- The first is about git survey, another command that is still under development (as seen here). Patrick was thinking about merging git survey and git repo-info into a new command called git repo, with subcommands housing the functionality of those two commands.
- The second is about refactoring the category and field declarations, which would reduce the number of nested switches. Phillip and Junio also proposed similar refactorings. This will be done in v4.
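One common way to get rid of nested switches is a single flat table keyed by the full category.name string, kept sorted so that bsearch() can find entries, with each entry pointing at a getter. This is a sketch under assumptions, not necessarily the refactoring that went into v4: struct repo_state, the getters and get_value are hypothetical placeholders.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical repository state; the real command would query Git. */
struct repo_state {
	int is_bare;
	int is_shallow;
	const char *ref_format;
};

typedef const char *(*field_getter)(const struct repo_state *);

static const char *get_bare(const struct repo_state *r)
{
	return r->is_bare ? "true" : "false";
}

static const char *get_shallow(const struct repo_state *r)
{
	return r->is_shallow ? "true" : "false";
}

static const char *get_ref_format(const struct repo_state *r)
{
	return r->ref_format;
}

/* One flat table instead of switch (category) { switch (field) ... }.
 * Must stay sorted by key for bsearch(). */
static const struct field {
	const char *key;
	field_getter get;
} fields[] = {
	{ "layout.bare", get_bare },
	{ "layout.shallow", get_shallow },
	{ "references.format", get_ref_format },
};

static int cmp_field(const void *key, const void *elem)
{
	return strcmp(key, ((const struct field *)elem)->key);
}

/* Return the value for key, or NULL if the key is unknown. */
static const char *get_value(const struct repo_state *r, const char *key)
{
	const struct field *f = bsearch(key, fields,
					sizeof(fields) / sizeof(fields[0]),
					sizeof(fields[0]), cmp_field);
	return f ? f->get(r) : NULL;
}
```

Adding a field then means adding one getter and one table row, with no new switch cases.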
Besides my work on repo-info, there is one more thing I would like to mention. In the application period entry of this blog, I mentioned the Free Software Development course at my university, where students need to contribute to free software. After my short talks about Git, some students became interested in contributing to Git and in GSoC. Some of them sent small patches, and one of those patches, developed by Isabella Caselli and Rodrigo Michelassi, has already been merged to master!
Isabella and Rodrigo describe their experience in her blog.
Apart from that university course, a friend of mine (Rodrigo Carvalho) was also interested in contributing to Git. He sent two patches, this and this, which have also already been merged to master.
Week 6 (Jul 8th ~ Jul 13th)
In the previous week, Patrick told me that Justin Tobler was working on git survey, which would be a Git-native replacement for git-sizer.
After a meeting with my mentors, we agreed that merging Justin's work and mine would be a good idea. In the context of GSoC, I would still focus on developing the git repo-info features while making room for the new functionality from git survey.
Justin contacted me, and we had a call last Friday (July 11th) about this collaboration. Some highlights of our discussion:
- How would this integration be done? Should this git repo command be only a house for two different subcommands, or a common interface for our work? An argument for separate subcommands is that repo-info is a lightweight command, while survey is more computationally expensive. An argument for a common interface is having a standard format for requesting and retrieving data from both sources.
- A solution to the first question would be to keep the idea of having repo-info and survey as two subcommands (perhaps called git repo info and git repo survey), following the same output format. This would also make room for a third subcommand returning data from both. Then git repo would be a plumbing command (git survey is more porcelain-ish), and its machinery could be used by a separate porcelain command to format its output in a more human-readable way.
- Justin asked me "why JSON?". To be honest, I'm using JSON because it was listed in the GSoC idea as a machine-readable format that could be easily parsed by other applications. Given that this would be (as far as I remember) the only Git command that outputs JSON, it would be out of place, while the other format (NUL-terminated) is easier to handle (e.g. JSON has the Unicode issues mentioned by Phillip) and follows an already-used syntax (the same as git config --list -z). So it seems to me that dropping JSON is the way to go.
Then, instead of introducing a new command git repo-info, I would declare a new command called git repo with two subcommands:
- git repo info, which is my GSoC project.
- git repo survey, which will include Justin's work.
Additionally, JSON formatting will be dropped, as the NUL-terminated format is good enough and JSON offers no significant advantages for this command. Removing JSON will also simplify the code, as I won't need to handle the details of both formats.
I have submitted a v4 with these changes. Let's wait for the review.
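The split of git repo into subcommands can be pictured as a small dispatch table. This is a hypothetical stand-alone sketch: the real command would presumably be wired through Git's own builtin and option-parsing machinery, and repo_info, repo_survey, find_subcommand and cmd_repo are illustrative names with placeholder bodies.

```c
#include <stdio.h>
#include <string.h>

typedef int (*subcommand_fn)(int argc, const char **argv);

/* Placeholder handlers: info is cheap, survey would be expensive. */
static int repo_info(int argc, const char **argv)
{
	(void)argc; (void)argv; /* would print the requested fields */
	return 0;
}

static int repo_survey(int argc, const char **argv)
{
	(void)argc; (void)argv; /* would walk the repository */
	return 0;
}

static const struct {
	const char *name;
	subcommand_fn fn;
} subcommands[] = {
	{ "info", repo_info },
	{ "survey", repo_survey },
};

static subcommand_fn find_subcommand(const char *name)
{
	for (size_t i = 0; i < sizeof(subcommands) / sizeof(subcommands[0]); i++)
		if (!strcmp(subcommands[i].name, name))
			return subcommands[i].fn;
	return NULL;
}

/* "git repo <subcommand> [<args>...]": argv[0] is "repo". */
static int cmd_repo(int argc, const char **argv)
{
	subcommand_fn fn;

	if (argc < 2 || !(fn = find_subcommand(argv[1]))) {
		fprintf(stderr, "usage: git repo (info | survey) [<args>...]\n");
		return 129; /* Git conventionally exits 129 on usage errors */
	}
	return fn(argc - 1, argv + 1);
}
```

Keeping each subcommand behind its own handler lets the lightweight info path stay cheap while the two still share a front end and, ideally, an output format.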
Week 7 (Jul 14th ~ Jul 20th)
After sending v4, I feel that we're converging on how git repo will behave :-).
There was a discussion on whether I should use the name repo for the command. After hearing some suggestions, I'll keep the name repo, even though it sounds a little generic.
Justin suggested bringing back the idea of having more than one format, but this time the second format would be a key-value format more suitable for humans to read: a simple key=value format. That's the only new "feature request" of this iteration; most of the other comments were about code style (where I admit I was consistently inconsistent…), typos, and refactoring suggestions.
Then I sent v5 addressing all those issues.
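Supporting both formats boils down to a switch on the requested format. A minimal sketch, with a hypothetical enum, function name and key path.toplevel, writing to a FILE *: key=value is easy to read, but a value containing a line feed would look like two records, so this sketch refuses it there, while the NUL-terminated form stays unambiguous.

```c
#include <stdio.h>
#include <string.h>

enum output_format {
	FORMAT_KEYVALUE, /* key=value<LF>, meant for humans */
	FORMAT_NUL,      /* key<LF>value<NUL>, meant for scripts */
};

/* Write one field; returns 0 on success, -1 if it can't be represented. */
static int emit_field(FILE *out, enum output_format fmt,
		      const char *key, const char *value)
{
	/* Keys are fixed identifiers: an LF (or '=' in key=value) in a key
	 * would make the output ambiguous, so refuse it. */
	if (strchr(key, '\n') || (fmt == FORMAT_KEYVALUE && strchr(key, '=')))
		return -1;

	switch (fmt) {
	case FORMAT_KEYVALUE:
		if (strchr(value, '\n'))
			return -1; /* would read as two records */
		fprintf(out, "%s=%s\n", key, value);
		return 0;
	case FORMAT_NUL:
		fprintf(out, "%s\n%s", key, value);
		fputc('\0', out);
		return 0;
	}
	return -1;
}
```

Instead of refusing such values, the key=value writer could quote them; that trade-off is the quoting discussion that comes up in the next review.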
Week 8 (Jul 21st ~ Jul 27th)
The review of v5 was focused on minor changes, such as the naming of constants, CLI values, and documentation.
Perhaps the biggest issue was about the functions that retrieve the values. In v5 they returned constant strings, which would be a problem in future versions where I'll need to generate new strings.
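A common way around this in C is to have every getter append its value to a caller-owned growable buffer instead of returning a pointer, so it no longer matters whether the value is a literal or built at runtime. Git has strbuf for this purpose; the struct buf below is a hypothetical minimal stand-in, and struct repo_state, get_bare and get_hexsz are illustrative names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal growable string: a hypothetical stand-in for Git's strbuf. */
struct buf {
	char *data;
	size_t len, cap;
};

/* Append s, growing the allocation as needed; -1 on allocation failure. */
static int buf_append(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->cap) {
		size_t cap = b->cap ? b->cap : 16;
		char *p;

		while (b->len + n + 1 > cap)
			cap *= 2;
		p = realloc(b->data, cap);
		if (!p)
			return -1;
		b->data = p;
		b->cap = cap;
	}
	memcpy(b->data + b->len, s, n + 1);
	b->len += n;
	return 0;
}

/* Hypothetical repository facts the getters read from. */
struct repo_state {
	int is_bare;
	int hash_hexsz; /* e.g. 40 for SHA-1, 64 for SHA-256 */
};

/* A literal value: simply appended. */
static int get_bare(struct buf *out, const struct repo_state *r)
{
	return buf_append(out, r->is_bare ? "true" : "false");
}

/* A generated value: there is no string literal to return here. */
static int get_hexsz(struct buf *out, const struct repo_state *r)
{
	char tmp[32];

	snprintf(tmp, sizeof(tmp), "%d", r->hash_hexsz);
	return buf_append(out, tmp);
}
```

Both getters share one signature, so a field table can hold either kind, and the caller owns (and frees) the buffer.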
Patrick and Junio discussed whether we should escape the values using quote_c_style. For now, it wouldn't affect the current values, so I decided not to use it in v6.
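To show what such escaping would do to a value, here is a simplified, hypothetical re-implementation of the idea, not Git's actual quote_c_style(), whose exact rules (for example around core.quotePath) may differ: values without special bytes are emitted unchanged, and anything else is wrapped in double quotes, with C escapes for control characters and octal escapes for other unprintable or non-ASCII bytes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified C-style quoting (illustrative only). Writes the result to
 * out and returns its length, or -1 if out is too small.
 */
static int c_quote(const char *in, char *out, size_t size)
{
	static const char specials[] = "\a\b\t\n\v\f\r\"\\";
	static const char letters[] = "abtnvfr\"\\";
	const unsigned char *p;
	int needs_quote = 0;
	size_t o = 0;

	if (!size)
		return -1;
	for (p = (const unsigned char *)in; *p; p++)
		if (*p < 0x20 || *p == '"' || *p == '\\' || *p >= 0x7f)
			needs_quote = 1;

/* Store one byte, always keeping room for the final NUL. */
#define PUT(ch) do { if (o + 1 >= size) return -1; out[o++] = (char)(ch); } while (0)
	if (needs_quote)
		PUT('"');
	for (p = (const unsigned char *)in; *p; p++) {
		const char *s = needs_quote ? strchr(specials, *p) : NULL;

		if (s) {                 /* \n, \t, \", \\ ... */
			PUT('\\');
			PUT(letters[s - specials]);
		} else if (needs_quote && (*p < 0x20 || *p >= 0x7f)) {
			PUT('\\');       /* three-digit octal escape */
			PUT('0' + ((*p >> 6) & 7));
			PUT('0' + ((*p >> 3) & 7));
			PUT('0' + (*p & 7));
		} else {
			PUT(*p);
		}
	}
	if (needs_quote)
		PUT('"');
#undef PUT
	out[o] = '\0';
	return (int)o;
}
```

Values like files or false pass through untouched, which matches the observation that quoting wouldn't change any of the current fields; a path containing a newline would become a single quoted line instead.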
v6 of this patchset therefore had smaller changes, without the rewrites of the previous versions. It can be seen here.
In parallel, I started working on the next fields and thinking about the challenges they would bring:
- objects.object-format: it will return the algorithm used for hashing objects (i.e. sha1 or sha256), querying the same data as git rev-parse --show-object-format. However, the flag --show-object-format comes in three flavours: storage, input and output, which are currently the same but may diverge in the future. I can think of two solutions:
  - Split it into three values, objects.object-format-(storage|input|output), which seems confusing
  - Use objects.object-format as a set of three values (objects.object-format.(storage|input|output)), but that seems a little overengineered
- path.*: some things need to be decided before retrieving those values:
  - Should we use absolute or relative paths? Or should we use a flag for that?
  - The same discussion about quoting
Week 9 (Jul 28th ~ Aug 3rd)
The reviews of v6 were mainly focused on the tests. Eric Sunshine joined the discussion of this new command and suggested several changes to the tests. You can follow the discussion of v6 here. Those changes really improved the readability and robustness of the tests.
Given that the reviews didn't require major changes, I was also able to send a v7 addressing those issues. This new version has already been reviewed by Eric and Patrick.
In parallel, I started working on a patch to git repo that adds -z as an alias for --format=null. This patch will eventually be rebased and become part of a patchset containing some of the features I developed in week 8.
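The alias itself is small. Below is a hypothetical stand-alone sketch, not Git's parse-options machinery (which the real patch would presumably use): both -z and --format=null set the same enum value, and remaining arguments are collected as requested keys. The keyvalue spelling and the choice of default format are assumptions made for the example.

```c
#include <string.h>

enum output_format { FORMAT_KEYVALUE, FORMAT_NUL };

/*
 * Parse argv[0..argc): "-z" is shorthand for "--format=null", and
 * non-option arguments are stored in keys (which must have room for
 * argc entries). Returns the number of keys, or -1 on an unknown
 * option or format.
 */
static int parse_args(int argc, const char **argv,
		      enum output_format *fmt, const char **keys)
{
	int nr_keys = 0;

	*fmt = FORMAT_KEYVALUE; /* assumed default */
	for (int i = 0; i < argc; i++) {
		const char *arg = argv[i];

		if (!strcmp(arg, "-z") || !strcmp(arg, "--format=null"))
			*fmt = FORMAT_NUL;
		else if (!strcmp(arg, "--format=keyvalue"))
			*fmt = FORMAT_KEYVALUE;
		else if (arg[0] == '-')
			return -1; /* unknown option or format */
		else
			keys[nr_keys++] = arg;
	}
	return nr_keys;
}
```

Since both spellings end in the same assignment, the output code never needs to know which one the user typed.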
Week 10 (Aug 4th ~ Aug 7th)
This week was focused on polishing the patchset so that it could be accepted. The reviews of v7 were focused on small refactors in the tests, documentation and error handling. This resulted in a v8 that basically applies the reviewers' suggestions.
Karthik and Patrick reviewed v8, asking for minor nitpicks. Junio had already accepted it and merged it to his seen branch; however, the suggestions from Karthik and Patrick were simple enough to be applied in a v9 that was sent on the same day.
The patch hasn't been merged to next yet; however, Junio announced in his "What's cooking" mail that it will be. So I think I can say the basic functionality of this new command is finished! There are, of course, many features that still need to be added, but the most important decisions have already been made in that first patchset.
In week 8, I mentioned that I had started working on the next features. I already have branches for them: