


What does Linus Torvalds mean when he says that Git “never ever” tracks a file?


















Quoting Linus Torvalds when asked how many files Git can handle during his Tech Talk at Google in 2007 (43:09):




…Git tracks your content. It never ever tracks a single file. You cannot track a file in Git. What you can do is you can track a project that has a single file, but if your project has a single file, sure do that and you can do it, but if you track 10,000 files, Git never ever sees those as individual files. Git thinks everything as the full content. All history in Git is based on the history of the whole project…




(Transcripts here.)



Yet, when you dive into the Git book, the first thing you are told is that a file in Git can be either tracked or untracked. Furthermore, it seems to me like the whole Git experience is geared towards file versioning. When using git diff or git status, output is presented on a per-file basis. When using git add, you also get to choose on a per-file basis. You can even review history on a per-file basis, and it is lightning fast.



How should this statement be interpreted? In terms of file tracking, how is Git different from other source control systems, such as CVS?

















git version-control














edited 4 hours ago by Peter Mortensen

asked 7 hours ago by Simón Ramírez Amaya












  • reddit.com/r/git/comments/5xmrkv/what_is_a_snapshot_in_git - "For where you are at the moment, I suspect what's more important to realize is that there's a difference between how Git presents files to users and how it deals with them internally. As presented to the user, a snapshot contains complete files, not merely diffs. But internally, yes, Git uses diffs to generate packfiles that efficiently store revisions." (This is in sharp contrast to, e.g., Subversion.)

    – user2864740
    7 hours ago







  • Git doesn't track files, it tracks changesets. Most version control systems track files. As an example of how / why this can matter, try to check in an empty directory to git (spoiler: you can't, because that's an "empty" changeset).

    – Elliott Frisch
    7 hours ago







  • @ElliottFrisch That doesn't sound right. Your description is closer to what e.g. darcs does. Git stores snapshots, not changesets.

    – melpomene
    6 hours ago






  • I think he means Git does not track a file directly. A file includes its name and content. Git tracks contents as blobs. Given a blob only, you can't tell what its corresponding file name is. It could be the content of multiple files with different names under different paths. The bindings between a path name and a blob are described in a tree object.

    – ElpieKay
    6 hours ago







  • Related: Randal Schwartz' followup to Linus' talk (also a Google Tech talk) - "... What Git is really all about ... Linus said what Git is NOT".

    – Peter Mortensen
    4 hours ago































2 Answers













In CVS, history was tracked on a per-file basis. A branch might consist of various files with their own various revisions, each with its own version number. CVS was based on RCS, which tracked individual files in a similar way.



On the other hand, Git takes snapshots of the state of the whole project. Files are not tracked and versioned independently; a revision in the repository refers to a state of the whole project, not one file.



When Git refers to tracking a file, it means simply that it is to be included in the history of the project. Linus's talk was not referring to tracking files in the Git context, but was contrasting the CVS and RCS model with the snapshot-based model used in Git.
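To make the contrast concrete, here is a toy Python model. It is not how CVS, RCS, or Git is actually implemented, and the class and method names are hypothetical. In the per-file model each file carries its own RCS-style revision numbers, so `a.txt` can be at 1.2 while `b.txt` is still at 1.1; in the snapshot model each revision records the state of every file, and unchanged files are simply carried forward.

```python
from dataclasses import dataclass, field


@dataclass
class PerFileRepo:
    """Hypothetical RCS/CVS-style toy: each file has its own, independent revision list."""
    histories: dict[str, list[str]] = field(default_factory=dict)

    def check_in(self, path: str, content: str) -> str:
        revisions = self.histories.setdefault(path, [])
        revisions.append(content)
        # Revision numbers are counted per file (1.1, 1.2, ...), RCS-style.
        return f"1.{len(revisions)}"


@dataclass
class SnapshotRepo:
    """Hypothetical Git-style toy: each revision is a snapshot of the whole project."""
    snapshots: list[dict[str, str]] = field(default_factory=list)

    def commit(self, changes: dict[str, str]) -> int:
        # Start from the previous snapshot so unchanged files are carried along.
        state = dict(self.snapshots[-1]) if self.snapshots else {}
        state.update(changes)
        self.snapshots.append(state)
        # The revision number identifies a whole-project state, not one file.
        return len(self.snapshots) - 1


if __name__ == "__main__":
    cvs = PerFileRepo()
    cvs.check_in("a.txt", "v1")
    print(cvs.check_in("a.txt", "v2"))  # 1.2: a.txt's second revision
    print(cvs.check_in("b.txt", "v1"))  # 1.1: b.txt starts its own numbering

    git = SnapshotRepo()
    git.commit({"a.txt": "v1", "b.txt": "v1"})
    rev = git.commit({"a.txt": "v2"})
    print(rev, git.snapshots[rev])  # revision 1 still records b.txt
```

The point of the sketch is only the shape of the data: in the first model "history" is keyed by file, in the second it is a sequence of project states from which any per-file view has to be derived.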







answered 7 hours ago by brian m. carlson







  • What does RCS mean?

    – Bernhard
    2 hours ago











  • @Bernhard : en.wikipedia.org/wiki/Revision_Control_System

    – Eric Towers
    1 hour ago


























I agree with brian m. carlson's answer (and have upvoted it): Linus is indeed distinguishing, at least in part, between file-oriented and commit-oriented version control systems. But I think there is more to it than that.



In my book, which is stalled and might never get finished, I tried to come up with a taxonomy for version control systems. In my taxonomy, the term for what we're interested in here is the atomicity of the version control system. See what is currently page 22. When a VCS has file-level atomicity, there is in fact a history for each file. The VCS must remember the name of the file and what occurred to it at each point.



Git doesn't do that. Git has only a history of commits: the commit is its unit of atomicity, and the history is the set of commits in the repository. What a commit remembers is the data (a whole tree-full of file names and the contents that go with each of those files) plus some metadata: for instance, who made the commit, when, and why, and the internal Git hash ID of the commit's parent commit. (It is this parent, and the directed acyclic graph formed by reading all commits and their parents, that is the history in a repository.)
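As a rough illustration of what a commit records, the Python sketch below builds a commit object in Git's raw text layout (a `tree` line, one `parent` line per parent, `author` and `committer` lines, a blank line, then the message) and names it by SHA-1 over a `commit <size>\0` header plus that text, the way a default (SHA-1) Git repository names objects. The author identity, timestamps, and helper names are hypothetical; you can compare the layout with what `git cat-file -p HEAD` prints in a real repository.

```python
import hashlib

# Well-known ID of Git's empty tree (SHA-1 of b"tree 0\0").
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 over '<type> <size>\\0' followed by the raw object data."""
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def make_commit(tree: str, parents: list[str], ident: str,
                timestamp: int, message: str, tz: str = "+0000") -> tuple[str, bytes]:
    """Build a raw commit object (hypothetical helper); returns (commit id, object data)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # none for a root, two or more for a merge
    lines.append(f"author {ident} {timestamp} {tz}")
    lines.append(f"committer {ident} {timestamp} {tz}")
    text = "\n".join(lines) + "\n\n" + message
    if not text.endswith("\n"):
        text += "\n"  # git commit normally ends the stored message with a newline
    data = text.encode()
    return object_id("commit", data), data


def history(parents_of: dict[str, list[str]], start: str) -> list[str]:
    """The repository's history: every commit reachable by following parent links."""
    seen, order, stack = set(), [], [start]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        order.append(cid)
        stack.extend(reversed(parents_of[cid]))
    return order


if __name__ == "__main__":
    who = "A U Thor <author@example.com>"  # hypothetical identity
    root, _ = make_commit(EMPTY_TREE, [], who, 1554800000, "initial")
    child, raw = make_commit(EMPTY_TREE, [root], who, 1554800060, "second")
    print(raw.decode())
    print(history({root: [], child: [root]}, child))  # child first, then root
```

Because the parent's ID is part of the hashed text, a commit's ID pins down its entire ancestry; nothing in this structure is keyed by file name.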



Note that a VCS can be commit-oriented, yet still store data file-by-file. That's an implementation detail, though sometimes an important one, and Git does not do that either. Instead, each commit records a tree, with the tree object encoding file names, modes (i.e., is this file executable or not?), and a pointer to the actual file content. The content itself is stored independently, in a blob object. Like a commit object, a blob gets a hash ID that is unique to its content—but unlike a commit, which can only appear once, the blob can appear in many commits. So the underlying file content in Git is stored directly as a blob, and then indirectly in a tree object whose hash ID is recorded (directly or indirectly) in the commit object.
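The blob/tree split can be sketched in a few lines of Python, assuming a default SHA-1 repository and a flat tree that holds only regular files (subdirectories would change the sort rule). A blob ID is computed from the file's bytes alone, so the name never enters into it; the tree is what binds names and modes to blob IDs, which is why two paths with identical content, or an unchanged file across two commits, share one blob. The function names are hypothetical; `git hash-object` computes the same blob IDs for the same bytes.

```python
import hashlib


def object_id(obj_type: str, data: bytes) -> str:
    """SHA-1 over '<type> <size>\\0' followed by the raw object data."""
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def blob_id(content: bytes) -> str:
    # Only the bytes are hashed: a blob knows nothing about its file name.
    return object_id("blob", content)


def tree_id(entries: dict[str, tuple[str, str]]) -> str:
    """Hypothetical helper. entries: file name -> (mode, blob id); flat trees of files only."""
    body = b""
    for name in sorted(entries):  # byte order of UTF-8 names; fine when there are no subtrees
        mode, oid = entries[name]
        # Each entry: "<mode> <name>\0" followed by the 20 raw bytes of the object ID.
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return object_id("tree", body)


if __name__ == "__main__":
    same = blob_id(b"same text\n")
    v1 = tree_id({"a.txt": ("100644", same), "b.txt": ("100644", same)})
    v2 = tree_id({"a.txt": ("100644", same), "b.txt": ("100644", blob_id(b"edited\n"))})
    print(same)    # one blob, named twice in v1 and reused for a.txt in v2
    print(v1, v2)  # different trees, because b.txt changed
```

Note that the mode (`100644` for a regular file, `100755` for an executable) lives in the tree entry, not in the blob, so flipping the executable bit produces a new tree but reuses the same blob.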



When you ask Git to show you a file's history using:



git log [--follow] [starting-point] [--] path/to/file


what Git is really doing is walking the commit history, which is the only history Git has, but not showing you any of these commits unless:



  • the commit is a non-merge commit, and

  • the parent of that commit also has the file, but the content in the parent differs, or the parent of the commit doesn't have the file at all

(but some of these conditions can be modified via additional git log options, and there's a very-difficult-to-describe side effect called History Simplification that makes Git omit some commits from the history walk entirely). The file history you see here does not, in some sense, exactly exist in the repository: instead, it's just a synthetic subset of the real history. You'll get a different "file history" if you use different git log options!
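A toy Python model of that filtering, under heavy simplifying assumptions: each commit is reduced to a map from path to blob ID, merge commits are skipped as in the first bullet, a root commit is compared against an empty snapshot, there is no History Simplification or `--follow` rename detection, and the walk is a plain depth-first traversal rather than Git's default ordering. It compares the path's blob ID in each commit with the one in its parent, so, unlike the bullets above, it also reports a commit that deletes the file. All names and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    id: str
    parents: tuple[str, ...]
    files: dict[str, str]  # path -> blob ID: the commit's whole snapshot


def walk(commits: dict[str, Commit], start: str):
    """Yield every commit reachable from start (simple depth-first order)."""
    seen, stack = set(), [start]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        yield commits[cid]
        stack.extend(reversed(commits[cid].parents))  # visit the first parent next


def file_history(commits: dict[str, Commit], start: str, path: str) -> list[str]:
    """A synthetic 'file history': a filtered view of the only real history, the commits."""
    shown = []
    for commit in walk(commits, start):
        if len(commit.parents) > 1:
            continue  # merge commits are not shown in this toy model
        parent = commits[commit.parents[0]].files if commit.parents else {}
        if commit.files.get(path) != parent.get(path):
            shown.append(commit.id)
    return shown


# Hypothetical repository: c1 <- c2 <- c3, plus a side branch s1 merged by m1.
EXAMPLE = {
    "c1": Commit("c1", (), {"a.txt": "blob1", "b.txt": "blob1"}),
    "c2": Commit("c2", ("c1",), {"a.txt": "blob1", "b.txt": "blob2"}),
    "c3": Commit("c3", ("c2",), {"a.txt": "blob3", "b.txt": "blob2"}),
    "s1": Commit("s1", ("c1",), {"a.txt": "blob4", "b.txt": "blob1"}),
    "m1": Commit("m1", ("c3", "s1"), {"a.txt": "blob3", "b.txt": "blob2"}),
}

if __name__ == "__main__":
    print(file_history(EXAMPLE, "m1", "a.txt"))
    print(file_history(EXAMPLE, "m1", "b.txt"))
```

Real Git would likely show less here: by default, when a merge matches one of its parents for the given path, History Simplification follows only that parent, so a side-branch commit like `s1` may not appear. That is exactly why the "file history" depends on the options you pass.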





























  • Another thing to add is this allows Git to do things like shallow clones. It just needs to retrieve the head commit and all the blobs it refers to. It doesn't need to recreate files by applying change sets.

    – Wes Toleman
    3 hours ago











  • @WesToleman: it definitely makes that easier. Mercurial stores deltas, with occasional resets, and while the Mercurial folks intend to add shallow clones there (which is possible due to the "reset" idea), they haven't actually done it yet (because it's more of a technical challenge).

    – torek
    1 hour ago















answered 6 hours ago









torek

  • Another thing to add is this allows Git to do things like shallow clones. It just needs to retrieve the head commit and all the blobs it refers to. It doesn't need to recreate files by applying change sets.

    – Wes Toleman
    3 hours ago











  • @WesToleman: it definitely makes that easier. Mercurial stores deltas, with occasional resets, and while the Mercurial folks intend to add shallow clones there (which is possible due to the "reset" idea), they haven't actually done it yet (because it's more of a technical challenge).

    – torek
    1 hour ago
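The difference these two comments describe can be sketched with a toy Python model. All names are hypothetical, and this is neither Mercurial's revlog format nor Git's object store. `SnapshotStore` keeps every revision whole, so the tip stands on its own; `DeltaStore` keeps a full copy every `interval` revisions (the "reset") and a delta against the previous revision otherwise, so rebuilding a revision means going back to the nearest reset and replaying deltas. Git's pack files also delta-compress objects internally; the point here is only the logical model, in which every commit names a complete tree.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union

Tree = Dict[str, str]  # path -> file content (toy stand-in for a tree of blobs)


@dataclass
class Delta:
    changed: Dict[str, str]   # paths added or modified, with new content
    removed: Set[str]         # paths deleted


def make_delta(old: Tree, new: Tree) -> Delta:
    changed = {p: c for p, c in new.items() if old.get(p) != c}
    removed = {p for p in old if p not in new}
    return Delta(changed, removed)


def apply_delta(base: Tree, delta: Delta) -> Tree:
    tree = {p: c for p, c in base.items() if p not in delta.removed}
    tree.update(delta.changed)
    return tree


class SnapshotStore:
    """Every revision is stored whole, as in Git's logical model."""

    def __init__(self) -> None:
        self.entries: List[Tree] = []

    def add(self, tree: Tree) -> None:
        self.entries.append(dict(tree))

    def get(self, n: int) -> Tree:
        return dict(self.entries[n])

    def entries_needed(self, n: int) -> List[int]:
        return [n]                           # one snapshot is self-contained


class DeltaStore:
    """Deltas against the previous revision, with a full copy every `interval`."""

    def __init__(self, interval: int = 3) -> None:
        self.interval = interval
        self.entries: List[Union[Tree, Delta]] = []
        self._prev: Tree = {}

    def add(self, tree: Tree) -> None:
        if len(self.entries) % self.interval == 0:
            self.entries.append(dict(tree))  # "reset": store the full tree
        else:
            self.entries.append(make_delta(self._prev, tree))
        self._prev = dict(tree)

    def _reset_before(self, n: int) -> int:
        while isinstance(self.entries[n], Delta):
            n -= 1                           # walk back to the nearest full copy
        return n

    def get(self, n: int) -> Tree:
        base = self._reset_before(n)
        tree = dict(self.entries[base])
        for delta in self.entries[base + 1 : n + 1]:
            tree = apply_delta(tree, delta)  # replay changes forward
        return tree

    def entries_needed(self, n: int) -> List[int]:
        return list(range(self._reset_before(n), n + 1))
```

Here `entries_needed(n)` stands for the data a shallow copy of revision `n` would have to fetch: always a single entry for the snapshot store, but everything back to the last reset for the delta store. That is why the periodic resets are what make shallow clones possible in a delta-based design.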
















