Publishing your project on GitHub
- Last updated on July 7, 2023 at 11:56 PM
Intro: git and GitHub
GitHub is built around a technology called git, a distributed version control system. This may sound intimidating, but all it means is that it lets you create revisions of your code at various points in time, then switch between those revisions at will. For example, let’s say I have the following Python script:
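from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_predict

# Note: load_boston was removed in scikit-learn 1.2, so this first revision needs an older version.
lr = linear_model.LinearRegression()
boston = datasets.load_boston()
y = boston.target
predicted = cross_val_predict(lr, boston.data, y, cv=10)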
I now make a revision using git, and add some more lines to the code. In the below code, we:
- Change the dataset
- Change the number of CV folds
- Show a plot
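import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_predict

lr = linear_model.LinearRegression()
diabetes = datasets.load_diabetes()
y = diabetes.target
predicted = cross_val_predict(lr, diabetes.data, y, cv=20)
# Plot predicted against actual values; the dashed line marks perfect predictions.
fig, ax = plt.subplots()
ax.scatter(y, predicted)
ax.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=4)
plt.show()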
If we make another revision with git, we’ll be able to go back to the first whenever we want and switch between the two freely.
A revision is more commonly known as a commit, and we’ll be using that term going forward. We can upload the commits to GitHub, which enables other people to see our code. Git is much more powerful than just a commit system, and you should try our git course if you want to learn more. For the purposes of uploading your portfolio, it’s fine to think of it this way.
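To make the idea of commits concrete, here is a toy Python sketch. It is not how git stores data internally; the ToyHistory class and its method names are hypothetical, invented only for illustration. It shows the core idea: each commit records a snapshot of the files, and we can switch back to any earlier snapshot.

```python
import copy


class ToyHistory:
    """Hypothetical, minimal model of a commit history (not git's real design)."""

    def __init__(self):
        self.commits = []  # each entry: (message, snapshot of files)

    def commit(self, files, message):
        # Store a deep copy so later edits don't alter the saved snapshot.
        self.commits.append((message, copy.deepcopy(files)))
        return len(self.commits) - 1  # the index serves as a simple commit id

    def checkout(self, commit_id):
        # Return a copy of the files as they were at that commit.
        _, snapshot = self.commits[commit_id]
        return copy.deepcopy(snapshot)


history = ToyHistory()
files = {"main.py": "cv=10"}
first = history.commit(files, "Initial version")

files["main.py"] = "cv=20"
second = history.commit(files, "Change the number of CV folds")

print(history.checkout(first)["main.py"])   # cv=10
print(history.checkout(second)["main.py"])  # cv=20
```

Real git identifies commits by hashes rather than list positions, but the switching behavior is the part that matters for this tutorial.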
Setting up git and GitHub
In order to create a commit with git and upload it to GitHub, you first need to install and configure git. The full instructions are here, but we’ll summarize the steps below:
- Install git using this link
- Open the terminal application on your computer
- Set up your git email by typing git config --global user.email YOUR_EMAIL. Replace YOUR_EMAIL with your email address.
- Set up your git name by typing git config --global user.name "YOUR_NAME". Replace YOUR_NAME with your full name, like Jane Smith. The quotes are needed because the name contains a space.
Once you’ve done this, git is set up and configured. Next, we need to create an account on GitHub, then configure git to work with GitHub:
- Create a GitHub account. Ideally, you should use the same email you used earlier to configure git.
- Create an SSH key
- Add the key to your GitHub account
The above setup will let you push commits to GitHub, and pull commits from GitHub.
Creating a repository
Commits in git occur inside of a repository. A repository is analogous to the folder your project is in. For this part of the tutorial, we’ll use a folder with a file structure like this:
loans
│   README.md
│   main.py
│
└───data
    │   test.csv
    │   train.csv
You can download the zip file of the folder yourself here and use it in the next steps. The git repository in the above diagram would be the project folder, or loans. In order to create commits, we first need to initialize the folder as a git repository. We can do this by navigating to the folder, then typing git init:
$ cd loans
$ git init
Initialized empty Git repository in /loans/.git/
This will create a folder called .git inside the loans folder. You’ll get output indicating that the repository was initialized properly. git uses the .git folder to store information about commits:
loans
│   README.md
│   main.py
│
├───.git
│
└───data
    │   test.csv
    │   train.csv
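One quick way to confirm that initialization worked is to check for the .git folder. The sketch below does this from Python. The helper name is_git_repo is hypothetical, and it assumes a regular repository created with git init: it only looks inside the given folder itself, not in parent folders.

```python
from pathlib import Path


def is_git_repo(folder):
    """Return True if `folder` directly contains a .git directory."""
    return (Path(folder) / ".git").is_dir()


# Run this from the folder that contains loans.
print(is_git_repo("loans"))
```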
The contents of the .git folder aren’t necessary to explore in this tutorial, but you may want to look through them and see if you can figure out how the commit data is stored. After we’ve initialized the repository, we need to add files to a staging area, which holds the files for a potential commit. When we’re happy with the files in the staging area, we can generate a commit. We add files to the staging area using git add:
$ git add README.md
The above command will add the README.md file to the staging area. This doesn’t change the file on the disk but tells git that we want to add the current state of the file to the next commit. We can check the status of the staging area with git status:
$ git status
On branch master
Initial commit
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: README.md
Untracked files:
(use "git add <file>..." to include in what will be committed)
data/
main.py
You’ll see that we’ve added the README.md file to the staging area, but there are still some untracked files that haven’t been added. We can add all the files with git add . (the trailing dot refers to the current folder). After we’ve added all the files to the staging area, we can create a commit using git commit:
$ git commit -m "Initial version"
[master (root-commit) 907e793] Initial version
4 files changed, 0 insertions(+), 0 deletions(-)
create mode 100644 README.md
create mode 100644 data/test.csv
create mode 100644 data/train.csv
create mode 100644 main.py
The -m option specifies a commit message. You can look back on commit messages later to see what files and changes are contained in a commit. A commit takes all the files from the staging area and leaves the staging area empty.
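The add-then-commit flow described above can be sketched as a toy model in Python. This follows the simplified description in this tutorial rather than git's real index, and the ToyRepo class with its add and commit methods is hypothetical.

```python
class ToyRepo:
    """Hypothetical model of the staging area; git's real index is more involved."""

    def __init__(self):
        self.staged = {}   # file name -> contents waiting for the next commit
        self.commits = []  # list of (message, files) tuples

    def add(self, name, contents):
        # Like `git add`: record the file's current state for the next commit.
        self.staged[name] = contents

    def commit(self, message):
        # Like `git commit -m`: store what was staged, then clear the staging area.
        if not self.staged:
            raise ValueError("nothing staged to commit")
        self.commits.append((message, dict(self.staged)))
        self.staged.clear()


repo = ToyRepo()
repo.add("README.md", "# Loans")
repo.add("main.py", "print('hello')")
repo.commit("Initial version")

print(len(repo.commits))  # 1
print(repo.staged)        # {}
```

Raising an error on an empty staging area is a design choice of this sketch; it mirrors the fact that there is nothing new to record when no files have been added.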
Making changes to a repository
When we make further changes to a repository, we can add the changed files to the staging area and make a second commit. This allows us to keep a history of the repository over time. We can add changes to a commit the same way we did before. Let’s say we change the README.md file. We’d first run git status to see what changed:
$ git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: README.md
no changes added to commit (use "git add" and/or "git commit -a")
We can then see exactly what changed with git diff. If the changes are what we expected, we can add these changes to a new commit:
$ git add .
And then we can commit again:
$ git commit -m "Update README.md"
[master 5bec608] Update README.md
1 file changed, 1 insertion(+)
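To get a feel for what git diff reports before a commit like this one, the sketch below uses Python's standard difflib module to produce a diff in the same unified style. This is not git's own implementation, and the README contents here are made up for illustration.

```python
import difflib

# Hypothetical README.md contents before and after the edit.
old = ["# Loans", "Predict loan outcomes."]
new = ["# Loans", "Predict loan outcomes.", "Run main.py to train the model."]

# unified_diff yields header lines, then the file lines: unchanged lines are
# prefixed with a space, added lines with "+", and removed lines with "-".
diff = list(difflib.unified_diff(old, new, fromfile="a/README.md",
                                 tofile="b/README.md", lineterm=""))
print("\n".join(diff))
```

The single line starting with "+" corresponds to the kind of change summarized as 1 insertion(+) in the commit output.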
You may have noticed that the word master appears after many of these commands are executed. master is the name of the branch that we’re currently on. Branches allow multiple people to work on a repository at once, or one person to work on multiple features at the same time. Branches are extremely powerful, but we won’t dive into them here. If you’re interested in learning more, this course covers working with multiple branches in detail. For now, it’s enough to know that the primary branch in this repository is called master (newer git and GitHub setups often name it main instead). We’ve made all of our changes so far to the master branch. We’ll be pushing master to GitHub, and this is what other people will see.
Pushing to GitHub
Once you’ve created a commit, you’re ready to push your repository to GitHub. In order to do this, you first need to create a public repository in the GitHub interface. You can do this by:
- Clicking the “+” icon in the top right of the GitHub interface, then “New Repository”.
- Entering a name for the repository and, optionally, a description, then deciding whether it should be public or private. If it’s public, anyone can see it immediately.
After creating the repo, you’ll see a screen with options for using the repository.
Look under the “…or push an existing repository from the command line” section, and copy the two lines there. Then run them in the command line:
$ cd loans
$ git remote add origin git@github.com:YOUR_GITHUB_USERNAME/YOUR_GIT_REPO_NAME.git
$ git push -u origin master
Counting objects: 7, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (7/7), 608 bytes | 0 bytes/s, done.
Total 7 (delta 0), reused 0 (delta 0)
To git@github.com:YOUR_GITHUB_USERNAME/YOUR_GIT_REPO_NAME.git
* [new branch] master -> master
Branch master set up to track remote branch master from origin.
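The remote address in the commands above follows a fixed pattern for SSH access. As a small sketch, the hypothetical helper below builds that address from a username and repository name; the placeholder values are the same ones used above and should be replaced with your own.

```python
def ssh_remote_url(username, repo_name):
    """Build a GitHub SSH remote address in the form passed to `git remote add`."""
    return f"git@github.com:{username}/{repo_name}.git"


print(ssh_remote_url("YOUR_GITHUB_USERNAME", "YOUR_GIT_REPO_NAME"))
# git@github.com:YOUR_GITHUB_USERNAME/YOUR_GIT_REPO_NAME.git
```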
If you reload the page on GitHub corresponding to your repo, you should now see the files you added. By default, the README.md will be rendered in the repository.
Congratulations! You’ve now pushed a repository to GitHub.