Support private github repository (#1690)

* Refactor: Create new crate binstalk-git-repo-api

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Fix CI lint warnings

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Fix `just check`: Rm deleted features from `cargo-hack` check

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Refactor: Extract new mod error

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Ret artifact url in `has_release_artifact`

So that we can use it to download from private repositories.

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Move `test_graph_ql_error_type` to mod `error`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Fix running `cargo test` in `binstalk-git-repo-api`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Remove unnecessary import in mod `error::test`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Rename mod `request` to `release_artifacts`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Impl draft version of fetching repo info

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Refactor: Move `HasReleaseArtifacts` failure variants into `GhApiError`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Refactor: Use `GhRepo` in `GhRelease`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Fix testing

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Refactor: Return `'static` future

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Refactor: Make sure `'static` Future is returned

To make it easier to create generic functions

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Add logging to unit testing

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Fix unit testing

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Refactor: Extract new fn `GhApiClient::do_fetch`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Rm unused `percent_encode_http_url_path`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Fix `cargo test` run on CI

`cargo test` runs all tests in one process.

As such, `set_global_default` would fail on the second call.

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Optimize `GhApiClient::do_fetch`: Avoid unnecessary RESTful API call

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Refactor: Rm param `auth_token` for RESTful API fn

which is always set to `None`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Impl new API `GhApiClient::get_repo_info`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Fix unit test for `GhApiClient::get_repo_info`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Refactor testing: Parameter-ize testing

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Parallelise `test_get_repo_info`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Refactor: Create parameter-ised `test_has_release_artifact`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Parallelize `test_has_release_artifact`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Refactor: `gh_api_client::test::create_client` shall not be `async`

as there is no `.await` in it.

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Return `Url` in `GhApiClient::has_release_artifact`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Impl new API `GhApiClient::download_artifact`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Remove unused deps added to binstalk-git-repo-api

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Fix clippy lints

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Add new API `GhApiClient::remote_client`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Add `GhApiClient::has_gh_token`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Add `GhRepo::try_extract_from_url`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Rename `ReleaseArtifactUrl` to `GhReleaseArtifactUrl`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Add new fn `Download::with_data_verifier`

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* feature: Support private repository

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Fix clippy lints

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Add e2e-test/private-github-repo

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Fix clippy lints

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Fix `launch_baseline_find_tasks`: Retry on rate limit

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Fix test failure: Retry on rate limit

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Temporarily enable debug output for e2e-test-private-github-repo

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Fix `get_repo_info`: Retry on rate limit

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Improve `debug!` logging

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Add more debug logging

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Add more debugging

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Add more debug logging

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Apply suggestions from code review

* Fix compilation

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Fix cargo fmt

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>

* Add crate binstalk-git-repo-api to release-pr.yml

* Update crates/binstalk-git-repo-api/Cargo.toml

* Apply suggestions from code review

* Update crates/binstalk/Cargo.toml

---------

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
Jiahao XU 2024-06-10 16:02:12 +10:00 committed by GitHub
parent 48ee0b0e3e
commit 1dbd2460a3
GPG key ID: B5690EEEBB952194
30 changed files with 1838 additions and 1127 deletions

@ -3,12 +3,12 @@ use std::sync::{
Once,
};
use binstalk_downloader::gh_api_client::{GhReleaseArtifact, HasReleaseArtifact};
pub(super) use binstalk_downloader::{
download::{Download, ExtractedFiles},
gh_api_client::GhApiClient,
remote::{Client, Url},
};
pub(super) use binstalk_git_repo_api::gh_api_client::GhApiClient;
use binstalk_git_repo_api::gh_api_client::{GhApiError, GhReleaseArtifact, GhReleaseArtifactUrl};
pub(super) use binstalk_types::cargo_toml_binstall::{PkgFmt, PkgMeta};
pub(super) use compact_str::CompactString;
pub(super) use tokio::task::JoinHandle;
@ -16,6 +16,39 @@ pub(super) use tracing::{debug, instrument, warn};
use crate::FetchError;
static WARN_RATE_LIMIT_ONCE: Once = Once::new();
static WARN_UNAUTHORIZED_ONCE: Once = Once::new();
pub(super) async fn get_gh_release_artifact_url(
gh_api_client: GhApiClient,
artifact: GhReleaseArtifact,
) -> Result<Option<GhReleaseArtifactUrl>, GhApiError> {
debug!("Using GitHub API to check for existence of artifact, which will also cache the API response");
// The future returned has the same size as a pointer
match gh_api_client.has_release_artifact(artifact).await {
Ok(ret) => Ok(ret),
Err(GhApiError::NotFound) => Ok(None),
Err(GhApiError::RateLimit { retry_after }) => {
WARN_RATE_LIMIT_ONCE.call_once(|| {
warn!("Your GitHub API token (if any) has reached its rate limit and cannot be used again until {retry_after:?}, so we will fallback to HEAD/GET on the url.");
warn!("If you did not supply a github token, consider doing so: GitHub limits unauthorized users to 60 requests per hour per origin IP address.");
});
Err(GhApiError::RateLimit { retry_after })
}
Err(GhApiError::Unauthorized) => {
WARN_UNAUTHORIZED_ONCE.call_once(|| {
warn!("GitHub API somehow requires a token for the API access, so we will fallback to HEAD/GET on the url.");
warn!("Please consider supplying a token to cargo-binstall to speedup resolution.");
});
Err(GhApiError::Unauthorized)
}
Err(err) => Err(err),
}
}
/// This function returns a future where its size should be at most size of
/// 2-4 pointers.
pub(super) async fn does_url_exist(
@ -24,32 +57,17 @@ pub(super) async fn does_url_exist(
url: &Url,
) -> Result<bool, FetchError> {
static GH_API_CLIENT_FAILED: AtomicBool = AtomicBool::new(false);
static WARN_RATE_LIMIT_ONCE: Once = Once::new();
static WARN_UNAUTHORIZED_ONCE: Once = Once::new();
debug!("Checking for package at: '{url}'");
if !GH_API_CLIENT_FAILED.load(Relaxed) {
if let Some(artifact) = GhReleaseArtifact::try_extract_from_url(url) {
debug!("Using GitHub API to check for existence of artifact, which will also cache the API response");
match get_gh_release_artifact_url(gh_api_client, artifact).await {
Ok(ret) => return Ok(ret.is_some()),
// The future returned has the same size as a pointer
match gh_api_client.has_release_artifact(artifact).await? {
HasReleaseArtifact::Yes => return Ok(true),
HasReleaseArtifact::No | HasReleaseArtifact::NoSuchRelease => return Ok(false),
Err(GhApiError::RateLimit { .. }) | Err(GhApiError::Unauthorized) => {}
HasReleaseArtifact::RateLimit { retry_after } => {
WARN_RATE_LIMIT_ONCE.call_once(|| {
warn!("Your GitHub API token (if any) has reached its rate limit and cannot be used again until {retry_after:?}, so we will fallback to HEAD/GET on the url.");
warn!("If you did not supply a github token, consider doing so: GitHub limits unauthorized users to 60 requests per hour per origin IP address.");
});
}
HasReleaseArtifact::Unauthorized => {
WARN_UNAUTHORIZED_ONCE.call_once(|| {
warn!("GitHub API somehow requires a token for the API access, so we will fallback to HEAD/GET on the url.");
warn!("Please consider supplying a token to cargo-binstall to speedup resolution.");
});
}
Err(err) => return Err(err.into()),
}
GH_API_CLIENT_FAILED.store(true, Relaxed);
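
The fallback in `does_url_exist` can be illustrated in isolation. The following is a simplified, synchronous sketch rather than the crate's code: `ApiError`, `url_exists`, and the two closures are hypothetical stand-ins, the flag is passed as a parameter instead of living in a function-local `static`, and errors other than the three listed are not modelled.

```rust
use std::sync::atomic::{AtomicBool, Ordering::Relaxed};

/// Hypothetical, reduced stand-in for the API error type.
#[derive(Debug)]
enum ApiError {
    NotFound,
    RateLimit,
    Unauthorized,
}

/// Try the API check first; on rate limit or missing authorization,
/// remember the failure in `api_failed` and use the plain check instead.
fn url_exists(
    api_failed: &AtomicBool,
    api_check: impl FnOnce() -> Result<bool, ApiError>,
    plain_check: impl FnOnce() -> bool,
) -> bool {
    if !api_failed.load(Relaxed) {
        match api_check() {
            // The API gave a definite answer, so no fallback is needed.
            Ok(found) => return found,
            Err(ApiError::NotFound) => return false,
            // The API is unusable for now: fall through to the plain check.
            Err(ApiError::RateLimit) | Err(ApiError::Unauthorized) => {}
        }
        // Later calls skip the API entirely.
        api_failed.store(true, Relaxed);
    }
    plain_check()
}
```

Once the flag is set, subsequent calls go straight to the plain check, which is what avoids repeated API requests that are expected to fail.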

@ -1,16 +1,18 @@
use std::{borrow::Cow, fmt, iter, path::Path, sync::Arc};
use binstalk_git_repo_api::gh_api_client::{GhApiError, GhReleaseArtifact, GhReleaseArtifactUrl};
use compact_str::{CompactString, ToCompactString};
use either::Either;
use leon::Template;
use once_cell::sync::OnceCell;
use strum::IntoEnumIterator;
use tokio::time::sleep;
use tracing::{debug, info, trace, warn};
use url::Url;
use crate::{
common::*, futures_resolver::FuturesResolver, Data, FetchError, InvalidPkgFmtError, RepoInfo,
SignaturePolicy, SignatureVerifier, TargetDataErased,
SignaturePolicy, SignatureVerifier, TargetDataErased, DEFAULT_GH_API_RETRY_DURATION,
};
pub(crate) mod hosting;
@ -31,6 +33,8 @@ struct Resolved {
archive_suffix: Option<String>,
repo: Option<String>,
subcrate: Option<String>,
gh_release_artifact_url: Option<GhReleaseArtifactUrl>,
is_repo_private: bool,
}
impl GhCrateMeta {
@ -41,6 +45,7 @@ impl GhCrateMeta {
pkg_url: &Template<'_>,
repo: Option<&str>,
subcrate: Option<&str>,
is_repo_private: bool,
) {
let render_url = |ext| {
let ctx = Context::from_data_with_repo(
@ -82,16 +87,45 @@ impl GhCrateMeta {
let repo = repo.map(ToString::to_string);
let subcrate = subcrate.map(ToString::to_string);
let archive_suffix = ext.map(ToString::to_string);
let gh_release_artifact = GhReleaseArtifact::try_extract_from_url(&url);
async move {
Ok(does_url_exist(client, gh_api_client, &url)
debug!("Checking for package at: '{url}'");
let mut resolved = Resolved {
url: url.clone(),
pkg_fmt,
repo,
subcrate,
archive_suffix,
is_repo_private,
gh_release_artifact_url: None,
};
if let Some(artifact) = gh_release_artifact {
loop {
match get_gh_release_artifact_url(gh_api_client.clone(), artifact.clone())
.await
{
Ok(Some(artifact_url)) => {
resolved.gh_release_artifact_url = Some(artifact_url);
return Ok(Some(resolved));
}
Ok(None) => return Ok(None),
Err(GhApiError::RateLimit { retry_after }) => {
sleep(retry_after.unwrap_or(DEFAULT_GH_API_RETRY_DURATION)).await;
}
Err(GhApiError::Unauthorized) if !is_repo_private => break,
Err(err) => return Err(err.into()),
}
}
}
Ok(Box::pin(client.remote_gettable(url))
.await?
.then_some(Resolved {
url,
pkg_fmt,
repo,
subcrate,
archive_suffix,
}))
.then_some(resolved))
}
}));
}
@ -118,10 +152,11 @@ impl super::Fetcher for GhCrateMeta {
fn find(self: Arc<Self>) -> JoinHandle<Result<bool, FetchError>> {
tokio::spawn(async move {
let info = self.data.get_repo_info(&self.client).await?.as_ref();
let info = self.data.get_repo_info(&self.gh_api_client).await?;
let repo = info.map(|info| &info.repo);
let subcrate = info.and_then(|info| info.subcrate.as_deref());
let is_repo_private = info.map(|info| info.is_private).unwrap_or_default();
let mut pkg_fmt = self.target_data.meta.pkg_fmt;
@ -230,13 +265,22 @@ impl super::Fetcher for GhCrateMeta {
// basically cartesian product.
// |
for pkg_fmt in pkg_fmts.clone() {
this.launch_baseline_find_tasks(&resolver, pkg_fmt, &pkg_url, repo, subcrate);
this.launch_baseline_find_tasks(
&resolver,
pkg_fmt,
&pkg_url,
repo,
subcrate,
is_repo_private,
);
}
}
if let Some(resolved) = resolver.resolve().await? {
debug!(?resolved, "Winning URL found!");
self.resolution.set(resolved).unwrap(); // find() is called first
self.resolution
.set(resolved)
.expect("find() should be only called once");
Ok(true)
} else {
Ok(false)
@ -245,7 +289,10 @@ impl super::Fetcher for GhCrateMeta {
}
async fn fetch_and_extract(&self, dst: &Path) -> Result<ExtractedFiles, FetchError> {
let resolved = self.resolution.get().unwrap(); // find() is called first
let resolved = self
.resolution
.get()
.expect("find() should be called once before fetch_and_extract()");
trace!(?resolved, "preparing to fetch");
let verifier = match (self.signature_policy, &self.target_data.meta.signing) {
@ -290,11 +337,18 @@ impl super::Fetcher for GhCrateMeta {
"Downloading package",
);
let mut data_verifier = verifier.data_verifier()?;
let files = Download::new_with_data_verifier(
self.client.clone(),
resolved.url.clone(),
data_verifier.as_mut(),
)
let files = match resolved.gh_release_artifact_url.as_ref() {
Some(artifact_url) if resolved.is_repo_private => self
.gh_api_client
.download_artifact(artifact_url.clone())
.await?
.with_data_verifier(data_verifier.as_mut()),
_ => Download::new_with_data_verifier(
self.client.clone(),
resolved.url.clone(),
data_verifier.as_mut(),
),
}
.and_extract(resolved.pkg_fmt, dst)
.await?;
trace!("validating signature (if any)");
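
The choice between the two download paths in `fetch_and_extract` comes down to one `match` with a guard. A minimal sketch of that decision, assuming simplified types: `DownloadRoute`, `pick_route`, and the string-typed fields are hypothetical and only mirror the shape of `Resolved`.

```rust
/// Hypothetical description of which download path would be taken.
#[derive(Debug, PartialEq)]
enum DownloadRoute<'a> {
    /// Download through the GitHub API client (needed for private repositories).
    GhApi(&'a str),
    /// Download the plain release URL with the ordinary HTTP client.
    Plain(&'a str),
}

/// Reduced stand-in for the `Resolved` struct, using strings instead of URL types.
struct Resolved {
    url: String,
    gh_release_artifact_url: Option<String>,
    is_repo_private: bool,
}

/// Use the API route only when an artifact URL is known and the repository is private.
fn pick_route(resolved: &Resolved) -> DownloadRoute<'_> {
    match resolved.gh_release_artifact_url.as_deref() {
        Some(api_url) if resolved.is_repo_private => DownloadRoute::GhApi(api_url),
        _ => DownloadRoute::Plain(&resolved.url),
    }
}
```

With this guard, a public repository keeps using the plain URL even when an artifact URL was resolved, so the API client is only involved where the plain download could not succeed.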

@ -1,13 +1,12 @@
#![cfg_attr(docsrs, feature(doc_auto_cfg))]
use std::{path::Path, sync::Arc};
use std::{path::Path, sync::Arc, time::Duration};
use binstalk_downloader::{
download::DownloadError, gh_api_client::GhApiError, remote::Error as RemoteError,
};
use binstalk_downloader::{download::DownloadError, remote::Error as RemoteError};
use binstalk_git_repo_api::gh_api_client::{GhApiError, GhRepo};
use binstalk_types::cargo_toml_binstall::SigningAlgorithm;
use thiserror::Error as ThisError;
use tokio::sync::OnceCell;
use tokio::{sync::OnceCell, time::sleep};
pub use url::ParseError as UrlParseError;
mod gh_crate_meta;
@ -28,6 +27,8 @@ mod futures_resolver;
use gh_crate_meta::hosting::RepositoryHost;
static DEFAULT_GH_API_RETRY_DURATION: Duration = Duration::from_secs(1);
#[derive(Debug, ThisError)]
#[error("Invalid pkg-url {pkg_url} for {crate_name}@{version} on {target}: {reason}")]
pub struct InvalidPkgFmtError {
@ -145,6 +146,7 @@ struct RepoInfo {
repo: Url,
repository_host: RepositoryHost,
subcrate: Option<CompactString>,
is_private: bool,
}
/// What to do about package signatures
@ -180,29 +182,61 @@ impl Data {
}
#[instrument(level = "debug")]
async fn get_repo_info(&self, client: &Client) -> Result<&Option<RepoInfo>, FetchError> {
async fn get_repo_info(&self, client: &GhApiClient) -> Result<Option<&RepoInfo>, FetchError> {
self.repo_info
.get_or_try_init(move || {
Box::pin(async move {
if let Some(repo) = self.repo.as_deref() {
let mut repo = client.get_redirected_final_url(Url::parse(repo)?).await?;
let repository_host = RepositoryHost::guess_git_hosting_services(&repo);
let Some(repo) = self.repo.as_deref() else {
return Ok(None);
};
let repo_info = RepoInfo {
subcrate: RepoInfo::detect_subcrate(&mut repo, repository_host),
repo,
repository_host,
};
let mut repo = Url::parse(repo)?;
let mut repository_host = RepositoryHost::guess_git_hosting_services(&repo);
debug!("Resolved repo_info = {repo_info:#?}");
Ok(Some(repo_info))
} else {
Ok(None)
if repository_host == RepositoryHost::Unknown {
repo = client
.remote_client()
.get_redirected_final_url(repo)
.await?;
repository_host = RepositoryHost::guess_git_hosting_services(&repo);
}
let subcrate = RepoInfo::detect_subcrate(&mut repo, repository_host);
let mut is_private = false;
if repository_host == RepositoryHost::GitHub && client.has_gh_token() {
if let Some(gh_repo) = GhRepo::try_extract_from_url(&repo) {
loop {
match client.get_repo_info(&gh_repo).await {
Ok(Some(gh_repo_info)) => {
is_private = gh_repo_info.is_private();
break;
}
Ok(None) => return Err(GhApiError::NotFound.into()),
Err(GhApiError::RateLimit { retry_after }) => {
sleep(retry_after.unwrap_or(DEFAULT_GH_API_RETRY_DURATION))
.await
}
Err(err) => return Err(err.into()),
}
}
}
}
let repo_info = RepoInfo {
subcrate,
repo,
repository_host,
is_private,
};
debug!("Resolved repo_info = {repo_info:#?}");
Ok(Some(repo_info))
})
})
.await
.map(Option::as_ref)
}
}
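
The retry-on-rate-limit loops in this diff share one shape: call, sleep for `retry_after` (or a default) on a rate-limit error, and return anything else. A blocking sketch of that shape, assuming a hypothetical `ApiError` and helper `retry_on_rate_limit`; the real code is async and uses `tokio::time::sleep`, and the default delay here is shortened for illustration.

```rust
use std::{thread::sleep, time::Duration};

/// Hypothetical default delay used when the server gives no hint.
const DEFAULT_RETRY: Duration = Duration::from_millis(10);

/// Hypothetical, reduced stand-in for the API error type.
#[derive(Debug)]
enum ApiError {
    RateLimit { retry_after: Option<Duration> },
    Other(String),
}

/// Repeat `call` until it returns something other than a rate-limit error.
fn retry_on_rate_limit<T>(
    mut call: impl FnMut() -> Result<T, ApiError>,
) -> Result<T, ApiError> {
    loop {
        match call() {
            // Wait as long as the server asked, or the default, then try again.
            Err(ApiError::RateLimit { retry_after }) => {
                sleep(retry_after.unwrap_or(DEFAULT_RETRY));
            }
            // Success and every other error end the loop.
            other => return other,
        }
    }
}
```

Note that this loop has no upper bound on attempts, which matches the loops in the diff; a caller that needs a deadline would have to add one.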