research api next up is starting to write it

2026-05-01 11:22:38 +02:00
parent 84048a521b
commit 192abe44ba
5 changed files with 681 additions and 0 deletions
@@ -0,0 +1,72 @@
+# ani-cli Codebase Analysis: How It Retrieves Anime Episodes
+
+Based on an exploration of the `ani-cli` repository, here is how the application fetches anime episodes behind the scenes. `ani-cli` is a `bash` script that works mainly by scraping `allanime` and sending requests to its API endpoints.
+
+### 1. The Target API
+`ani-cli` points to a GraphQL API backend:
+*   **Base URL**: `https://api.allanime.day/api` (constructed from `allanime_base="allanime.day"`)
+*   **Referer Header**: To get past basic bot protection, it explicitly sets the HTTP `Referer` header to `https://allmanga.to` (`$allanime_refr`) and sends its own `User-Agent` string.
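+
+A minimal Python sketch of the same request setup, using only the standard library. The constants mirror the script variables listed in the Code Reference below; the `User-Agent` value is a placeholder, not the string `ani-cli` actually sends. The request is only built here, not sent.
+
+```python
+import urllib.request
+
+# Mirrors of the script variables listed in the Code Reference.
+ALLANIME_REFR = "https://allmanga.to"
+ALLANIME_BASE = "allanime.day"
+ALLANIME_API = f"https://api.{ALLANIME_BASE}"
+GRAPHQL_URL = f"{ALLANIME_API}/api"
+
+# Placeholder: use the User-Agent string defined in the script instead.
+USER_AGENT = "Mozilla/5.0 (placeholder)"
+
+
+def build_request(data: bytes | None = None) -> urllib.request.Request:
+    """Prepare (but do not send) a request carrying the spoofed Referer."""
+    headers = {"Referer": ALLANIME_REFR, "User-Agent": USER_AGENT}
+    return urllib.request.Request(GRAPHQL_URL, data=data, headers=headers)
+```
+
+Calling `urllib.request.urlopen(build_request(body))` would send it; `urllib` switches to `POST` automatically when `data` is provided.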
+
+### 2. Searching & Episode Lists via GraphQL
+The GraphQL queries are embedded as strings in the bash script and sent with `curl -X POST`.
+*   **Search**: The `search_gql` query calls `shows(search: ...)` to find matching titles and returns each title's `_id` and episode count.
+*   **Episode Listing**: Once an `_id` is known, `ani-cli` sends `episodes_list_gql` to retrieve the available episodes (via fields such as `availableEpisodesDetail`) for the chosen sub/dub setting (`translationType`).
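+
+Both queries can be sketched as JSON POST bodies of the usual GraphQL shape, `{"query": ..., "variables": ...}`. This is a hedged Python sketch: the query strings are placeholders to be copied from the script, the `Content-Type: application/json` body and the variable names (`search.query`, `showId`) are assumptions, and the `User-Agent` is a placeholder.
+
+```python
+import json
+import urllib.request
+
+GRAPHQL_URL = "https://api.allanime.day/api"
+
+# Placeholders: paste the real query strings from the script here.
+SEARCH_GQL = "<search_gql from ani-cli>"
+EPISODES_LIST_GQL = "<episodes_list_gql from ani-cli>"
+
+
+def gql_request(query: str, variables: dict) -> urllib.request.Request:
+    """Build (but do not send) a GraphQL POST request with a JSON body."""
+    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
+    headers = {
+        "Content-Type": "application/json",  # assumption about the body format
+        "Referer": "https://allmanga.to",
+        "User-Agent": "Mozilla/5.0 (placeholder)",  # placeholder value
+    }
+    return urllib.request.Request(GRAPHQL_URL, data=body, headers=headers, method="POST")
+
+
+def search_request(title: str, mode: str = "sub") -> urllib.request.Request:
+    # HYPOTHETICAL variable layout; check search_gql for the real input fields.
+    return gql_request(SEARCH_GQL, {"search": {"query": title}, "translationType": mode})
+
+
+def episodes_request(show_id: str) -> urllib.request.Request:
+    # HYPOTHETICAL variable name; check episodes_list_gql for the real one.
+    return gql_request(EPISODES_LIST_GQL, {"showId": show_id})
+```
+
+Sending one is then `json.load(urllib.request.urlopen(search_request("frieren")))`, which parses the JSON response into a `dict`.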
+
+### 3. Fetching the Episode Video Links
+When an episode is selected, `ani-cli` needs the source of the embedded player, so it sends another GraphQL request, `episode_embed_gql`.
+*   It passes the `$showId`, `$translationType` (sub or dub mode), and `$episodeString` (the episode number).
+*   The API returns a JSON payload containing `sourceUrls`.
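+
+The embed request reuses the same POST shape with the three variables named above. Because the exact nesting of the JSON response is not documented here, this Python sketch walks the whole payload and collects any list stored under `sourceUrls` instead of assuming a fixed path; both function names are hypothetical.
+
+```python
+from typing import Any
+
+
+def embed_variables(show_id: str, translation_type: str, episode: str) -> dict[str, str]:
+    """Variables for episode_embed_gql, named as in the description above."""
+    return {
+        "showId": show_id,
+        "translationType": translation_type,  # "sub" or "dub"
+        "episodeString": episode,  # the episode number, as a string
+    }
+
+
+def find_source_urls(node: Any) -> list[Any]:
+    """Collect the items of every list stored under a "sourceUrls" key.
+
+    Searching recursively avoids hard-coding the response's exact nesting.
+    """
+    found: list[Any] = []
+    if isinstance(node, dict):
+        for key, value in node.items():
+            if key == "sourceUrls" and isinstance(value, list):
+                found.extend(value)
+            else:
+                found.extend(find_source_urls(value))
+    elif isinstance(node, list):
+        for item in node:
+            found.extend(find_source_urls(item))
+    return found
+```
+
+In this sketch, a response that carries `tobeparsed` instead of `sourceUrls` simply yields an empty list.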
+
+### 4. Decrypting Obfuscated Sources (`tobeparsed`)
+In some responses, `allanime` obfuscates the video source URLs to hinder scraping: instead of plain `sourceUrls`, the API returns an encrypted, base64-encoded payload under the key `"tobeparsed"`.
+*   `ani-cli` catches this field with `grep -q '"tobeparsed"'`.
+*   It then routes the blob to a decryption function `decode_tobeparsed()`.
+*   **The Decryption Method**: It takes the first 12 bytes of the base64-decoded blob as the IV and uses `openssl` to run AES-256-CTR decryption on the rest of the payload.
+*   **The Key**: The decryption key (`$allanime_key`) is derived at runtime as the SHA-256 hash of the hardcoded salt string `Xot36i3lK3:v1`, so in practice it is a fixed value.
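+
+The routing check, key derivation, and IV split described above can be sketched in Python with the standard library, handing the actual decryption to the `openssl` CLI as the script does. The 12-byte IV and the salt come from the text; the 4-byte counter suffix needed to turn the IV into a 16-byte CTR block is an **assumption** to verify against `decode_tobeparsed()`, and all function names are hypothetical.
+
+```python
+import base64
+import hashlib
+import subprocess
+
+SALT = "Xot36i3lK3:v1"  # hardcoded salt string quoted above
+
+
+def needs_decryption(api_resp: str) -> bool:
+    """Python analogue of the script's `grep -q '"tobeparsed"'` check."""
+    return '"tobeparsed"' in api_resp
+
+
+def derive_key_hex(salt: str = SALT) -> str:
+    """SHA-256 of the salt: 32 bytes (an AES-256 key), as 64 hex digits."""
+    return hashlib.sha256(salt.encode("utf-8")).hexdigest()
+
+
+def split_blob(blob_b64: str) -> tuple[bytes, bytes]:
+    """Base64-decode the blob and split it into (12-byte IV, rest of payload)."""
+    raw = base64.b64decode(blob_b64)
+    return raw[:12], raw[12:]
+
+
+def decrypt_with_openssl(blob_b64: str, counter_suffix: bytes = bytes(4)) -> bytes:
+    """Decrypt with `openssl enc -d -aes-256-ctr`, mirroring the script's approach.
+
+    ASSUMPTION: openssl's CTR mode takes a 16-byte IV, so the 12-byte IV is
+    extended with a 4-byte counter. The correct starting counter is not given
+    above; confirm it in decode_tobeparsed() before relying on the output.
+    """
+    iv, payload = split_blob(blob_b64)
+    result = subprocess.run(
+        ["openssl", "enc", "-d", "-aes-256-ctr",
+         "-K", derive_key_hex(), "-iv", (iv + counter_suffix).hex()],
+        input=payload,
+        capture_output=True,
+        check=True,
+    )
+    return result.stdout
+```
+
+The salt is hashed with no trailing newline: in shell, `printf '%s'` produces the same bytes, while `echo` would append a newline and yield a different key.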
+
+### 5. Link Generation & Processing
+Once the embed URLs are decrypted (or read directly when they arrive unencrypted), `generate_link()` maps each one to its video provider. Providers include `wixmp` (the default), `youtube`, `sharepoint`, and `hianime`.
+*   The `get_links()` function requests each of these links and uses `sed` to extract `.mp4` URLs or `.m3u8` playlist URLs, depending on the provider's format.
+*   Subtitle URLs are extracted as well when available.
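+
+The `sed` extraction in `get_links()` can be approximated with a regular expression. This Python sketch uses a generic pattern for `.mp4` and `.m3u8` URLs; it is an assumption, not a translation of the script's actual `sed` expressions, which are tailored to each provider's response format.
+
+```python
+import re
+
+# Generic pattern (an assumption, not the script's sed expressions): an http(s)
+# URL whose path ends in .mp4 or .m3u8, plus an optional query string.
+STREAM_URL = re.compile(r"https?://[^\s\"'<>]+?\.(?:mp4|m3u8)\b(?:\?[^\s\"'<>]*)?")
+
+
+def extract_stream_urls(text: str) -> list[str]:
+    """Return the stream URLs found in text, in order, without duplicates."""
+    return list(dict.fromkeys(m.group(0) for m in STREAM_URL.finditer(text)))
+```
+
+`dict.fromkeys` is used for de-duplication because it keeps the first-seen order, so the earliest link in a provider response stays first.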
+
+### 6. Streaming or Downloading
+Finally, the extracted stream links, together with the required referrer headers, are passed directly to a media player such as `mpv`, `vlc`, or `android_vlc`, or to a download manager such as `aria2c`.
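+
+Forwarding the referrer is what lets the player fetch the stream. This Python sketch builds (but does not run) the argument list for three of the tools named above, using their referrer options (`--referrer` for `mpv`, `--http-referrer` for `vlc`, `--referer` for `aria2c`); which referrer value each provider needs is an assumption, and the function name is hypothetical.
+
+```python
+def player_command(url: str, referrer: str, player: str = "mpv",
+                   subtitle: str | None = None) -> list[str]:
+    """Build (but do not run) an argv that forwards the Referer to the tool."""
+    if player == "mpv":
+        cmd = ["mpv", f"--referrer={referrer}"]
+        if subtitle:
+            cmd.append(f"--sub-file={subtitle}")  # load an external subtitle file
+        return cmd + [url]
+    if player == "vlc":
+        return ["vlc", f"--http-referrer={referrer}", url]
+    if player == "aria2c":
+        return ["aria2c", f"--referer={referrer}", url]
+    raise ValueError(f"unsupported player: {player}")
+```
+
+Running it is then `subprocess.run(player_command(url, "https://allmanga.to"), check=True)`; passing a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.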
+
+---
+
+## Code Reference (Line Numbers)
+
+Here are the line numbers, in the version of the `ani-cli` script examined for this analysis, where each mechanism is implemented:
+
+*   **API Configuration & Keys**:
+    *   `allanime_refr="https://allmanga.to"`: **Line 405**
+    *   `allanime_base="allanime.day"`: **Line 406**
+    *   `allanime_api="https://api.${allanime_base}"`: **Line 407**
+    *   `allanime_key` (the AES key, derived from the hardcoded salt): **Line 408**
+*   **GraphQL Queries**:
+    *   `episode_embed_gql` (Fetching the video player URLs): **Line 227**
+    *   `search_gql` (Searching for anime titles): **Line 257**
+    *   `episodes_list_gql` (Getting available episodes): **Line 280**
+*   **The Decryption Logic**:
+    *   The `decode_tobeparsed()` function where the AES-256-CTR decryption happens: **Lines 211 - 221**
+    *   The check that routes the response to the decryption function (`if printf "%s" "$api_resp" | grep -q '"tobeparsed"'; then`): **Line 230**
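+
+Line numbers like these shift between `ani-cli` releases. A small helper, sketched here in Python, can re-locate the same definitions in any copy of the script; the regular expressions are assumptions about how each definition is written, and the function names are hypothetical.
+
+```python
+import re
+
+# Patterns for the definitions listed above (assumed spellings).
+TARGETS = {
+    "allanime_refr": r"\ballanime_refr=",
+    "allanime_base": r"\ballanime_base=",
+    "allanime_api": r"\ballanime_api=",
+    "allanime_key": r"\ballanime_key=",
+    "episode_embed_gql": r"\bepisode_embed_gql=",
+    "search_gql": r"\bsearch_gql=",
+    "episodes_list_gql": r"\bepisodes_list_gql=",
+    "decode_tobeparsed": r"^\s*decode_tobeparsed\s*\(\)",
+    "tobeparsed_check": re.escape("grep -q '\"tobeparsed\"'"),
+}
+
+
+def find_lines(script_text: str, pattern: str) -> list[int]:
+    """Return the 1-based numbers of the lines that match a regex."""
+    rx = re.compile(pattern)
+    return [n for n, line in enumerate(script_text.splitlines(), start=1) if rx.search(line)]
+
+
+def locate_all(script_text: str) -> dict[str, list[int]]:
+    """Map each target name to the line numbers where it appears."""
+    return {name: find_lines(script_text, pattern) for name, pattern in TARGETS.items()}
+```
+
+For example, assuming the script is saved as `ani-cli`, `locate_all(open("ani-cli", encoding="utf-8").read())` maps each name to its matching line numbers, with an empty list when a pattern no longer matches.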
+
+---
+
+## Is this a partnership, or can you do it yourself?
+
+**You can absolutely do this yourself.** This is **not** an official partnership. 
+
+What the developers of `ani-cli` have done is known as **reverse engineering** and **web scraping**. When you watch a video on a site like *allanime* in your normal web browser, your browser has to know how to talk to the site's servers to get the video files. Because all of this happens on the client side (in your browser), the instructions are visible if you know where to look.
+
+Here is how developers (and you) can figure this out for most websites:
+
+1.  **Network Tab Inspection**: If you open your browser's Developer Tools (F12) and go to the "Network" tab, you can see every request the website makes. If you search for an anime, you will see a `POST` request going to `https://api.allanime.day/api`. 
+2.  **Payload Analysis**: By clicking on that network request, you can see exactly what data was sent (the GraphQL query) and what the server responded with (the JSON payload).
+3.  **Bypassing Basic Protections**: Websites try to stop automated scripts by checking request headers. The developers saw that the site checks the `Referer` header to make sure the request comes from `https://allmanga.to`, so they programmed `ani-cli` to send that header itself (`curl -e "https://allmanga.to"`).
+4.  **Finding Encryption Keys**: When the site started returning encrypted `"tobeparsed"` blobs instead of plain video URLs, the developers of `ani-cli` likely opened the "Sources" tab in their browser's Developer Tools, downloaded the website's obfuscated JavaScript files, and reverse-engineered how the web player decrypts the video. That is presumably how they found the exact AES algorithm (`aes-256-ctr`) and the hardcoded salt string (`Xot36i3lK3:v1`).
+
+**Can you do this?** 
+Yes! You can use Python (with `requests` and `BeautifulSoup`), Bash (as this script does, with `curl`, `grep`, and `sed`), or Node.js to replicate these network requests for most sites.
+
+*Note: Because these integrations are reverse-engineered rather than official, sites frequently change their API endpoints, encryption keys, or security measures to break scrapers, which is why tools like `ani-cli` need constant updates.*