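I want to use AngleSharp to do the following: parse out all the URLs in a piece of text and inspect each one.
If a URL does not contain ha_source=bokeyuan&ha_sourceId=89000449, add ha_source and ha_sourceId; if it has them but with different values, replace the values. How can this be implemented?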
I got it working with the code below.
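using System.Linq;
using System.Web;
using AngleSharp;
using AngleSharp.Html.Dom;
using Microsoft.AspNetCore.WebUtilities;

namespace Cnblogs.Text;

public partial class HtmlUtility
{
    // For every <a> whose URL contains hostname, adds (or replaces) the parameters
    // in queryString and makes the link open in a new tab.
    public static async Task<string> ModifyHtmlUrls(string html, string hostname, string queryString)
    {
        var isHtmlChanged = false;
        var context = BrowsingContext.New(Configuration.Default);
        var document = await context.OpenAsync(res => res.Content(html));
        var elements = document.GetElementsByTagName("a");
        foreach (var element in elements)
        {
            if (element is IHtmlAnchorElement anchor)
            {
                // Check the host first so that non-matching links (empty href, mailto:, ...)
                // are never passed to UriBuilder. Note: this is a substring match against
                // the whole URL, not only its host part.
                if (anchor.Href.Contains(hostname, StringComparison.OrdinalIgnoreCase))
                {
                    var newUrl = AddUrlParameters(anchor.Href, queryString);
                    if (newUrl != anchor.Href)
                    {
                        anchor.Href = newUrl;
                        isHtmlChanged = true;
                    }
                    if (anchor.Target != "_blank")
                    {
                        anchor.Target = "_blank";
                        isHtmlChanged = true;
                    }
                }
            }
        }
        // Return the original string untouched when nothing was modified.
        return (isHtmlChanged && document.Body != null) ? document.Body.InnerHtml : html;
    }

    // Merges queryString into url: missing keys are added, existing keys are overwritten.
    public static string AddUrlParameters(string url, string queryString)
    {
        var urlBuilder = new UriBuilder(url);
        var originQueryString = urlBuilder.Query;
        // NameValueCollection, which has Set (replace-or-add semantics).
        var originQueryParams = HttpUtility.ParseQueryString(originQueryString);
        // Dictionary<string, StringValues>, used here read-only.
        var newQueryParams = QueryHelpers.ParseQuery(queryString);
        foreach (var (key, values) in newQueryParams)
        {
            // If a key is repeated in queryString, the last value wins.
            originQueryParams.Set(key, values.LastOrDefault());
        }
        urlBuilder.Query = originQueryParams.ToString();
        // Use .Uri rather than UriBuilder.ToString() so the default port is not emitted.
        return urlBuilder.Uri.ToString();
    }
}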
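One thing to watch in the code above: the hostname test is a substring match on the whole URL, so a link to another site that merely mentions the hostname in its path or query would also match. A minimal sketch of a stricter check, assuming subdomains should count as a match as well; `HostMatchSketch` and `IsOnHost` are hypothetical names for illustration, not part of the original answer:

```csharp
using System;

public static class HostMatchSketch
{
    // Hypothetical helper: true when url is absolute and its host is hostname
    // itself or a subdomain of it (the subdomain rule is an assumption).
    public static bool IsOnHost(string url, string hostname)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri))
        {
            return false; // empty, relative, or malformed href
        }

        return uri.Host.Equals(hostname, StringComparison.OrdinalIgnoreCase)
            || uri.Host.EndsWith("." + hostname, StringComparison.OrdinalIgnoreCase);
    }
}
```

If this fits the requirement, the `Contains(hostname, ...)` condition could be replaced by a call such as `HostMatchSketch.IsOnHost(anchor.Href, hostname)`.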
NuGet packages used
<Project>
  <ItemGroup>
    <PackageReference Include="AngleSharp" Version="1.3.1" />
    <PackageReference Include="Microsoft.AspNetCore.WebUtilities" Version="9.0.10" />
  </ItemGroup>
</Project>
ASP.NET Core – Working with Uri and Query
– dudu, 3 weeks ago
– dudu, 3 weeks ago: QueryHelpers.ParseQuery returns a Dictionary<string, StringValues>, which has no Set method
– dudu, 3 weeks ago: HttpUtility.ParseQueryString returns a NameValueCollection, which has a Set method
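Since NameValueCollection.Set is what gives the replace-or-add behaviour, the merge can also be written with HttpUtility.ParseQueryString on both sides, which avoids the Microsoft.AspNetCore.WebUtilities dependency. A minimal sketch under that assumption; `QueryMergeSketch` and `MergeQuery` are hypothetical names, and this is a variant of the answer's AddUrlParameters, not the code the answer uses:

```csharp
using System;
using System.Web;

public static class QueryMergeSketch
{
    // Hypothetical helper: merges extraQuery into the query string of url.
    public static string MergeQuery(string url, string extraQuery)
    {
        var builder = new UriBuilder(url);
        var current = HttpUtility.ParseQueryString(builder.Query);
        var extra = HttpUtility.ParseQueryString(extraQuery);

        foreach (var key in extra.AllKeys)
        {
            if (key is null)
            {
                continue; // value-only segments such as "?flag" are stored under a null key
            }

            var values = extra.GetValues(key);
            // Set replaces any existing values for the key; Add would append instead.
            // When a key is repeated in extraQuery, the last value wins.
            current.Set(key, values is { Length: > 0 } ? values[^1] : null);
        }

        builder.Query = current.ToString();
        // .Uri drops the default port that UriBuilder.ToString() would include.
        return builder.Uri.ToString();
    }
}
```

Existing keys keep their position in the query string and new keys are appended at the end, because NameValueCollection preserves insertion order.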