Converting an HTML file that contains nested tables to CSV while preserving the structure can be a bit tricky. BeautifulSoup is a great library for parsing HTML, but it takes some extra work to handle nested tables correctly.
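To see why, here is a minimal sketch (the HTML string is made up for illustration): BeautifulSoup's find_all() is recursive, so asking the outer table for its rows also returns the rows of any table nested inside it, which is exactly what the code below has to work around.

from bs4 import BeautifulSoup

# Hypothetical two-level table: the second cell of the data row holds a nested table.
html = """
<table>
  <tr><th>Name</th><th>Details</th></tr>
  <tr>
    <td>Alice</td>
    <td><table><tr><td>age</td><td>30</td></tr></table></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
outer = soup.find('table')

# find_all() searches all descendants, so the nested table's row is counted too.
print(len(outer.find_all('tr')))  # prints 3, even though the outer table has only 2 rows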
To get the output you need, you can use BeautifulSoup together with some custom Python code: parse the HTML file, extract the data (including any nested tables), and write it out in CSV format. The following snippet walks through those steps and should help you get started:
from bs4 import BeautifulSoup
import csv


def extract_nested_table_data(table_cell):
    # Helper function: flatten a nested table inside a cell into plain text
    nested_table = table_cell.find('table')
    if not nested_table:
        return ''
    # Collect the nested table's cell text row by row
    nested_data = []
    for row in nested_table.find_all('tr'):
        nested_cells = row.find_all(['td', 'th'])
        nested_data.append([cell.get_text(strip=True) for cell in nested_cells])
    # Join cells with commas and rows with newlines for a plain-text representation
    return '\n'.join(','.join(row) for row in nested_data)


def convert_html_to_csv(html_filename, csv_filename):
    with open(html_filename, 'r', encoding='utf-8') as html_file:
        soup = BeautifulSoup(html_file, 'html.parser')

    parent_table = soup.find('table')
    # Keep only rows that belong to the outer table, not to a nested table
    rows = [tr for tr in parent_table.find_all('tr')
            if tr.find_parent('table') is parent_table]

    # Header cells are taken from the first row of the outer table
    headers = [th.get_text(strip=True) for th in rows[0].find_all('th', recursive=False)]

    with open(csv_filename, 'w', newline='', encoding='utf-8') as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(headers)

        for row in rows[1:]:  # Skip the header row
            # recursive=False keeps out cells that belong to nested tables
            cells = row.find_all(['td', 'th'], recursive=False)
            row_data = []
            for cell in cells:
                nested_text = extract_nested_table_data(cell)
                # Use the flattened nested table if one exists, otherwise the cell text
                row_data.append(nested_text if nested_text else cell.get_text(strip=True))
            csv_writer.writerow(row_data)


if __name__ == '__main__':
    html_filename = 'input.html'
    csv_filename = 'output.csv'
    convert_html_to_csv(html_filename, csv_filename)
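As a quick sanity check, here is a hypothetical end-to-end run. It assumes the snippet above has been saved as html_to_csv.py next to this script (the module name and the sample data are made up for illustration):

# Smoke test for the converter; assumes the code above lives in html_to_csv.py.
from html_to_csv import convert_html_to_csv

sample_html = """
<table>
  <tr><th>Name</th><th>Details</th></tr>
  <tr>
    <td>Alice</td>
    <td><table>
      <tr><td>age</td><td>30</td></tr>
      <tr><td>city</td><td>Oslo</td></tr>
    </table></td>
  </tr>
</table>
"""

with open('input.html', 'w', encoding='utf-8') as f:
    f.write(sample_html)

convert_html_to_csv('input.html', 'output.csv')

with open('output.csv', encoding='utf-8') as f:
    print(f.read())

# The nested table ends up as a single quoted CSV field, roughly:
# Name,Details
# Alice,"age,30
# city,Oslo"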
This code flattens each nested table by joining its cells with commas and its rows with newlines, then writes the result into a single CSV field (which the csv module quotes automatically). If the nested cells themselves contain commas, consider switching to a different delimiter so the flattened text stays unambiguous.
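One way to do that (just a sketch, and the default separator values here are arbitrary) is to make the separators parameters of the helper function:

def extract_nested_table_data(table_cell, cell_sep=' | ', row_sep='; '):
    # Variant of the helper above with configurable separators, so the
    # flattened nested data cannot be confused with the CSV's own commas.
    nested_table = table_cell.find('table')
    if not nested_table:
        return ''
    nested_data = []
    for row in nested_table.find_all('tr'):
        cells = row.find_all(['td', 'th'])
        nested_data.append([cell.get_text(strip=True) for cell in cells])
    return row_sep.join(cell_sep.join(row) for row in nested_data)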
Remember that handling complex HTML structures may require further adjustments to this code, depending on the specifics of your data. Nonetheless, this should serve as a good starting point to tackle the task.